[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80603/ Test PASSed.
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80604/ Test PASSed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Merged build finished. Test PASSed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #80603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80603/testReport)** for PR 17980 at commit [`5db6acf`](https://github.com/apache/spark/commit/5db6acf2cde02217c283103ebcbfd2630338852c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80608/ Test FAILed.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18933 Merged build finished. Test FAILed.
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #80604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80604/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18933 **[Test build #80608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80608/testReport)** for PR 18933 at commit [`0f182d0`](https://github.com/apache/spark/commit/0f182d0f6d8e1eb3b92e7e0eb39f2616235c1368). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132877186 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -32,6 +32,12 @@ class JoinSuite extends QueryTest with SharedSQLContext { setupTestData() + override def afterEach(): Unit = { --- End diff -- yes,
[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 @yanboliang Hi, yangbo. Could you help review the PR? Thanks.
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18934 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80609/ Test FAILed.
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18934 Merged build finished. Test FAILed.
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18934 **[Test build #80609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80609/testReport)** for PR 18934 at commit [`1f8444d`](https://github.com/apache/spark/commit/1f8444d8e8135a48fc1aa4f2a88cab9a26c7d32b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18875 @viirya and @goldmedal, I am sorry. I misread https://github.com/apache/spark/pull/18875#discussion_r132613913. I was thinking of arbitrary map support here too, because I thought arbitrary map support would be much more common in practice. Could we do arbitrary map support too, if it won't be too difficult? I was thinking array / renaming could be done in a followup.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r132875524 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -640,6 +644,7 @@ case class StructsToJson( lazy val rowSchema = child.dataType match { case st: StructType => st case ArrayType(st: StructType, _) => st +case MapType(kt: DataType, st: StructType, _) => st --- End diff -- little nit: `kt` -> `_`.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r132875639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -661,13 +666,19 @@ case class StructsToJson( (arr: Any) => gen.write(arr.asInstanceOf[ArrayData]) getAndReset() + case MapType(_: DataType, _: StructType, _: Boolean) => --- End diff -- Looks like we can use this `MapType` directly for `mapType` in `gen.write(map.asInstanceOf[MapData], mapType)`.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r132875370 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -202,5 +203,9 @@ private[sql] class JacksonGenerator( */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(map: MapData, mapType: MapType): Unit = { --- End diff -- I am less sure whether this `write` should take `mapType`. It looks like the equivalent `write(row: InternalRow)` does not take the struct type.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r132875025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -676,7 +687,8 @@ case class StructsToJson( TypeCheckResult.TypeCheckFailure(e.getMessage) } case _ => TypeCheckResult.TypeCheckFailure( - s"Input type ${child.dataType.simpleString} must be a struct or array of structs.") + s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " + + s"map with a structs value.") --- End diff -- little nit: the `s` does not look required here.
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132874899 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -32,6 +32,12 @@ class JoinSuite extends QueryTest with SharedSQLContext { setupTestData() + override def afterEach(): Unit = { --- End diff -- Can we explain this in the comment without executable code?
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18756 **[Test build #80610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80610/testReport)** for PR 18756 at commit [`b8473cc`](https://github.com/apache/spark/commit/b8473cc764ad18cf675c147f0ad0e291c3071a77).
[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18895#discussion_r132874492 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1837,8 +1847,8 @@ def fill(self, value, subset=None): fill.__doc__ = DataFrame.fillna.__doc__ -def replace(self, to_replace, value, subset=None): -return self.df.replace(to_replace, value, subset) +def replace(self, to_replace, value=None, subset=None): +return self.df.replace(to_replace=to_replace, value=value, subset=subset) --- End diff -- I think it is okay to leave this line as it was.
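The remark about leaving the forwarding call unchanged can be checked with a small stand-in. This is only a sketch, not PySpark: `FakeFrame` and `FakeNaFunctions` are hypothetical names, and the only thing assumed from the diff is the signature `replace(self, to_replace, value=None, subset=None)`. It shows that forwarding the three arguments positionally or by keyword ends up in the same call.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; records what replace() received."""

    def replace(self, to_replace, value=None, subset=None):
        return {"to_replace": to_replace, "value": value, "subset": subset}


class FakeNaFunctions:
    """Hypothetical stand-in for the `df.na` wrapper touched by the diff."""

    def __init__(self, df):
        self.df = df

    def replace_positional(self, to_replace, value=None, subset=None):
        # Forward positionally, as the line did before the change.
        return self.df.replace(to_replace, value, subset)

    def replace_keyword(self, to_replace, value=None, subset=None):
        # Forward by keyword, as the proposed change does.
        return self.df.replace(to_replace=to_replace, value=value, subset=subset)


na = FakeNaFunctions(FakeFrame())
same = na.replace_positional("Alice") == na.replace_keyword("Alice")
```

Under these assumptions the two forwarding styles are interchangeable, so the behavioural change in the diff comes from the new `value=None` default rather than from the keyword call.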
[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18895#discussion_r132874471 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1403,6 +1403,16 @@ def replace(self, to_replace, value=None, subset=None): |null| null|null| ++--++ +>>> df4.na.replace('Alice').show() +++--++ +| age|height|name| +++--++ +| 10|80|null| +| 5| null| Bob| +|null| null| Tom| +|null| null|null| +++--++ --- End diff -- Looks like the trailing whitespace should be removed. Could we remove it?
[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18734 cc @ushine who I believe is also appropriate to review this.
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/18805 https://github.com/facebook/zstd/issues/775
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80607/ Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80607/testReport)** for PR 18931 at commit [`4bef567`](https://github.com/apache/spark/commit/4bef5677b7338818bd9c44389fd183a8bd775610). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18909: [MINOR][SQL] Additional test case for CheckCartes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18909
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18909 Thanks! Merging to master.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18909 LGTM
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132871517 --- Diff: python/pyspark/sql/column.py --- @@ -406,7 +406,13 @@ def substr(self, startPos, length): [Row(col=u'Ali'), Row(col=u'Bob')] """ if type(startPos) != type(length): -raise TypeError("Can not mix the type") +raise TypeError( +"startPos and length must be the same type. " +"Got {startPos_t} and {length_t}, respectively." --- End diff -- cc @ueshin
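The diff above is cut off before the placeholders are filled in. As a minimal sketch of the clearer error, assuming the message is completed with a `.format(...)` call (that call and the helper name `check_same_type` are assumptions, not taken from the PR):

```python
def check_same_type(startPos, length):
    """Hypothetical helper mirroring the type check quoted in the diff."""
    if type(startPos) != type(length):
        raise TypeError(
            "startPos and length must be the same type. "
            "Got {startPos_t} and {length_t}, respectively."
            # Assumed completion: the quoted diff ends before this call.
            .format(startPos_t=type(startPos), length_t=type(length)))


try:
    check_same_type(1, "a")
    message = None
except TypeError as exc:
    message = str(exc)
```

Compared with the old "Can not mix the type" text, the message now names both offending types, which is the point of the change.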
[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18934 **[Test build #80609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80609/testReport)** for PR 18934 at commit [`1f8444d`](https://github.com/apache/spark/commit/1f8444d8e8135a48fc1aa4f2a88cab9a26c7d32b).
[GitHub] spark pull request #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18934 [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed ## What changes were proposed in this pull request? We put the staging path to delete into the deleteOnExit cache of `FileSystem` in case the path can't be removed successfully. But when we do remove the path successfully, we don't remove it from the cache. We should, so that the cache does not keep growing. ## How was this patch tested? Added a test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-21721 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18934.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18934 commit 1f8444d8e8135a48fc1aa4f2a88cab9a26c7d32b Author: Liang-Chi Hsieh Date: 2017-08-14T04:11:03Z Clear FileSystem deleteOnExit cache when paths are successfully removed.
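The cache growth described in this pull request can be modelled with a toy in-memory file system. This is a sketch only: `ToyFileSystem`, its method names, and `remove_staging_dir` are hypothetical and do not reproduce Hadoop's `FileSystem` API or the actual patch.

```python
class ToyFileSystem:
    """Toy in-memory model of a file system with a delete-on-exit cache."""

    def __init__(self):
        self.files = set()
        self.delete_on_exit_cache = set()

    def create(self, path):
        self.files.add(path)

    def delete_on_exit(self, path):
        # Remember the path so it can be cleaned up at shutdown if needed.
        self.delete_on_exit_cache.add(path)

    def cancel_delete_on_exit(self, path):
        self.delete_on_exit_cache.discard(path)

    def delete(self, path):
        """Return True when the path existed and was removed."""
        if path in self.files:
            self.files.remove(path)
            return True
        return False


def remove_staging_dir(fs, path):
    # Drop the cache entry only once the delete is known to have succeeded.
    if fs.delete(path):
        fs.cancel_delete_on_exit(path)


fs = ToyFileSystem()
for i in range(1000):
    staging = "/tmp/staging-%d" % i
    fs.create(staging)
    fs.delete_on_exit(staging)
    remove_staging_dir(fs, staging)

remaining = len(fs.delete_on_exit_cache)
```

Without the `cancel_delete_on_exit` call, the cache in this model would hold one entry per staging path for the lifetime of the process, which is the growth the description wants to avoid.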
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80605/ Test FAILed.
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r132870799 --- Diff: python/pyspark/sql/tests.py --- @@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self): pdf = df.toPandas() self.spark.conf.set("spark.sql.execution.arrow.enable", "true") pdf_arrow = df.toPandas() +# need to remove timezone for comparison +pdf_arrow["7_timestamp_t"] = \ +pdf_arrow["7_timestamp_t"].apply(lambda ts: ts.tz_localize(None)) --- End diff -- I sent PR #18933 as a WIP for the "without-Arrow" version.
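The test in the diff strips the timezone before comparing the two results. The same idea can be sketched with the standard library alone; note this swaps pandas `tz_localize(None)` for `datetime.replace(tzinfo=None)`, which likewise keeps the wall-clock fields and drops the zone. The offset and timestamp values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values: one timezone-aware timestamp, one naive timestamp
# carrying the same wall-clock fields.
tz = timezone(timedelta(hours=-8))
aware = datetime(2017, 8, 14, 2, 8, 3, tzinfo=tz)
naive = datetime(2017, 8, 14, 2, 8, 3)

# An aware and a naive datetime never compare equal, even when the
# wall-clock fields match.
mixed_equal = (aware == naive)

# Drop the timezone but keep the wall-clock fields, then compare again.
stripped = aware.replace(tzinfo=None)
now_equal = (stripped == naive)
```

This is why a comparison between a timezone-aware column and a naive one needs the zone removed (or added) on one side first.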
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18664 @BryanCutler I'm sorry for the delay. I think requiring `SparkSession` in order to apply the timezone is too strict for an API. How about throwing an exception instead of using `DateTimeUtils.defaultTimeZone()` when `timeZoneId` is `None`? In that case, should we use `String` without a default parameter instead of `Option[String] = None` for `timeZoneId`?
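The design question raised here (fail when no timezone id is given, rather than silently falling back to a default) can be sketched in a few lines. This is only an illustration in Python, not the Scala API under review; `to_local`, `time_zone_id` and the `KNOWN_ZONES` table are hypothetical names:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table standing in for a real timezone database.
KNOWN_ZONES = {
    "UTC": timezone.utc,
    "Asia/Tokyo": timezone(timedelta(hours=9)),
}


def to_local(ts_utc, time_zone_id):
    """Convert an aware UTC datetime to the given zone; the zone id is mandatory."""
    if time_zone_id is None:
        # No fallback to a process-wide default: the caller must choose a zone.
        raise ValueError("time_zone_id must be given explicitly")
    return ts_utc.astimezone(KNOWN_ZONES[time_zone_id])
```

Making the parameter mandatory moves the decision to the call site, which is the trade-off the comment asks about: a plain required value versus an optional one with an implicit default.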
[GitHub] spark issue #18933: [SPARK-21722][SQL][PYTHON] Enable timezone-aware timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18933 **[Test build #80608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80608/testReport)** for PR 18933 at commit [`0f182d0`](https://github.com/apache/spark/commit/0f182d0f6d8e1eb3b92e7e0eb39f2616235c1368).
[GitHub] spark pull request #18933: [SPARK-21722][SQL][PYTHON] Enable timezone-aware ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/18933

[SPARK-21722][SQL][PYTHON] Enable timezone-aware timestamp type when creating Pandas DataFrame.

## What changes were proposed in this pull request?

Make the Pandas DataFrame use a timezone-aware timestamp type when converting a `DataFrame` to a Pandas DataFrame with `pyspark.sql.DataFrame.toPandas`. The session local timezone is used as the timezone.

## How was this patch tested?

Added a test and existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-21722

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18933

commit 0f182d0f6d8e1eb3b92e7e0eb39f2616235c1368
Author: Takuya UESHIN
Date: 2017-08-14T02:08:03Z

    Enable timezone-aware timestamp type when creating Pandas DataFrame.
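To make "timezone-aware timestamp in the session local timezone" concrete, here is a rough standard-library sketch. It assumes a timestamp held as microseconds since the Unix epoch in UTC; the fixed UTC+9 offset, the sample value and the `micros_to_aware` helper are assumptions for illustration, not code from this pull request:

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical session timezone; a fixed UTC+9 offset stands in for a named zone.
SESSION_TZ = timezone(timedelta(hours=9))


def micros_to_aware(micros, tz=SESSION_TZ):
    """Turn microseconds since the Unix epoch into a datetime that is aware of tz."""
    return (EPOCH_UTC + timedelta(microseconds=micros)).astimezone(tz)


# Arbitrary sample instant, expressed in microseconds since the epoch.
ts = micros_to_aware(1502676483000000)
print(ts.isoformat())
```

The instant itself does not change; only the zone attached to it does, so converting the result back to UTC gives the original value.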
[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18904 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80601/ Test PASSed.
[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18904 Merged build finished. Test PASSed.
[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18904 **[Test build #80601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80601/testReport)** for PR 18904 at commit [`b349668`](https://github.com/apache/spark/commit/b34966871dbc5d13c697965e227b6136faed4c9a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Merged build finished. Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80599/ Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80599/testReport)** for PR 18920 at commit [`5239ebb`](https://github.com/apache/spark/commit/5239ebb5843315430d5c942dc53e09fb09d6c1c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80607/testReport)** for PR 18931 at commit [`4bef567`](https://github.com/apache/spark/commit/4bef5677b7338818bd9c44389fd183a8bd775610).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80606/testReport)** for PR 18899 at commit [`7ab264d`](https://github.com/apache/spark/commit/7ab264d848f1690c6af316cb9940687d81db360b).
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18875 @HyukjinKwon any more comments on this change? We can support the arbitrary Map type and rename this expression in a follow-up PR.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80605/testReport)** for PR 18931 at commit [`5fe3762`](https://github.com/apache/spark/commit/5fe3762a3dcf1893b3bfffb17832fdd5c3d1e364).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80602/ Test FAILed.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80602/testReport)** for PR 18899 at commit [`83ac893`](https://github.com/apache/spark/commit/83ac8933a14a01919f27797ad9b13376977f5a98). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed.
[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18648 ping @cloud-fan would you take another look?
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #80604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80604/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea).
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #80603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80603/testReport)** for PR 17980 at commit [`5db6acf`](https://github.com/apache/spark/commit/5db6acf2cde02217c283103ebcbfd2630338852c).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80602/testReport)** for PR 18899 at commit [`83ac893`](https://github.com/apache/spark/commit/83ac8933a14a01919f27797ad9b13376977f5a98).
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Retest this please.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17980 Retest this please.
[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18904 **[Test build #80601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80601/testReport)** for PR 18904 at commit [`b349668`](https://github.com/apache/spark/commit/b34966871dbc5d13c697965e227b6136faed4c9a).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80600/testReport)** for PR 18899 at commit [`d50de99`](https://github.com/apache/spark/commit/d50de9961f78c8d259b9167081c2d9529ce91a63). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80600/ Test FAILed.
[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18904 retest this please
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80600/testReport)** for PR 18899 at commit [`d50de99`](https://github.com/apache/spark/commit/d50de9961f78c8d259b9167081c2d9529ce91a63).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Thanks @sethah @srowen. The comment is added.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/9518 Sorry, I don't have permission to merge this. Ping @cloud-fan @JoshRosen to review again.
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r132860287

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
       // the fields to query are the remaining children
       @transient private lazy val fieldExpressions: Seq[Expression] = children.tail
    +  // a field name given with constant null will be replaced with this pseudo field name
    +  private val nullFieldName = "__NullFieldName"
    --- End diff --

Yeah, I've also considered using Option here. But I didn't want to come out with the Option version first, so that we could go through the review process. It looks good to me.
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132860195

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -32,6 +32,12 @@ class JoinSuite extends QueryTest with SharedSQLContext {
       setupTestData()
    +  override def afterEach(): Unit = {
    --- End diff --

Yes, this is the default behavior. But a comment there can explain that the cache is cleared:

```
// Clear the cache table for test cases
// by super.afterEach()
```
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 OK, I will solve the remaining problems first and hold this PR, @gatorsmile.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80599/testReport)** for PR 18920 at commit [`5239ebb`](https://github.com/apache/spark/commit/5239ebb5843315430d5c942dc53e09fb09d6c1c8).
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18920 Jenkins, retest this please.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18920 updated
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80598/ Test PASSed.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18909 Merged build finished. Test PASSed.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18909 **[Test build #80598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80598/testReport)** for PR 18909 at commit [`4d04a41`](https://github.com/apache/spark/commit/4d04a4120ffa23cd9424dc4aa3301314b51a5d3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132856266

--- Diff: python/pyspark/sql/column.py ---
@@ -406,7 +406,13 @@ def substr(self, startPos, length):
         [Row(col=u'Ali'), Row(col=u'Bob')]
         """
         if type(startPos) != type(length):
-            raise TypeError("Can not mix the type")
+            raise TypeError(
+                "startPos and length must be the same type. "
+                "Got {startPos_t} and {length_t}, respectively."
--- End diff --

For the latter, it looks like we should call `substr` either with (column, column) or with (int, int). If I understood correctly, neither way reduces the code diff and both are virtually the same, so I would like to avoid changing this.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652

Suppose we join two tables and the equi-join keys are non-deterministic, for example `t1.a = rand(t2.b)` and `t1.c = rand(t2.d)`. We pull them out into a downstream Project:

    Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
      Project [t1.a, t1.c]
        TableScan t1
      Project [rand(t2.b) as rand(t2.b), rand(t2.d) as rand(t2.d)]
        TableScan t2

`rand(t2.b)` and `rand(t2.d)` are evaluated in the projection. Why would the Join change their order?
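To illustrate the claim in the comment above, here is a minimal plain-Python sketch (not Spark code). It assumes a hypothetical non-deterministic key, `rng.randint(0, 2)`, standing in for `rand(...)`, and two toy join strategies. The function names `project_right`, `nested_loop_join` and `hash_join` are invented for this illustration. Once the key has been materialized by the projection, both strategies only compare stored values, so they see the same keys.

```python
import random


def project_right(t2_rows, seed=42):
    """Project step: evaluate the non-deterministic key exactly once per t2 row."""
    rng = random.Random(seed)
    return [{"d": row["d"], "key": rng.randint(0, 2)} for row in t2_rows]


def nested_loop_join(left, right):
    """Compare every left row with every right row on the materialized key."""
    return [(l["a"], r["d"]) for l in left for r in right if l["a"] == r["key"]]


def hash_join(left, right):
    """Build a hash index on the materialized key, then probe it with left rows."""
    index = {}
    for r in right:
        index.setdefault(r["key"], []).append(r)
    return [(l["a"], r["d"]) for l in left for r in index.get(l["a"], [])]


t1 = [{"a": 0}, {"a": 1}, {"a": 2}]
t2 = [{"d": d} for d in range(6)]

# The random keys are fixed here, before either join strategy runs.
right = project_right(t2)
same = sorted(nested_loop_join(t1, right)) == sorted(hash_join(t1, right))
```

This only shows that the join operates on already-computed columns. It does not address whether the projection itself evaluates rows in the same order under every physical plan, which is the question raised in the thread.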
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132856119

--- Diff: python/pyspark/sql/column.py ---
@@ -406,7 +406,13 @@ def substr(self, startPos, length):
         [Row(col=u'Ali'), Row(col=u'Bob')]
         """
         if type(startPos) != type(length):
-            raise TypeError("Can not mix the type")
+            raise TypeError(
+                "startPos and length must be the same type. "
+                "Got {startPos_t} and {length_t}, respectively."
--- End diff --

In general it needs to check the types, and we need to hide error messages that expose Java types. It is also true that we should move such logic into the Scala side to deduplicate it where it is duplicated. R also has a similar problem in some places. I don't think we should change this case anyway. It looks like we should ...

```
py4j.Py4JException: Method substr([class java.lang.Long ...
```

or we should introduce bridge methods on the Scala side and implement this checking logic there, IIRC.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi Thanks for finding the non-convergent case! Let me see how to fix it.
[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18877 If there's no more feedback, I plan to push this soon to unblock other work on this module.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18909 @gatorsmile Sure, this PR is only about tests. I was just wondering what is planned regarding cross joins with inequality conditions. I borrowed several tests from PR #16762 and added additional ones. As I mentioned, there is a small overlap between the existing tests and the proposed ones, but they are defined at different levels.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18909 **[Test build #80598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80598/testReport)** for PR 18909 at commit [`4d04a41`](https://github.com/apache/spark/commit/4d04a4120ffa23cd9424dc4aa3301314b51a5d3d).
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/12646 Retest please. The tests ran successfully locally. Thanks.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 I did not get your point. Could you give an example showing why the non-deterministic expressions are always evaluated in the same order, no matter which join type is chosen during physical planning?
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132847782

--- Diff: python/pyspark/sql/column.py ---
@@ -406,7 +406,13 @@ def substr(self, startPos, length):
         [Row(col=u'Ali'), Row(col=u'Bob')]
         """
         if type(startPos) != type(length):
-            raise TypeError("Can not mix the type")
+            raise TypeError(
+                "startPos and length must be the same type. "
+                "Got {startPos_t} and {length_t}, respectively."
--- End diff --

If PySpark always needs to check the types, are we doing the same thing in all the other function calls? In addition, why not check directly:

```Python
if isinstance(length, (int, long)):
```
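For readers following this thread, a minimal standalone sketch of the two checks under discussion: the same-type check from the diff and the direct `isinstance` check suggested above. The helper `check_substr_args` and the stand-in class `FakeColumn` are hypothetical and are not part of PySpark. The `.format(...)` call is an assumption, since the quoted diff stops before it. The sketch targets Python 3, where `long` no longer exists, so it checks `int` only.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column, so the sketch runs without Spark."""


def check_substr_args(start_pos, length):
    """Validate substr-style arguments and report which call form applies."""
    # Same-type check, with the clearer message proposed in the diff.
    if type(start_pos) != type(length):
        raise TypeError(
            "startPos and length must be the same type. "
            "Got {startPos_t} and {length_t}, respectively.".format(
                startPos_t=type(start_pos), length_t=type(length)))
    # Direct isinstance checks, as suggested in the review comment.
    if isinstance(start_pos, int):
        return "int"
    if isinstance(start_pos, FakeColumn):
        return "column"
    raise TypeError("Unexpected type: {}".format(type(start_pos)))


int_form = check_substr_args(1, 3)
column_form = check_substr_args(FakeColumn(), FakeColumn())
```

Raising in Python before the call crosses into the JVM keeps messages such as the `py4j.Py4JException` mentioned elsewhere in this thread away from the user, which appears to be the motivation for checking on the Python side.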
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80597/ Test FAILed.
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18866 Merged build finished. Test FAILed.
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18866

**[Test build #80597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80597/testReport)** for PR 18866 at commit [`19f880b`](https://github.com/apache/spark/commit/19f880bcd1e519ac28e23df4a0bb6c796348ae30).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ClusteredDistribution(clustering: Seq[Expression], clustersOpt: Option[Int] = None,`
   * `case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int,`
[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18920#discussion_r132847406

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -449,6 +449,28 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
 
+  private def assertNoExceptions(c: Column): Unit = {
+    for ((wholeStage, useObjectHashAgg) <- Seq((true, false), (false, false), (false, true))) {
+      withSQLConf(
+        (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString),
+        (SQLConf.USE_OBJECT_HASH_AGG.key, useObjectHashAgg.toString)) {
+        val df = Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4)).toDF("x", "y")
+        // HashAggregate
--- End diff --

We need to check/compare the plans to ensure they are HashAggregate, ObjectHashAggregate and SortAggregate.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Merged build finished. Test PASSed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80596/ Test PASSed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875

**[Test build #80596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80596/testReport)** for PR 18875 at commit [`c64d9c4`](https://github.com/apache/spark/commit/c64d9c49c5f42b7a72b1570f00daa34a2843be1d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18317#discussion_r132846768

--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java ---
@@ -0,0 +1,288 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead from the underlying input
+ * stream when specified amount of data has been read from the current buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private ReentrantLock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  /**
+   * Creates a ReadAheadInputStream with the specified buffer size and read-ahead
+   * threshold
+   *
+   * @param inputStream The underlying input stream.
+   * @param bufferSizeInBytes The buffer size.
+   * @param readAheadThresholdInBytes If the active buffer has less data than the read-ahead
+   * threshold, an async read is triggered.
+   */
+  public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) {
+    Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes should be greater than 0");
+    Preconditions.checkArgument(readAheadThresholdInBytes > 0 && readAheadThresholdInBytes < bufferSizeInBytes,
+    "readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes" );
+    activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+    this.underlyingInputStream = inputStream;
+    activeBuffer.flip();
+    readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+    if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 && endOfStream) {
+      return true;
+    }
+    return false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+    stateChangeLock.lock();
+    if (endOfStream || isReadInProgress) {
+      stateChangeLock.unlock();
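The quoted diff is cut off before the read path, so the following is only a rough Python sketch of the double-buffer idea described in the class Javadoc above, not a port of the Java class. The class `ReadAheadReader` and its parameter names are hypothetical. It keeps one active buffer, lets a single background worker fill the next buffer once the active one drops below the threshold, and swaps the two when the active buffer is exhausted.

```python
import io
from concurrent.futures import ThreadPoolExecutor


class ReadAheadReader:
    """Hypothetical double-buffered reader; illustrates the idea only."""

    def __init__(self, stream, buffer_size, threshold):
        if buffer_size <= 0:
            raise ValueError("buffer_size should be greater than 0")
        if not 0 < threshold < buffer_size:
            raise ValueError(
                "threshold should be greater than 0 and less than buffer_size")
        self._stream = stream
        self._buffer_size = buffer_size
        self._threshold = threshold
        self._active = b""    # data handed out by read()
        self._pos = 0         # read position inside the active buffer
        self._pending = None  # Future holding the read-ahead buffer, if any
        self._eof = False
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _start_read_ahead(self):
        # Issue at most one background read at a time.
        if self._pending is None and not self._eof:
            self._pending = self._pool.submit(
                self._stream.read, self._buffer_size)

    def _swap(self):
        # Wait for the background read and make its result the active buffer.
        self._start_read_ahead()
        data = b""
        if self._pending is not None:
            data = self._pending.result()
            self._pending = None
        if not data:
            self._eof = True
        self._active, self._pos = data, 0

    def read(self, n):
        """Return up to n bytes; may return fewer, and b"" at end of stream."""
        if self._pos >= len(self._active):
            self._swap()
        chunk = self._active[self._pos:self._pos + n]
        self._pos += len(chunk)
        # Trigger the next background read once the active buffer runs low.
        if len(self._active) - self._pos < self._threshold:
            self._start_read_ahead()
        return chunk

    def close(self):
        self._pool.shutdown(wait=True)
        self._stream.close()


reader = ReadAheadReader(io.BytesIO(b"abcdefghij"), buffer_size=4, threshold=2)
collected = b""
while True:
    piece = reader.read(3)
    if not piece:
        break
    collected += piece
reader.close()
```

Unlike the Java class, this sketch relies on a `Future` instead of an explicit lock and condition variable, and it assumes `read` is called from a single thread.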
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132846725

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -32,6 +32,12 @@ class JoinSuite extends QueryTest with SharedSQLContext {
 
   setupTestData()
 
+  override def afterEach(): Unit = {
--- End diff --

Do we need this override? I think this is the default behavior.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Merged build finished. Test PASSed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80593/ Test PASSed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914

**[Test build #80593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80593/testReport)** for PR 18914 at commit [`7dd8cdb`](https://github.com/apache/spark/commit/7dd8cdbf7caac507cbd703193350503309b5f159).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r132846456

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
   // the fields to query are the remaining children
   @transient private lazy val fieldExpressions: Seq[Expression] = children.tail
 
+  // a field name given with constant null will be replaced with this pseudo field name
+  private val nullFieldName = "__NullFieldName"
--- End diff --

@jmchung, could we compute this foldable-related optimization ahead of time (https://github.com/jmchung/spark/blob/ffa575a6731fef3e0731b73e0f7311cb024e831b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L425-L439) and remove this fake field name? I think we can first turn the code above into a function and then use it in the per-row computation. Did I understand correctly? I tried a rough version of what I had in mind: https://github.com/jmchung/spark/compare/SPARK-21677...HyukjinKwon:tmp-18930?expand=1. @viirya, what do you think about this?
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80592/ Test PASSed.