[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 Thx for taking a deeper look @HyukjinKwon, much appreciated ! I will wait for @jiangxb1987 to also opine before committing - I want to make sure we are not adding incorrect behavior; given that this is a followup to an earlier PR (some excellent work by @szhem btw) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19501: [SPARK-22223][SQL] ObjectHashAggregate should not introd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19501 **[Test build #82768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82768/testReport)** for PR 19501 at commit [`c845627`](https://github.com/apache/spark/commit/c84562763034e3fc6a7ddba785131cb4a1c36eb4).
[GitHub] spark pull request #19501: [SPARK-22223][SQL] ObjectHashAggregate should not...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19501

[SPARK-22223][SQL] ObjectHashAggregate should not introduce unnecessary shuffle

## What changes were proposed in this pull request?

`ObjectHashAggregateExec` should override `outputPartitioning` in order to avoid unnecessary shuffle.

## How was this patch tested?

Added Jenkins test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-3

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19501.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19501

commit c84562763034e3fc6a7ddba785131cb4a1c36eb4
Author: Liang-Chi Hsieh
Date: 2017-10-15T06:02:59Z

ObjectHashAggregate should not introduce unnecessary shuffle.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497

@mridulm, I just checked through the related changes and confirmed that the tests pass on branch-2.1. It seems this PR will also allow the cases below:

```scala
.saveAsNewAPIHadoopFile[...]("")
.saveAsNewAPIHadoopFile[...]("::invalid:::")
```

Currently both of these fail, but it seems this PR allows them:

```
Can not create a Path from an empty string
java.lang.IllegalArgumentException: Can not create a Path from an empty string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.hadoop.fs.Path.<init>(Path.java:89)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:61)
  ...
```

```
java.net.URISyntaxException: Relative path in absolute URI: ::invalid:::
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ::invalid:::
  at org.apache.hadoop.fs.Path.initialize(Path.java:206)
  at org.apache.hadoop.fs.Path.<init>(Path.java:172)
  at org.apache.hadoop.fs.Path.<init>(Path.java:89)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:61)
  ...
```

I think we should keep rejecting these cases, as is done now.

For the same cases with the old API:

```scala
.saveAsHadoopFile[...]("")
.saveAsHadoopFile[...]("::invalid:::")
```

these appear to fail fast (whether or not that was originally intended), and I guess this PR does not affect them:

```
Can not create a Path from an empty string
java.lang.IllegalArgumentException: Can not create a Path from an empty string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.spark.internal.io.SparkHadoopWriterUtils$.createPathFromString(SparkHadoopWriterUtils.scala:54)
```

```
java.net.URISyntaxException: Relative path in absolute URI: ::invalid:::
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ::invalid:::
  at org.apache.hadoop.fs.Path.initialize(Path.java:206)
  at org.apache.hadoop.fs.Path.<init>(Path.java:172)
  at org.apache.spark.internal.io.SparkHadoopWriterUtils$.createPathFromString(SparkHadoopWriterUtils.scala:54)
```
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19500 Merged build finished. Test PASSed.
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82767/ Test PASSed.
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19500 **[Test build #82767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82766/ Test PASSed.
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit [`0bb5afe`](https://github.com/apache/spark/commit/0bb5afe54a9a53054d2076ac28b09234a7380bbf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19471 We may need to document this change in the `Migration Guide` section of the SQL programming guide.
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19500 **[Test build #82767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518).
[GitHub] spark pull request #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.co...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19499#discussion_r144708907

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }

       test("test statistics of LogicalRelation converted from Hive serde tables") {
    --- End diff --

This should be handled in a separate PR, #19500. Once #19500 is merged, I will remove this test-code change from this PR.
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19500

[SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly

## What changes were proposed in this pull request?

This PR aims to improve **StatisticsSuite** to test the `convertMetastore` configuration properly. Currently, some test logic in `test statistics of LogicalRelation converted from Hive serde tables` depends on the default configuration. The new test case is shorter and covers both (true/false) cases explicitly.

## How was this patch tested?

Pass the Jenkins with the improved test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-22280

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19500.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19500

commit 2a0a3f1b3f029c2454a471b33fed7766694fa518
Author: Dongjoon Hyun
Date: 2017-10-15T03:38:22Z

[SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82765/ Test PASSed.
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19499 Merged build finished. Test PASSed.
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19499 **[Test build #82765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82765/testReport)** for PR 19499 at commit [`83cde8b`](https://github.com/apache/spark/commit/83cde8b2fcf1fb12567cd0bf7eef702186234a23). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit [`0bb5afe`](https://github.com/apache/spark/commit/0bb5afe54a9a53054d2076ac28b09234a7380bbf).
[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19452#discussion_r144708446

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala ---
    @@ -349,14 +350,35 @@ case class StreamingSymmetricHashJoinExec(
       /**
        * Internal helper class to consume input rows, generate join output rows using other sides
        * buffered state rows, and finally clean up this sides buffered state rows
    +   *
    +   * @param joinSide The JoinSide - either left or right.
    +   * @param inputAttributes The input attributes for this side of the join.
    +   * @param joinKeys The join keys.
    +   * @param inputIter The iterator of input rows on this side to be joined.
    +   * @param preJoinFilterExpr A filter over rows on this side. This filter rejects rows that could
    +   *                          never pass the overall join condition no matter what other side row
    +   *                          they're joined with.
    +   * @param postJoinFilterExpr A filter over joined rows. This filter completes the application of
    +   *                           the overall join condition, assuming that preJoinFilter on both sides
    +   *                           of the join has already been passed.
    +   * @param stateWatermarkPredicate The state watermark predicate. See
    +   *                                [[StreamingSymmetricHashJoinExec]] for further description of
    +   *                                state watermarks.
        */
       private class OneSideHashJoiner(
           joinSide: JoinSide,
           inputAttributes: Seq[Attribute],
           joinKeys: Seq[Expression],
           inputIter: Iterator[InternalRow],
    +      preJoinFilterExpr: Option[Expression],
    +      postJoinFilterExpr: Option[Expression],
           stateWatermarkPredicate: Option[JoinStateWatermarkPredicate]) {

    +    // Filter the joined rows based on the given condition.
    +    val preJoinFilter =
    +      newPredicate(preJoinFilterExpr.getOrElse(Literal(true)), inputAttributes).eval _
    +    val postJoinFilter = newPredicate(postJoinFilterExpr.getOrElse(Literal(true)), output).eval _
    --- End diff --

This is incorrect. The schema of the rows on which this filter will be applied is `left.output ++ right.output`. You need to apply another projection to put the JoinedRow in an UnsafeRow of the schema `output`.
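Editorial note: one way to read the comment above is that a predicate has to be bound against the same schema as the rows it later evaluates. The following is only a toy sketch of that idea in plain Python, not Spark code. Rows are tuples, "binding" resolves a column name to a position, and every name in it (`bind`, `make_filter`, the column names, `other_schema`) is hypothetical.

```python
# Toy illustration only: rows are tuples and "binding" resolves a column
# name to its position in a schema. All names here are hypothetical.

def bind(column, schema):
    """Return the ordinal of `column` in `schema` (a list of column names)."""
    return schema.index(column)


def make_filter(schema, max_lag):
    """Build the predicate `right_ts - left_ts <= max_lag`, bound to `schema`."""
    left_idx = bind("left_ts", schema)
    right_idx = bind("right_ts", schema)
    return lambda row: row[right_idx] - row[left_idx] <= max_lag


left_output = ["left_id", "left_ts"]
right_output = ["right_id", "right_ts"]
joined_schema = left_output + right_output  # layout of the joined row
other_schema = ["left_ts", "right_ts"]      # a different, hypothetical layout

joined_row = (1, 100, 1, 105)  # (left_id, left_ts, right_id, right_ts)

bound_correctly = make_filter(joined_schema, max_lag=10)
bound_wrongly = make_filter(other_schema, max_lag=10)

print(bound_correctly(joined_row))  # reads positions 1 and 3: the timestamps
print(bound_wrongly(joined_row))    # reads positions 0 and 1: the wrong columns
```

The predicate bound to the wrong layout still runs without error; it just reads the wrong positions, which is why this kind of mismatch can go unnoticed.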
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19499 **[Test build #82765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82765/testReport)** for PR 19499 at commit [`83cde8b`](https://github.com/apache/spark/commit/83cde8b2fcf1fb12567cd0bf7eef702186234a23).
[GitHub] spark pull request #19467: [SPARK-22238] Fix plan resolution bug caused by E...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19467
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19459 LGTM with a few minor comments.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Merged build finished. Test PASSed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82764/ Test PASSed.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706910

    --- Diff: python/pyspark/sql/session.py ---
    @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
             data = [schema.toInternal(row) for row in data]
             return self._sc.parallelize(data), schema

    +    def _createFromPandasWithArrow(self, df, schema):
    --- End diff --

nit: df -> pdf.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #82764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82764/testReport)** for PR 19459 at commit [`f42e351`](https://github.com/apache/spark/commit/f42e35175969d8d7363e008a586a6f6982290447). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706853

    --- Diff: python/pyspark/sql/session.py ---
    @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
             data = [schema.toInternal(row) for row in data]
             return self._sc.parallelize(data), schema

    +    def _createFromPandasWithArrow(self, df, schema):
    +        """
    +        Create a DataFrame from a given pandas.DataFrame by slicing the into partitions, converting
    +        to Arrow data, then reading into the JVM to parallelsize. If a schema is passed in, the
    +        data types will be used to coerce the data in Pandas to Arrow conversion.
    +        """
    +        import os
    +        from tempfile import NamedTemporaryFile
    +        from pyspark.serializers import ArrowSerializer
    +        from pyspark.sql.types import from_arrow_schema, to_arrow_schema
    +        import pyarrow as pa
    +
    +        # Slice the DataFrame into batches
    +        step = -(-len(df) // self.sparkContext.defaultParallelism)  # round int up
    +        df_slices = (df[start:start + step] for start in xrange(0, len(df), step))
    +        arrow_schema = to_arrow_schema(schema) if schema is not None else None
    +        batches = [pa.RecordBatch.from_pandas(df_slice, schema=arrow_schema, preserve_index=False)
    +                   for df_slice in df_slices]
    +
    +        # write batches to temp file, read by JVM (borrowed from context.parallelize)
    +        tempFile = NamedTemporaryFile(delete=False, dir=self._sc._temp_dir)
    --- End diff --

This largely duplicates the main logic of `context.parallelize`. Maybe we can extract a common function from it.
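Editorial note: the slicing step in the diff above relies on `-(-a // b)` as integer ceiling division. A minimal standalone sketch of that technique follows, using a plain list in place of a pandas DataFrame. The function name `slice_into_batches` and the guards for empty input and non-positive partition counts are assumptions added for illustration and are not part of the PR; without such a guard an empty input would yield a step of 0, which `range` rejects.

```python
def slice_into_batches(rows, num_partitions):
    """Split `rows` into at most `num_partitions` contiguous slices.

    Hypothetical helper, not the PR's code: it mirrors the
    `step = -(-len(df) // parallelism)` idiom on a plain list.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if len(rows) == 0:
        # A step of 0 would make range() raise ValueError, so return early.
        return []
    step = -(-len(rows) // num_partitions)  # ceiling division on integers
    return [rows[start:start + step] for start in range(0, len(rows), step)]


batches = slice_into_batches(list(range(10)), 4)
print(batches)
```

With 10 rows and 4 partitions the step is 3, so the last slice is shorter than the others; the number of slices never exceeds the requested partition count.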
[GitHub] spark issue #19467: [SPARK-22238] Fix plan resolution bug caused by EnsureSt...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/19467 Merging to master
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144706672

    --- Diff: python/pyspark/sql/session.py ---
    @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
             data = [schema.toInternal(row) for row in data]
             return self._sc.parallelize(data), schema

    +    def _createFromPandasWithArrow(self, df, schema):
    +        """
    +        Create a DataFrame from a given pandas.DataFrame by slicing the into partitions, converting
    --- End diff --

typo: slicing the.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19459 Thanks for the reviews @ueshin and @HyukjinKwon! I added `to_arrow_schema` conversion for when a schema is passed into `createDataFrame` and added some new tests to verify it. Please take another look when you can, thanks!
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #82764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82764/testReport)** for PR 19459 at commit [`f42e351`](https://github.com/apache/spark/commit/f42e35175969d8d7363e008a586a6f6982290447).
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19499 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82763/ Test FAILed.
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19499 Merged build finished. Test FAILed.
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19499 **[Test build #82763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82763/testReport)** for PR 19499 at commit [`b9c4954`](https://github.com/apache/spark/commit/b9c495490ca5b3ce07b413f9c4cc7b2f2e1d713b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19496 Merged build finished. Test PASSed.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19496 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82762/ Test PASSed.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19496 **[Test build #82762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19499 **[Test build #82763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82763/testReport)** for PR 19499 at commit [`b9c4954`](https://github.com/apache/spark/commit/b9c495490ca5b3ce07b413f9c4cc7b2f2e1d713b).
[GitHub] spark pull request #19499: [SPARK-22279][SQL][WIP] Turn on spark.sql.hive.co...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19499

[SPARK-22279][SQL][WIP] Turn on spark.sql.hive.convertMetastoreOrc by default

## What changes were proposed in this pull request?

Like Parquet, this PR aims to turn on `spark.sql.hive.convertMetastoreOrc` by default.

## How was this patch tested?

Pass all the existing test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-22279

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19499.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19499

commit b9c495490ca5b3ce07b413f9c4cc7b2f2e1d713b
Author: Dongjoon Hyun
Date: 2017-10-14T18:49:27Z

[SPARK-22279][SQL] Turn on spark.sql.hive.convertMetastoreOrc by default
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19496 **[Test build #82762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01).
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19485

OK, so I will:

- Create a new section for csv-datasets.
- Add more example options to the code from JavaSQLDataSourceExample.java (and the .scala, .py and .r versions).
- Reference the links to the API.

The effect is that not all the options will be visible on the .md page, and people will need to jump into the API docs. Do you agree with this? It would be cool if we could use jekyllrb to create something like an iframe and pull the options from the Scala API. Any ideas? Please let me know if it is OK to proceed this way.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485

Thanks for taking a look at this one. Actually, I thought we should add a chapter like http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets and add links to the API documentation for the options, for example:

- https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv for Python,
- http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame for Scala, and
- http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq- for Java,

rather than duplicating the option list (which we would then have to update in two places whenever we fix or add options). We should probably add similar links to the JSON section too.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82761/ Test PASSed.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #82761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82761/testReport)** for PR 19498 at commit [`f5a2a88`](https://github.com/apache/spark/commit/f5a2a884d860e9c8b3f98fc4ae5f10eaf3c1a0a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #82761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82761/testReport)** for PR 19498 at commit [`f5a2a88`](https://github.com/apache/spark/commit/f5a2a884d860e9c8b3f98fc4ae5f10eaf3c1a0a4).
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19498 Hi @zsxwing, I happened to look into this one. Could you take a look and see if it makes sense please? cc @zero323 (reporter) too.
[GitHub] spark pull request #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to av...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/19498

[SPARK-17756][PYTHON][STREAMING] Workaround to avoid return type mismatch in PythonTransformFunction

## What changes were proposed in this pull request?

This PR proposes to wrap the transformed RDD within `TransformFunction`. `PythonTransformFunction` appears to require that `_jrdd` be a `JavaRDD`:

https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/python/pyspark/streaming/util.py#L67

https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala#L43

However, some APIs can produce a `JavaPairRDD` instead, for example `zip` in PySpark's RDD API. `_jrdd` can be checked as below:

```python
>>> rdd.zip(rdd)._jrdd.getClass().toString()
u'class org.apache.spark.api.java.JavaPairRDD'
```

So, here, I wrapped it with `map` to ensure that a `JavaRDD` is returned.

```python
>>> rdd.zip(rdd).map(lambda x: x)._jrdd.getClass().toString()
u'class org.apache.spark.api.java.JavaRDD'
```

I tried to elaborate some failure cases as below:

```python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(spark.sparkContext, 10)
ssc.queueStream([sc.range(10)]) \
    .transform(lambda rdd: rdd.cartesian(rdd)) \
    .pprint()
ssc.start()
```

```python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(spark.sparkContext, 10)
ssc.queueStream([sc.range(10)]).foreachRDD(lambda rdd: rdd.cartesian(rdd))
ssc.start()
```

```python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(spark.sparkContext, 10)
ssc.queueStream([sc.range(10)]).foreachRDD(lambda rdd: rdd.zip(rdd))
ssc.start()
```

```python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(spark.sparkContext, 10)
ssc.queueStream([sc.range(10)]).foreachRDD(lambda rdd: rdd.zip(rdd).union(rdd.zip(rdd)))
ssc.start()
```

```python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(spark.sparkContext, 10)
ssc.queueStream([sc.range(10)]).foreachRDD(lambda rdd: rdd.zip(rdd).coalesce(1))
ssc.start()
```

## How was this patch tested?

Unit tests were added in `python/pyspark/streaming/tests.py`, and the change was also tested manually.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-17756

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19498.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19498

commit f5a2a884d860e9c8b3f98fc4ae5f10eaf3c1a0a4 Author: hyukjinkwon Date: 2017-10-14T13:50:49Z Workaround to avoid return type mispatch in PythonTransformFunction

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
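The workaround described in this pull request can be illustrated without a running Spark cluster. The sketch below is only a toy model: `StubRDD`, `_StubJavaObject`, and `ensure_java_rdd` are hypothetical names invented here, not PySpark classes, and the conditional check is an assumption (the PR text only says the result is wrapped with `map`). It shows the idea that an identity `map` turns a pair-typed handle back into a plain `JavaRDD` handle while leaving the data unchanged.

```python
JAVA_RDD = "org.apache.spark.api.java.JavaRDD"
JAVA_PAIR_RDD = "org.apache.spark.api.java.JavaPairRDD"


class _StubJavaObject:
    """Hypothetical stand-in for the Py4J handle behind `_jrdd`."""

    def __init__(self, class_name):
        self._class_name = class_name

    def getClass(self):
        return self

    def toString(self):
        return "class " + self._class_name


class StubRDD:
    """Hypothetical stand-in for a PySpark RDD; only models the `_jrdd` type."""

    def __init__(self, items, java_class=JAVA_RDD):
        self.items = list(items)
        self._jrdd = _StubJavaObject(java_class)

    def zip(self, other):
        # Mirrors the observation in the PR: zip yields a JavaPairRDD handle.
        return StubRDD(zip(self.items, other.items), JAVA_PAIR_RDD)

    def map(self, func):
        # Mirrors the observation in the PR: map yields a JavaRDD handle.
        return StubRDD(map(func, self.items), JAVA_RDD)


def ensure_java_rdd(rdd):
    """Wrap with an identity map when the handle is not a plain JavaRDD."""
    class_name = rdd._jrdd.getClass().toString()
    if class_name != "class " + JAVA_RDD:
        return rdd.map(lambda x: x)
    return rdd
```

With real PySpark objects the same check would be made against `rdd._jrdd`, as in the interactive session quoted in the pull request.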
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19442 @MLnick Can you take a look and give me some suggestions? Thanks.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19442 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82760/ Test PASSed.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19442 **[Test build #82760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82760/testReport)** for PR 19442 at commit [`51440b4`](https://github.com/apache/spark/commit/51440b4eeefece4f899a21ef1a0a63399a9f95ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19442 Merged build finished. Test PASSed.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19442 **[Test build #82760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82760/testReport)** for PR 19442 at commit [`51440b4`](https://github.com/apache/spark/commit/51440b4eeefece4f899a21ef1a0a63399a9f95ac).
[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82758/ Test PASSed.
[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19480 Merged build finished. Test PASSed.
[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #82758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82758/testReport)** for PR 19480 at commit [`37506dc`](https://github.com/apache/spark/commit/37506dcc380cf5c14ea929b33f9e8e26efdbcb8d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19419: [SPARK-22188] [CORE] Adding security headers for prevent...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19419 **[Test build #3947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3947/testReport)** for PR 19419 at commit [`5c76b91`](https://github.com/apache/spark/commit/5c76b914ecbd7fd82276496151f7ed89fe519025). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19442 Merged build finished. Test PASSed.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19442 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82759/ Test PASSed.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19442 **[Test build #82759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82759/testReport)** for PR 19442 at commit [`2b94dd5`](https://github.com/apache/spark/commit/2b94dd5c192b1d9302e24c0392fc9a5aaaedb596). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82757/ Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82757/testReport)** for PR 19222 at commit [`6e8d5b8`](https://github.com/apache/spark/commit/6e8d5b820c83517d0340d748959b855229e664a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19442 **[Test build #82759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82759/testReport)** for PR 19442 at commit [`2b94dd5`](https://github.com/apache/spark/commit/2b94dd5c192b1d9302e24c0392fc9a5aaaedb596).
[GitHub] spark pull request #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/19442

[SPARK-8515][ML][WIP] Improve ML Attribute API

## What changes were proposed in this pull request?

The current ML attribute API has issues such as inefficiency and poor usability. This work tries to improve the API with these main changes:

* Support Spark vector-typed attributes.
* Simplify vector-typed attribute serialization.
* Keep a minimal set of APIs to support ML attributes.

**This work is not ready and is still in progress.**

## How was this patch tested?

Added tests.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-8515

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19442.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19442

commit 77d657d8bc8102081e4b0d7b5d42a256e64514d4 Author: Liang-Chi Hsieh Date: 2017-10-02T15:03:54Z Init design of ml attribute.
commit 7837778e7cbbf83851b1a2b5047f4e6a8039f809 Author: Liang-Chi Hsieh Date: 2017-10-03T15:03:31Z revise.
commit 97f6848f0cbb1a76b4434930ce8938da50eaafbe Author: Liang-Chi Hsieh Date: 2017-10-03T15:14:02Z revise.
commit 2e3a3541fc7a59ac63b2118228de8015c238de40 Author: Liang-Chi Hsieh Date: 2017-10-04T05:15:58Z revise.
commit 0d76eac84f5837aefebc763687fa9c5c7e1aeb4d Author: Liang-Chi Hsieh Date: 2017-10-04T15:07:57Z revise.
commit 81cca5cccfa2556ff0bba5a73764d3f503040b13 Author: Liang-Chi Hsieh Date: 2017-10-05T04:30:48Z revise.
commit 4813fe8a4bd19a02b7b6bff138f04e7e50f7cdd7 Author: Liang-Chi Hsieh Date: 2017-10-05T06:15:53Z revise.
commit 7951f59027418962ad95465e439bff41876ecfa8 Author: Liang-Chi Hsieh Date: 2017-10-05T07:51:50Z revise.
commit a381af3edf52132086af64360789cb3a7d20d61e Author: Liang-Chi Hsieh Date: 2017-10-05T09:00:02Z Add builder and test.
commit f25c89dbded0eb9dce25d8da63a1a1aa49ad459f Author: Liang-Chi Hsieh Date: 2017-10-05T15:10:11Z revise test.
commit 7e237f38088f2375f40f9a4c97aee2e6acd54328 Author: Liang-Chi Hsieh Date: 2017-10-06T02:46:07Z Add new test.
commit 77ced957e7be2169ac0c59c76f60ab9d4fcac3ef Author: Liang-Chi Hsieh Date: 2017-10-06T03:57:12Z Add more tests.
commit de0aa76199141255258d9d5b12a0d31b1758c6f1 Author: Liang-Chi Hsieh Date: 2017-10-06T06:17:29Z revise.
commit d828cf3d3b13a2b2b1990bdff9593b49e53f6cf9 Author: Liang-Chi Hsieh Date: 2017-10-06T13:55:41Z Add java-friendly APIs for attribute types.
commit 5844fbaef5d5825eafadb7c53196fb2132937e4e Author: Liang-Chi Hsieh Date: 2017-10-09T03:24:26Z Revise APIs.
commit da0fcef7d3370ebca97d200f01e9f2814a9ed755 Author: Liang-Chi Hsieh Date: 2017-10-09T03:26:15Z revise.
commit 66be26cd7f25614137cfb9722f859f36d9f80c0c Author: Liang-Chi Hsieh Date: 2017-10-09T03:47:43Z Add default constructors to attribute types.
commit ce80ed5b693745fa4a650e508c6cd9e24350c52e Author: Liang-Chi Hsieh Date: 2017-10-10T12:52:22Z Use Array instead of Seq in APIs.
commit 2b94dd5c192b1d9302e24c0392fc9a5aaaedb596 Author: Liang-Chi Hsieh Date: 2017-10-14T00:21:04Z Add more compatibility tests.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82756/ Test PASSed.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Merged build finished. Test PASSed.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82756/testReport)** for PR 19472 at commit [`150e0a3`](https://github.com/apache/spark/commit/150e0a30ac4ed11c783d62d47c4404c854b03dd9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Merged build finished. Test PASSed.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82754/ Test PASSed.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)** for PR 19497 at commit [`a319df3`](https://github.com/apache/spark/commit/a319df36db5bd202a14b44a09e9d1887f1633aec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19496 Merged build finished. Test PASSed.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19496 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82755/ Test PASSed.
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19496 **[Test build #82755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19494: [SPARK-22249][SQL] isin with empty list throws ex...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19494#discussion_r144690815 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -104,7 +104,8 @@ case class InMemoryTableScanExec( case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) => list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] && -l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _) +l.asInstanceOf[Literal] <= statsFor(a).upperBound) --- End diff -- It was a mistake, sorry. It always returned `false`. I see what you mean, but in this piece of code we are only building the `Expression`, not evaluating it. Thus it is not possible to short-circuit here, because the `Expression` must be built in its entirety.
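The distinction drawn in this comment, between building an expression and evaluating it, can be sketched in a few lines. The example below is a toy model, not Catalyst code: `Lit`, `InRange`, `Or`, `build_in_filter`, and `evaluate` are hypothetical names, and seeding the fold with a `False` literal is just one possible way to handle an empty list, not necessarily what the PR settled on. Building must visit every literal, while any short-circuiting only happens later, when the built tree is evaluated against the column statistics.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Lit:
    """A constant boolean expression (hypothetical)."""
    value: bool


@dataclass(frozen=True)
class InRange:
    """Unevaluated check: lower bound <= literal <= upper bound."""
    literal: int


@dataclass(frozen=True)
class Or:
    """Unevaluated disjunction of two expressions."""
    left: object
    right: object


def build_in_filter(literals):
    """Build the whole OR-tree; an empty list folds to Lit(False).

    A plain reduce without an initial value would raise on an empty list,
    which is the failure mode discussed in the review.
    """
    checks = [InRange(lit) for lit in literals]
    return reduce(Or, checks, Lit(False))


def evaluate(expr, lower, upper):
    """Evaluate a built tree against partition statistics."""
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, InRange):
        return lower <= expr.literal <= upper
    if isinstance(expr, Or):
        # Python's `or` short-circuits here, at evaluation time only.
        return evaluate(expr.left, lower, upper) or evaluate(expr.right, lower, upper)
    raise TypeError(f"unknown expression: {expr!r}")
```

For example, `build_in_filter([])` yields `Lit(False)`, so a partition is never selected for an empty `isin` list, while `build_in_filter([5, 20])` evaluated against bounds 0 and 10 selects the partition because 5 falls inside the range.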
[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #82758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82758/testReport)** for PR 19480 at commit [`37506dc`](https://github.com/apache/spark/commit/37506dcc380cf5c14ea929b33f9e8e26efdbcb8d).
[GitHub] spark issue #19419: [SPARK-22188] [CORE] Adding security headers for prevent...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19419 **[Test build #3947 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3947/testReport)** for PR 19419 at commit [`5c76b91`](https://github.com/apache/spark/commit/5c76b914ecbd7fd82276496151f7ed89fe519025).
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144689872 --- Diff: docs/configuration.md --- @@ -2013,7 +2013,6 @@ Apart from these, the following properties are also available, and may be useful - --- End diff -- If you have to change the pull request again, I'd revert this, but no need to change it only for this
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144689866 --- Diff: docs/security.md --- @@ -186,7 +186,52 @@ configure those ports. +### HTTP Security Headers + +Apache Spark can be configured to include HTTP Headers which aids in preventing Cross +Site Scripting (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing and also enforces HTTP +Strict Transport Security. + + +Property Name Default Meaning + +spark.ui.xXssProtection +None + +Value for HTTP X-XSS-Protection response header. You can choose appropriate value +from below: + --- End diff -- Why not just leave this as a bulleted list? Not a big deal I guess, just less conventional for HTML
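The documentation under review maps a configuration property to an HTTP response header. A minimal sketch of that mapping follows; it assumes only what the quoted diff states (the property `spark.ui.xXssProtection`, default `None`, supplies the value of the `X-XSS-Protection` header). The function name `security_headers` and the dict-based configuration are hypothetical, and how Spark actually wires this into its web UI is not shown in the thread.

```python
def security_headers(conf):
    """Return the response headers implied by a configuration mapping.

    `conf` is a plain dict standing in for Spark configuration (assumption).
    When the property is unset, no header is emitted, matching a default
    of None in the quoted documentation table.
    """
    headers = {}
    xss_value = conf.get("spark.ui.xXssProtection")
    if xss_value is not None:
        headers["X-XSS-Protection"] = xss_value
    return headers
```

For instance, a configuration containing `{"spark.ui.xXssProtection": "1; mode=block"}` would produce a single `X-XSS-Protection: 1; mode=block` header, and an empty configuration would produce none.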
[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19464
[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19464 Merged to master.
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19496#discussion_r144689475 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala --- @@ -80,7 +80,8 @@ case class Average(child: Expression) extends DeclarativeAggregate with Implicit case DecimalType.Fixed(p, s) => // increase the precision and scale to prevent precision loss val dt = DecimalType.bounded(p + 14, s + 4) - Cast(Cast(sum, dt) / Cast(count, dt), resultType) + Cast(Cast(sum, dt) / Cast(count, DecimalType.bounded (DecimalType.MAX_PRECISION, 0)), --- End diff -- No need to add space after `bounded`.
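The change quoted in this diff casts `count` to a scale-0 decimal instead of to `dt`. A rough arithmetic sketch of why that matters is below. It rests on assumptions that the thread does not spell out: that `DecimalType.bounded` caps both precision and scale at 38, and that a value fits a decimal type only if its integer digits do not exceed precision minus scale. The helper names and the example input type `(38, 33)` are illustrative only.

```python
MAX_PRECISION = 38  # assumed cap on precision
MAX_SCALE = 38      # assumed cap on scale


def bounded(precision, scale):
    """Model of a bounded decimal type as a (precision, scale) pair."""
    return (min(precision, MAX_PRECISION), min(scale, MAX_SCALE))


def integer_digits(decimal_type):
    """Digits available to the left of the decimal point."""
    precision, scale = decimal_type
    return precision - scale


def count_fits(count, decimal_type):
    """Whether a non-negative row count can be represented in the type."""
    if count == 0:
        return True
    return len(str(count)) <= integer_digits(decimal_type)


# Example input column type (hypothetical): precision 38, scale 33.
p, s = 38, 33
dt = bounded(p + 14, s + 4)           # widened type used for the sum
count_type = bounded(MAX_PRECISION, 0)  # scale-0 type used for the count
```

Under these assumptions `dt` becomes `(38, 37)`, which leaves a single integer digit, so a count of 10 or more would not fit if cast to `dt`, whereas the scale-0 type leaves 38 integer digits.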
[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144687101 --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala --- @@ -510,4 +510,83 @@ class FileSuite extends SparkFunSuite with LocalSparkContext { } } + test("spark.files.ignoreEmptySplits work correctly (old Hadoop API)") { +val conf = new SparkConf() +conf.setAppName("test").setMaster("local").set(IGNORE_EMPTY_SPLITS, true) +sc = new SparkContext(conf) + +def testIgnoreEmptySplits( +data: Array[Tuple2[String, String]], +actualPartitionNum: Int, +expectedPartitionNum: Int): Unit = { + val output = new File(tempDir, "output") + sc.parallelize(data, actualPartitionNum) +.saveAsHadoopFile[TextOutputFormat[String, String]](output.getPath) + for (i <- 0 until actualPartitionNum) { +assert(new File(output, s"part-$i").exists() === true) + } + val hadoopRDD = sc.textFile(new File(output, "part-*").getPath) + assert(hadoopRDD.partitions.length === expectedPartitionNum) + Utils.deleteRecursively(output) +} + +// Ensure that if all of the splits are empty, we remove the splits correctly +testIgnoreEmptySplits( + data = Array.empty[Tuple2[String, String]], + actualPartitionNum = 1, + expectedPartitionNum = 0) + +// Ensure that if no split is empty, we don't lose any splits +testIgnoreEmptySplits( + data = Array(("key1", "a"), ("key2", "a"), ("key3", "b")), + actualPartitionNum = 2, + expectedPartitionNum = 2) + +// Ensure that if part of the splits are empty, we remove the splits correctly +testIgnoreEmptySplits( + data = Array(("key1", "a"), ("key2", "a")), + actualPartitionNum = 5, + expectedPartitionNum = 2) + } + + test("spark.files.ignoreEmptySplits work correctly (new Hadoop API)") { +val conf = new SparkConf() +conf.setAppName("test").setMaster("local").set(IGNORE_EMPTY_SPLITS, true) +sc = new SparkContext(conf) + +def testIgnoreEmptySplits( +data: Array[Tuple2[String, String]], +actualPartitionNum: Int, +expectedPartitionNum: Int): 
Unit = { + val output = new File(tempDir, "output") + sc.parallelize(data, actualPartitionNum) +.saveAsNewAPIHadoopFile[NewTextOutputFormat[String, String]](output.getPath) + for (i <- 0 until actualPartitionNum) { +assert(new File(output, s"part-r-$i").exists() === true) + } + val hadoopRDD = sc.newAPIHadoopFile(new File(output, "part-r-*").getPath, +classOf[NewTextInputFormat], classOf[LongWritable], classOf[Text]) +.asInstanceOf[NewHadoopRDD[_, _]] --- End diff --

nit:

```scala
val hadoopRDD = sc.newAPIHadoopFile(
  new File(output, "part-r-*").getPath,
  classOf[NewTextInputFormat],
  classOf[LongWritable],
  classOf[Text]).asInstanceOf[NewHadoopRDD[_, _]]
```
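The expected partition counts in the tests quoted above can be checked with a little arithmetic. The sketch below assumes that a collection of `n` items is sliced into `k` partitions at positions `i * n / k` (integer division), that each partition is written as one output file, and that each non-empty file contributes one split when empty splits are ignored. These are assumptions about Spark's behaviour rather than statements from the thread, and the function names are hypothetical.

```python
def slice_sizes(num_items, num_slices):
    """Sizes of the slices when num_items are spread over num_slices."""
    return [
        ((i + 1) * num_items) // num_slices - (i * num_items) // num_slices
        for i in range(num_slices)
    ]


def non_empty_splits(num_items, num_slices):
    """Number of slices that would contain at least one record."""
    return sum(1 for size in slice_sizes(num_items, num_slices) if size > 0)
```

Under these assumptions, the three cases in the test (0 items in 1 partition, 3 items in 2 partitions, 2 items in 5 partitions) give 0, 2, and 2 non-empty splits, which agrees with the `expectedPartitionNum` values in the diff.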
[GitHub] spark pull request #19494: [SPARK-22249][SQL] isin with empty list throws ex...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19494#discussion_r144689325 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -104,7 +104,8 @@ case class InMemoryTableScanExec( case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) => list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] && -l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _) +l.asInstanceOf[Literal] <= statsFor(a).upperBound) --- End diff -- I see. How does `.contains(true)` work then? Or did that not work? I suppose all I mean is that we should write something that works on an empty list (returns false?) and also short-circuits (stops when anything is true). Is that possible?
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82757/testReport)** for PR 19222 at commit [`6e8d5b8`](https://github.com/apache/spark/commit/6e8d5b820c83517d0340d748959b855229e664a7).
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144688592 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)), Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2))) } + + test("SPARK-22226: splitExpressions should not generate codes beyond 64KB") { +val colNumber = 1 +val input = spark.range(2).rdd.map(_ => Row(1 to colNumber: _*)) +val df = sqlContext.createDataFrame(input, StructType( + (1 to colNumber).map(colIndex => StructField(s"_$colIndex", IntegerType, false)))) +val newCols = (1 to colNumber).flatMap { colIndex => + Seq(expr(s"if(1000 < _$colIndex, 1000, _$colIndex)"), +expr(s"sqrt(_$colIndex)")) +} +df.select(newCols: _*).collect() + } + + test("SPARK-22226: too many splitted expressions should not exceed constant pool limit") { --- End diff -- Btw, since this test doesn't test what we want to test, we should remove it.
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144688579 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)), Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2))) } + + test("SPARK-22226: splitExpressions should not generate codes beyond 64KB") { +val colNumber = 1 +val input = spark.range(2).rdd.map(_ => Row(1 to colNumber: _*)) +val df = sqlContext.createDataFrame(input, StructType( + (1 to colNumber).map(colIndex => StructField(s"_$colIndex", IntegerType, false)))) +val newCols = (1 to colNumber).flatMap { colIndex => + Seq(expr(s"if(1000 < _$colIndex, 1000, _$colIndex)"), +expr(s"sqrt(_$colIndex)")) +} +df.select(newCols: _*).collect() + } + + test("SPARK-22226: too many splitted expressions should not exceed constant pool limit") { --- End diff -- The unit test added to `CodeGenerationSuite` looks sufficient for identifying this particular issue, namely hitting the constant pool limit in the outer class because of too many method calls. So far it has been hard to contrive an end-to-end test purely for reproducing this particular issue. At least I failed to contrive one after several tries. So let's wait and see whether anyone has the chance or the insight to create one. If not, I think the unit test in `CodeGenerationSuite` should be good enough.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 @tejasapatil I updated the performance results for operations that are more commonly used.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82756/testReport)** for PR 19472 at commit [`150e0a3`](https://github.com/apache/spark/commit/150e0a30ac4ed11c783d62d47c4404c854b03dd9).
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19496 **[Test build #82755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7).
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19496 ok to test
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 Let me take a look with a few tests and get back to you. Also, I think I should cc @jiangxb1987 too.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)** for PR 19497 at commit [`a319df3`](https://github.com/apache/spark/commit/a319df36db5bd202a14b44a09e9d1887f1633aec).
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144688322

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
       Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22226: splitExpressions should not generate codes beyond 64KB") {
+    val colNumber = 1
+    val input = spark.range(2).rdd.map(_ => Row(1 to colNumber: _*))
+    val df = sqlContext.createDataFrame(input, StructType(
+      (1 to colNumber).map(colIndex => StructField(s"_$colIndex", IntegerType, false))))
+    val newCols = (1 to colNumber).flatMap { colIndex =>
+      Seq(expr(s"if(1000 < _$colIndex, 1000, _$colIndex)"),
+        expr(s"sqrt(_$colIndex)"))
+    }
+    df.select(newCols: _*).collect()
+  }
+
+  test("SPARK-22226: too many splitted expressions should not exceed constant pool limit") {
--- End diff --

You are right @viirya. Sorry, I didn't notice. Yes, the problem is that most of the time we currently hit both of these issues, so solving one is not enough. It turns out that there are some corner cases in which this fix alone is enough, like the real case I am working on, but they are not easy to reproduce in a simple way. That use case has a lot of complex projections, a `dropDuplicates`, and some joins after that, but the queries are made of thousands of lines of SQL code. The only way I have been able to reproduce it easily is in this test case: https://github.com/apache/spark/pull/19480/files#r144302922.
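To illustrate why the two limits discussed above interact, here is a minimal, self-contained sketch. It is not Spark's `splitExpressions` implementation; `SplitSketch`, `splitIntoMethods`, and `maxChars` are hypothetical names chosen for this example, and character count stands in for bytecode size. The idea is that packing generated snippets into size-bounded methods keeps each method small, but the number of methods (and hence the per-class bookkeeping, such as constant pool entries) still grows with the number of expressions.

```scala
// Hypothetical illustration only: this is NOT Spark's code generator.
object SplitSketch {
  // Greedily pack code snippets into groups so that the total length of
  // each group stays at or below maxChars. A single snippet longer than
  // maxChars still gets a group of its own.
  def splitIntoMethods(snippets: Seq[String], maxChars: Int): List[Vector[String]] = {
    val groups = scala.collection.mutable.ArrayBuffer.empty[Vector[String]]
    var current = Vector.empty[String]
    var size = 0
    for (s <- snippets) {
      // Close the current group when adding this snippet would exceed the budget.
      if (current.nonEmpty && size + s.length > maxChars) {
        groups += current
        current = Vector.empty[String]
        size = 0
      }
      current = current :+ s
      size += s.length
    }
    if (current.nonEmpty) groups += current
    groups.toList
  }
}
```

With ten 10-character snippets and a budget of 25 characters, `SplitSketch.splitIntoMethods(Seq.fill(10)("x" * 10), 25)` yields five groups of two snippets each. Doubling the number of snippets doubles the number of groups, which is the sense in which bounding the size of each method alone does not bound the size of the enclosing class.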
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 retest this please
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Merged build finished. Test FAILed.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82753/ Test FAILed.
[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82753/testReport)** for PR 19497 at commit [`a319df3`](https://github.com/apache/spark/commit/a319df36db5bd202a14b44a09e9d1887f1633aec).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.