[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84018/testReport)** for PR 19776 at commit [`635768e`](https://github.com/apache/spark/commit/635768e65542cf66f5db92ccf5088136a55443f5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r151920447

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,10 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))
           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        for {
    --- End diff --

    Thanks. Just did that as you suggested.
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r151920212

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,11 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))
           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        // See SPARK-12218 and PR 10362 for detailed discussion
    --- End diff --

    Usually we don't list the PR number; the JIRA number alone is enough.
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r151920126

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,11 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))
           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        // See SPARK-12218 and PR 10362 for detailed discussion
    --- End diff --

    Sure. I have added more comments there with an example. Thanks, Sean!
[GitHub] spark issue #19787: [SPARK-22541][SQL] Explicitly claim that Python udfs can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84016/ Test PASSed.
[GitHub] spark issue #19787: [SPARK-22541][SQL] Explicitly claim that Python udfs can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19787 Merged build finished. Test PASSed.
[GitHub] spark issue #19787: [SPARK-22541][SQL] Explicitly claim that Python udfs can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19787 **[Test build #84016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84016/testReport)** for PR 19787 at commit [`3b69777`](https://github.com/apache/spark/commit/3b69777924d0ac54bc4b6ec9c740cb20774bf033). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Can one of the admins verify this patch?
[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...
GitHub user yucai opened a pull request:

    https://github.com/apache/spark/pull/19788

    [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs

    ## What changes were proposed in this pull request?

    Optimize shuffle fetch of contiguous partition IDs

    ## How was this patch tested?

    Add new unit tests, AdaptiveSchedulingSuite, BlockIdSuite etc.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark shuffle_fetch_opt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19788.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19788

commit e947bcb39d1436751e68e033871baba5a77f6d0e
Author: yucai
Date: 2017-11-20T03:04:28Z

    [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84017/testReport)** for PR 19776 at commit [`e540790`](https://github.com/apache/spark/commit/e5407902c92b3936395ae1d38eba8bf20dfcbb43).
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r151917654

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,11 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))
           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        // See SPARK-12218 and PR 10362 for detailed discussion
    --- End diff --

    In the comment, you need to give an example to explain why.
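One way to see why the old `reduceOption` translation is unsafe for nested AND expressions is the minimal sketch below. It uses a toy expression model, not Spark's classes: `Pred`, `SrcPred`, `SrcAnd`, `SrcNot`, and the `lenient`/`strict` functions are hypothetical names introduced only for illustration. `lenient` mirrors the removed line (keep whichever side translates), and `strict` mirrors the for-comprehension (translate the AND only if both sides translate).

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's
// catalyst expressions or its data source filter classes.
sealed trait Expr
case class Pred(name: String, pushable: Boolean) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Not(child: Expr) extends Expr

sealed trait Filter
case class SrcPred(name: String) extends Filter
case class SrcAnd(left: Filter, right: Filter) extends Filter
case class SrcNot(child: Filter) extends Filter

object Translate {
  // Old behaviour: an AND keeps whichever side happens to translate.
  def lenient(e: Expr): Option[Filter] = e match {
    case Pred(n, ok) => if (ok) Some(SrcPred(n)) else None
    case And(l, r) =>
      (lenient(l).toList ++ lenient(r).toList).reduceOption[Filter](SrcAnd(_, _))
    case Not(c) => lenient(c).map(SrcNot(_))
  }

  // New behaviour: an AND translates only when both sides translate.
  def strict(e: Expr): Option[Filter] = e match {
    case Pred(n, ok) => if (ok) Some(SrcPred(n)) else None
    case And(l, r) =>
      for {
        a <- strict(l)
        b <- strict(r)
      } yield SrcAnd(a, b)
    case Not(c) => strict(c).map(SrcNot(_))
  }
}
```

For `Not(And(Pred("a = 2", true), Pred("trim(b) = 'x'", false)))`, `lenient` produces `NOT(a = 2)`, which wrongly discards rows where `a = 2` holds but the untranslatable predicate does not. `strict` returns `None`, so nothing is pushed down and the full predicate is left for the engine to evaluate.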
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user DaimonPl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16578#discussion_r151917066

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -961,6 +961,15 @@ object SQLConf {
           .booleanConf
           .createWithDefault(true)
    +  val NESTED_SCHEMA_PRUNING_ENABLED =
    +    buildConf("spark.sql.nestedSchemaPruning.enabled")
    +      .internal()
    +      .doc("Prune nested fields from a logical relation's output which are unnecessary in " +
    +        "satisfying a query. This optimization allows columnar file format readers to avoid " +
    +        "reading unnecessary nested column data.")
    +      .booleanConf
    +      .createWithDefault(true)
    --- End diff --

    So maybe at least make it true for some core sql/parquet test suites?
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r151916997

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,10 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))
           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        for {
    --- End diff --

    Yeah. Follow what @yhuai wrote in the PR https://github.com/apache/spark/pull/10362/files
[GitHub] spark issue #19787: [SPARK-22541][SQL] Explicitly claim that Python udfs can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19787 **[Test build #84016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84016/testReport)** for PR 19787 at commit [`3b69777`](https://github.com/apache/spark/commit/3b69777924d0ac54bc4b6ec9c740cb20774bf033).
[GitHub] spark issue #19787: [SPARK-22541][SQL] Explicitly claim that Python udfs can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19787 cc @HyukjinKwon @cloud-fan
[GitHub] spark pull request #19787: [SPARK-22541][SQL] Explicitly claim that Python u...
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19787

    [SPARK-22541][SQL] Explicitly claim that Python udfs can't be conditional executed with short-curcuit evaluation

    ## What changes were proposed in this pull request?

    Besides conditional expressions such as `when` and `if`, users may want to conditionally execute Python UDFs through short-circuit evaluation. We should explicitly note that Python UDFs don't support this kind of conditional execution either.

    ## How was this patch tested?

    N/A, documentation change only.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22541

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19787.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19787

commit 3b69777924d0ac54bc4b6ec9c740cb20774bf033
Author: Liang-Chi Hsieh
Date: 2017-11-20T07:13:32Z

    Add document for udf.
[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19786 **[Test build #84015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84015/testReport)** for PR 19786 at commit [`054cd7e`](https://github.com/apache/spark/commit/054cd7e208c908aa50cc8030b15feee9d2ce71e9).
[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...
GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/19786

    [SPARK-22559][CORE]history server: handle exception on opening corrupted listing.ldb

    ## What changes were proposed in this pull request?

    Currently, history server v2 fails to start if `listing.ldb` is corrupted. This patch removes the corrupted `listing.ldb` and re-creates it. The exception handling follows [opening disk store for app](https://github.com/apache/spark/blob/0ffa7c488fa8156e2a1aa282e60b7c36b86d8af8/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L307)

    ## How was this patch tested?

    Manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark listingException

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19786.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19786

commit 054cd7e208c908aa50cc8030b15feee9d2ce71e9
Author: Wang Gengliang
Date: 2017-11-20T06:59:40Z

    history server: handle exception on opening corrupted listing.ldb
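The recovery pattern described above (drop the corrupted store, then re-create it) can be sketched as follows. This is only an illustration under stated assumptions: `ListingStore`, `openOrRecreate`, and the `open` callback are hypothetical names, not the history server's API, and catching every non-fatal exception is an assumption made here. The actual patch may catch a narrower set of exceptions.

```scala
import java.io.File
import scala.util.control.NonFatal

object ListingStore {
  // `open` is a hypothetical stand-in for whatever opens the on-disk store.
  // If the first attempt throws, the directory is deleted and opened again,
  // so a corrupted store is replaced by a fresh one.
  def openOrRecreate[T](path: File)(open: File => T): T =
    try open(path)
    catch {
      case NonFatal(_) =>
        deleteRecursively(path)
        open(path)
    }

  // Deletes a file, or a directory together with everything under it.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
    ()
  }
}
```

A second failure is deliberately not caught, so a problem that survives re-creation still surfaces at startup instead of being retried indefinitely.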
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84010/ Test PASSed.
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19776 Merged build finished. Test PASSed.
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84010/testReport)** for PR 19776 at commit [`58de88c`](https://github.com/apache/spark/commit/58de88c21210d469b8ef14b1f23764c31ca5651e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19773: [SPARK-22546][SQL] Supporting for changing column dataTy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19773 Merged build finished. Test FAILed.
[GitHub] spark issue #19773: [SPARK-22546][SQL] Supporting for changing column dataTy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19773 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84012/ Test FAILed.
[GitHub] spark issue #19773: [SPARK-22546][SQL] Supporting for changing column dataTy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19773 **[Test build #84012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84012/testReport)** for PR 19773 at commit [`b145102`](https://github.com/apache/spark/commit/b145102c9eeccb91b7d818915b11429807099fbb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/10572 @HyukjinKwon To merge small files, should I tune `spark.sql.files.maxPartitionBytes`? But IIUC it only works for `FileSourceScanExec`, so it has no effect when I select from a Hive table.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84011/ Test PASSed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Merged build finished. Test PASSed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #84011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84011/testReport)** for PR 19779 at commit [`9beb53f`](https://github.com/apache/spark/commit/9beb53f0c92a5033007f4ab6f43d8fd94264a0e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19750: [SPARK-20650][core] Remove JobProgressListener.
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/19750 lgtm
[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19769#discussion_r151909614

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ---
    @@ -298,7 +304,10 @@ private void decodeDictionaryIds(
               // TODO: Convert dictionary of Binaries to dictionary of Longs
               if (!column.isNullAt(i)) {
                 Binary v = dictionary.decodeToBinary(dictionaryIds.getDictId(i));
    -            column.putLong(i, ParquetRowConverter.binaryToSQLTimestamp(v));
    +            long rawTime = ParquetRowConverter.binaryToSQLTimestamp(v);
    +            long adjTime =
    +              convertTz == null ? rawTime : DateTimeUtils.convertTz(rawTime, convertTz, UTC);
    --- End diff --

    How about skipping conversion if `convertTz == UTC` as well?
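The suggestion above could look roughly like the sketch below, written in Scala rather than the Java of the file under review. Everything here is an assumption for illustration: `Int96Adjust`, `adjust`, and `shift` are hypothetical names, `shift` is a deliberately simplified stand-in for `DateTimeUtils.convertTz` (it only applies the difference between the two zone offsets), and comparing zones by ID is one possible reading of `convertTz == UTC`.

```scala
import java.util.TimeZone

object Int96Adjust {
  val UTC: TimeZone = TimeZone.getTimeZone("UTC")

  // Simplified, hypothetical stand-in for DateTimeUtils.convertTz: shifts a
  // microsecond timestamp by the difference between the two zone offsets.
  def shift(micros: Long, from: TimeZone, to: TimeZone): Long = {
    val millis = micros / 1000L
    micros + (to.getOffset(millis) - from.getOffset(millis)) * 1000L
  }

  // Skip the conversion when no zone is configured or when it is already UTC.
  def adjust(rawTime: Long, convertTz: TimeZone): Long =
    if (convertTz == null || convertTz.getID == UTC.getID) rawTime
    else shift(rawTime, convertTz, UTC)
}
```

Zones that are equivalent to UTC under a different ID would still go through `shift` in this sketch, which is harmless because the offset difference is zero.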
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Merged build finished. Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84009/ Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #84009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84009/testReport)** for PR 19780 at commit [`64e93ec`](https://github.com/apache/spark/commit/64e93ec47a4516a341ad9f39dab02266ce7b6e4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19769#discussion_r151908974

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala ---
    @@ -87,4 +95,109 @@ class ParquetInteroperabilitySuite extends ParquetCompatibilityTest with SharedS
               Row(Seq(2, 3
         }
       }
    +
    +  test("parquet timestamp conversion") {
    +    // Make a table with one parquet file written by impala, and one parquet file written by spark.
    +    // We should only adjust the timestamps in the impala file, and only if the conf is set
    +    val impalaFile = "test-data/impala_timestamp.parq"
    +
    +    // here are the timestamps in the impala file, as they were saved by impala
    +    val impalaFileData =
    +      Seq(
    +        "2001-01-01 01:01:01",
    +        "2002-02-02 02:02:02",
    +        "2003-03-03 03:03:03"
    +      ).map(java.sql.Timestamp.valueOf)
    +    val impalaPath = Thread.currentThread().getContextClassLoader.getResource(impalaFile)
    +      .toURI.getPath
    +    withTempPath { tableDir =>
    +      val ts = Seq(
    +        "2004-04-04 04:04:04",
    +        "2005-05-05 05:05:05",
    +        "2006-06-06 06:06:06"
    +      ).map { s => java.sql.Timestamp.valueOf(s) }
    +      import testImplicits._
    +      // match the column names of the file from impala
    +      val df = spark.createDataset(ts).toDF().repartition(1).withColumnRenamed("value", "ts")
    +      df.write.parquet(tableDir.getAbsolutePath)
    +      FileUtils.copyFile(new File(impalaPath), new File(tableDir, "part-1.parq"))
    +
    +      Seq(false, true).foreach { int96TimestampConversion =>
    +        Seq(false, true).foreach { vectorized =>
    +          withSQLConf(
    +              (SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key,
    +                SQLConf.ParquetOutputTimestampType.INT96.toString),
    +              (SQLConf.PARQUET_INT96_TIMESTAMP_CONVERSION.key, int96TimestampConversion.toString()),
    +              (SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, vectorized.toString())
    +          ) {
    +            val readBack = spark.read.parquet(tableDir.getAbsolutePath).collect()
    +            assert(readBack.size === 6)
    +            // if we apply the conversion, we'll get the "right" values, as saved by impala in the
    +            // original file. Otherwise, they're off by the local timezone offset, set to
    +            // America/Los_Angeles in tests
    +            val impalaExpectations = if (int96TimestampConversion) {
    +              impalaFileData
    +            } else {
    +              impalaFileData.map { ts =>
    +                DateTimeUtils.toJavaTimestamp(DateTimeUtils.convertTz(
    +                  DateTimeUtils.fromJavaTimestamp(ts),
    +                  DateTimeUtils.TimeZoneUTC,
    +                  DateTimeUtils.getTimeZone(conf.sessionLocalTimeZone)))
    +              }
    +            }
    +            val fullExpectations = (ts ++ impalaExpectations).map(_.toString).sorted.toArray
    +            val actual = readBack.map(_.getTimestamp(0).toString).sorted
    +            withClue(s"applyConversion = $int96TimestampConversion; vectorized = $vectorized") {
    --- End diff --

    nit: use `int96TimestampConversion =` instead of `applyConversion =` too?
[GitHub] spark issue #19785: [MINOR][doc] The left navigation bar should be fixed wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19785 Merged build finished. Test PASSed.
[GitHub] spark issue #19785: [MINOR][doc] The left navigation bar should be fixed wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84014/ Test PASSed.
[GitHub] spark issue #19785: [MINOR][doc] The left navigation bar should be fixed wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19785 **[Test build #84014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84014/testReport)** for PR 19785 at commit [`9e1f72a`](https://github.com/apache/spark/commit/9e1f72a6db104414dae8f614777f415811569d6c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19741: [SPARK-14228][CORE][YARN] Lost executor of RPC disassoci...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19741 **[Test build #84007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84007/testReport)** for PR 19741 at commit [`930ba79`](https://github.com/apache/spark/commit/930ba795fa19a8158174de153351738a64fbcb2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19741: [SPARK-14228][CORE][YARN] Lost executor of RPC disassoci...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19741 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84007/ Test PASSed.
[GitHub] spark issue #19741: [SPARK-14228][CORE][YARN] Lost executor of RPC disassoci...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19741 Merged build finished. Test PASSed.
[GitHub] spark issue #19785: [MINOR][doc] The left navigation bar should be fixed wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19785 **[Test build #84014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84014/testReport)** for PR 19785 at commit [`9e1f72a`](https://github.com/apache/spark/commit/9e1f72a6db104414dae8f614777f415811569d6c).
[GitHub] spark pull request #19785: [MINOR][doc] The left navigation bar should be fi...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/19785

    [MINOR][doc] The left navigation bar should be fixed with respect to scrolling.

    ## What changes were proposed in this pull request?

    A minor CSS style change that keeps the left navigation bar fixed while the page scrolls, which improves the usability of the docs.

    ## How was this patch tested?

    It was tested on both Firefox and Chrome.

    ### Before
    ![a2](https://user-images.githubusercontent.com/992952/33004206-6acf9fc0-cde5-11e7-9070-02f26f7899b0.gif)

    ### After
    ![a1](https://user-images.githubusercontent.com/992952/33004205-69b27798-cde5-11e7-8002-509b29786b37.gif)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark doc/css

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19785.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19785

commit 9e1f72a6db104414dae8f614777f415811569d6c
Author: Prashant Sharma
Date: 2017-11-20T05:50:16Z

    [MINOR][doc] The left navigation bar should be fixed with respect to scrolling.
[GitHub] spark issue #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19784 Oh, thank you so much, @HyukjinKwon, @srowen, and @jiangxb1987. Yes. This will make our build more stable.
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19607 It seems there is no explicit objection to dropping it.
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #84013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84013/testReport)** for PR 19607 at commit [`9c94f90`](https://github.com/apache/spark/commit/9c94f90a703daaf08887259c1757420477a95b94).
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84008/ Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Merged build finished. Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #84008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84008/testReport)** for PR 19780 at commit [`3742f22`](https://github.com/apache/spark/commit/3742f22f7276b6e8416c745cfc2288cd652e9294). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19773: [SPARK-22546][SQL] Supporting for changing column dataTy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19773 **[Test build #84012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84012/testReport)** for PR 19773 at commit [`b145102`](https://github.com/apache/spark/commit/b145102c9eeccb91b7d818915b11429807099fbb).
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #84011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84011/testReport)** for PR 19779 at commit [`9beb53f`](https://github.com/apache/spark/commit/9beb53f0c92a5033007f4ab6f43d8fd94264a0e7).
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19776 @jliwork Let's see what @cloud-fan @felixcheung think about it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r151902033 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala --- @@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc) val fileSinkConfSer = fileSinkConf new OutputWriterFactory { private val jobConf = new SerializableJobConf(new JobConf(conf)) + private val broadcastHadoopConf = sparkSession.sparkContext.broadcast( --- End diff -- Thanks for the comment. I'll change the code to use `jobConf`.
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user jliwork commented on the issue: https://github.com/apache/spark/pull/19776 @viirya I'm fine with backporting to 2.2 unless anyone objects.
[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19763#discussion_r151901370 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -485,4 +485,13 @@ package object config { "array in the sorter.") .intConf .createWithDefault(Integer.MAX_VALUE) + + private[spark] val SHUFFLE_MAP_OUTPUT_STATISTICS_MULTITHREAD_THRESHOLD = +ConfigBuilder("spark.shuffle.mapOutputStatisticsMultithreadThreshold") --- End diff -- `spark.shuffle.mapOutputStatistics.parallelAggregationThreshold`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151901160 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Sure. Will do. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19782: [SPARK-22554][PYTHON] Add a config to control if ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19782 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19776 This affects correctness. Should we also backport it to 2.2?
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19782 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19784 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19782 Thank you @viirya and @ueshin. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19784 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151900595 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Let's add a small comment like the PR you pointed out. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
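The change discussed in this diff can be illustrated with a minimal, self-contained sketch. It is not Spark's code: `Expr`, `Filter`, `Udf` and the other names below are hypothetical toy stand-ins, and `Udf` simply represents any predicate that cannot be translated into a data source filter. The point is that keeping only the translatable side of an AND is unsafe once that AND sits under a NOT, which is why the for-comprehension requires both sides to translate.

```scala
object NestedAndPushdown {
  // Toy stand-ins for Catalyst expressions (hypothetical, not Spark's classes).
  sealed trait Expr
  case class Eq(col: String, value: Int) extends Expr
  case class Udf(name: String) extends Expr // a predicate no data source can evaluate
  case class And(left: Expr, right: Expr) extends Expr
  case class Not(child: Expr) extends Expr

  // Toy stand-ins for data source filters.
  sealed trait Filter
  case class FEq(col: String, value: Int) extends Filter
  case class FAnd(left: Filter, right: Filter) extends Filter
  case class FNot(child: Filter) extends Filter

  // Old shape: an AND keeps whichever side happens to translate.
  def translateOld(e: Expr): Option[Filter] = e match {
    case Eq(c, v) => Some(FEq(c, v))
    case Udf(_)   => None
    case And(l, r) =>
      (translateOld(l).toList ++ translateOld(r).toList).reduceOption[Filter](FAnd(_, _))
    case Not(c) => translateOld(c).map(f => FNot(f))
  }

  // New shape: an AND is translated only if both sides translate.
  def translateNew(e: Expr): Option[Filter] = e match {
    case Eq(c, v) => Some(FEq(c, v))
    case Udf(_)   => None
    case And(l, r) =>
      for {
        lf <- translateNew(l)
        rf <- translateNew(r)
      } yield FAnd(lf, rf)
    case Not(c) => translateNew(c).map(f => FNot(f))
  }
}
```

For `Not(And(Eq("a", 2), Udf("f")))`, the old shape produces `FNot(FEq("a", 2))`, which would drop every row with `a = 2` at the source even though such a row satisfies the original predicate whenever the untranslatable part is false. The new shape produces no filter for this expression, so in this sketch the whole predicate would be left to the engine.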
[GitHub] spark pull request #10468: [SPARK-12409][SPARK-12387][SPARK-12391][SQL] Supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10468#discussion_r151900175 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -186,8 +187,26 @@ class JDBCSuite extends SparkFunSuite assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME = 'fred'")).collect().size == 1) assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME > 'fred'")).collect().size == 2) assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME != 'fred'")).collect().size == 2) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME IN ('mary', 'fred')")) + .collect().size == 2) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME NOT IN ('fred')")) + .collect().size === 2) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE THEID = 1 OR NAME = 'mary'")) + .collect().size == 2) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE THEID = 1 OR NAME = 'mary' " + + "AND THEID = 2")).collect().size == 2) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME LIKE 'fr%'")).collect().size == 1) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME LIKE '%ed'")).collect().size == 1) +assert(stripSparkFilter(sql("SELECT * FROM foobar WHERE NAME LIKE '%re%'")).collect().size == 1) assert(stripSparkFilter(sql("SELECT * FROM nulltypes WHERE A IS NULL")).collect().size == 1) assert(stripSparkFilter(sql("SELECT * FROM nulltypes WHERE A IS NOT NULL")).collect().size == 0) + +// This is a test to reflect discussion in SPARK-12218. +// The older versions of spark have this kind of bugs in parquet data source. +val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") --- End diff -- Both sub-conditions can be pushed down, so this doesn't actually test the nested AND issue in SPARK-12218. See #19776. Btw, the two sub-conditions filter out the same rows, so this doesn't reflect the issue either.
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19782 LGTM too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84010/testReport)** for PR 19776 at commit [`58de88c`](https://github.com/apache/spark/commit/58de88c21210d469b8ef14b1f23764c31ca5651e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19782 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19776 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user jliwork commented on the issue: https://github.com/apache/spark/pull/19776 @viirya Thanks for letting me know, Simon. I've fixed the title. Could someone please help trigger the tests?
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19776 @jliwork Can you fix the PR title? It got cut off when it was pasted.
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19370 I think the concern is that we are adding a Python dependency even for non-Python use cases. We could track better handling of a missing Python separately. My concern is that Python is not a standard component on Windows, and the user might not have it installed when using spark-submit or spark-shell. The error message might also not make the cause obvious.
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19776 Looks like a bug that has been there for a long while. cc @cloud-fan @HyukjinKwon Can you help trigger the test? Thanks.
[GitHub] spark issue #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19784 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84005/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19784 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19784: [SPARK-22557][TEST] Use ThreadSignaler explicitly
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19784 **[Test build #84005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84005/testReport)** for PR 19784 at commit [`a8162a7`](https://github.com/apache/spark/commit/a8162a7a2ed48e227ddff8c20615616e31ab8821). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UnpersistSuite extends SparkFunSuite with LocalSparkContext with TimeLimits ` * `class ProcessingTimeExecutorSuite extends SparkFunSuite with TimeLimits ` * `class BlockGeneratorSuite extends SparkFunSuite with BeforeAndAfter with TimeLimits ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84006/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19388 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19388 **[Test build #84006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84006/testReport)** for PR 19388 at commit [`66192fd`](https://github.com/apache/spark/commit/66192fd61dae1489216fffcea0f104800aefbd31). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #84009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84009/testReport)** for PR 19780 at commit [`64e93ec`](https://github.com/apache/spark/commit/64e93ec47a4516a341ad9f39dab02266ce7b6e4d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19671: [SPARK-22297][CORE TESTS] Flaky test: BlockManagerSuite ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19671 @vanzin How often does this test case fail on your local environment? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19770: [SPARK-21571][WEB UI] Spark history server leaves incomp...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19770 Please fix the scala style failure, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19761: [SPARK-22479][SQL][BRANCH-2.2] Exclude credentials from ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19761 ping @onursatici --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #84008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84008/testReport)** for PR 19780 at commit [`3742f22`](https://github.com/apache/spark/commit/3742f22f7276b6e8416c745cfc2288cd652e9294). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19741: [SPARK-14228][CORE][YARN] Lost executor of RPC disassoci...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19741 **[Test build #84007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84007/testReport)** for PR 19741 at commit [`930ba79`](https://github.com/apache/spark/commit/930ba795fa19a8158174de153351738a64fbcb2c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19774#discussion_r151892566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -689,6 +689,11 @@ case class DescribeColumnCommand( buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL")) buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL")) buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL")) + buffer ++= cs.flatMap(_.histogram.map { hist => +val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}") +Seq(header) ++ hist.bins.map(bin => + Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}")) --- End diff -- And how about making each class member a separate field? ``` histogram height: 2.0 num_of_bins: 2 histogram_bin lower_bound: 1.0 upper_bound: 2.0 distinct_count: 2 histogram_bin lower_bound: 2.0 upper_bound: 4.0 distinct_count: 2 ```
[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19774#discussion_r151892351 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -689,6 +689,11 @@ case class DescribeColumnCommand( buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL")) buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL")) buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL")) + buffer ++= cs.flatMap(_.histogram.map { hist => +val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}") +Seq(header) ++ hist.bins.map(bin => + Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}")) --- End diff -- @mgaido91 @jaceklaskowski How about using "histogram_bin" instead of the empty string? ``` histogram height: 2.0, num_of_bins: 2 histogram_bin lower_bound: 1.0, upper_bound: 2.0, distinct_count: 2 histogram_bin lower_bound: 2.0, upper_bound: 4.0, distinct_count: 2 ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
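The "histogram_bin" suggestion can be sketched in isolation as follows. This is only an illustration under stated assumptions: `Bin`, `Histogram` and `describe` are hypothetical stand-ins rather than Spark's statistics classes or `DescribeColumnCommand`, the field names `height`, `bins`, `lo`, `hi` and `ndv` are taken from the diff above, and plain `(String, String)` pairs stand in for `Row`.

```scala
object HistogramRows {
  // Hypothetical stand-ins for the column statistics classes; field names follow the diff.
  case class Bin(lo: Double, hi: Double, ndv: Long)
  case class Histogram(height: Double, bins: Seq[Bin])

  // Builds (info_name, info_value) pairs: one header row, then one labelled row per bin.
  // A column without a histogram contributes no rows.
  def describe(hist: Option[Histogram]): Seq[(String, String)] =
    hist.toSeq.flatMap { h =>
      val header = ("histogram", s"height: ${h.height}, num_of_bins: ${h.bins.length}")
      header +: h.bins.map { bin =>
        ("histogram_bin",
          s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}")
      }
    }
}
```

Labelling every bin row, instead of leaving the first column empty, keeps each row self-describing when the output is filtered or sorted.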
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r151890602 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -158,8 +196,8 @@ class FilterEstimationSuite extends StatsEstimationTestBase { val condition = Not(And(LessThan(attrInt, Literal(3)), Literal(null, IntegerType))) validateEstimatedStats( Filter(condition, childStatsTestPlan(Seq(attrInt), 10L)), - Seq(attrInt -> colStatInt.copy(distinctCount = 8)), - expectedRowCount = 8) + Seq(attrInt -> colStatInt.copy(distinctCount = 7)), --- End diff -- Shall we add new test cases for filter estimation based on histogram, instead of modifying existing test results? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
Github user Huamei-17 commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r151890487 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala --- @@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc) val fileSinkConfSer = fileSinkConf new OutputWriterFactory { private val jobConf = new SerializableJobConf(new JobConf(conf)) + private val broadcastHadoopConf = sparkSession.sparkContext.broadcast( --- End diff -- Is it possible to use `jobConf` directly as the Hive SerDe initialization parameter?
[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19783 cc @sameeragarwal @cloud-fan @gatorsmile @wzhfy --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19388 Thank you very much, @dongjoon-hyun. I forgot to push the fix for the streaming failure. @vanzin My understanding, from reading the code before this change, is that the logic behind it avoids committing a single stage multiple times. So I think, based on that logic, it should use stageId somehow.
[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19388 **[Test build #84006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84006/testReport)** for PR 19388 at commit [`66192fd`](https://github.com/apache/spark/commit/66192fd61dae1489216fffcea0f104800aefbd31). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19370#discussion_r151885502 --- Diff: bin/find-spark-home.cmd --- @@ -0,0 +1,49 @@ +@echo off + +rem +rem Licensed to the Apache Software Foundation (ASF) under one or more +rem contributor license agreements. See the NOTICE file distributed with +rem this work for additional information regarding copyright ownership. +rem The ASF licenses this file to You under the Apache License, Version 2.0 +rem (the "License"); you may not use this file except in compliance with +rem the License. You may obtain a copy of the License at +rem +rem http://www.apache.org/licenses/LICENSE-2.0 +rem +rem Unless required by applicable law or agreed to in writing, software +rem distributed under the License is distributed on an "AS IS" BASIS, +rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +rem See the License for the specific language governing permissions and +rem limitations under the License. +rem + +rem Path to Python script finding SPARK_HOME +set FIND_SPARK_HOME_PYTHON_SCRIPT=%~dp0find_spark_home.py --- End diff -- I tested manually, and it looks like it is going to give a message like this: ```cmd C:\...>pyspark 'python' is not recognized as an internal or external command, operable program or batch file. ``` which seems roughly fine, though it's a bit ugly. So I googled a possible approach, for example https://superuser.com/a/718194. However, it seems `where` does not recognise an absolute path, for example `C:\...\Python27\python.exe`, so it looks like we would have to combine it with an `if exist` check. @jsnowacki if you are active now and know a simpler, better way, we could definitely try it. Or we could probably go ahead as is, too.
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19370 Yup, +1 for going ahead with branch-2.2. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19784: [SPARK-22557][TESTS] Use ThreadSignaler explicitly
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19784 **[Test build #84005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84005/testReport)** for PR 19784 at commit [`a8162a7`](https://github.com/apache/spark/commit/a8162a7a2ed48e227ddff8c20615616e31ab8821). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19784: [SPARK-22557][TESTS] Use ThreadSignaler explicitl...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19784 [SPARK-22557][TESTS] Use ThreadSignaler explicitly ## What changes were proposed in this pull request? ScalaTest 3.0 uses an implicit `Signaler`. This PR makes sure all Spark tests use `ThreadSignaler` explicitly, which has the same behavior of interrupting a thread on the JVM as the ScalaTest 2.2.x default. This will reduce potential flakiness. ## How was this patch tested? This is a test-suite-only update. It should pass the Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark use_thread_signaler Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19784.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19784 commit a8162a7a2ed48e227ddff8c20615616e31ab8821 Author: Dongjoon Hyun Date: 2017-11-20T00:06:05Z [SPARK-22557][TESTS] Use ThreadSignaler explicitly
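To make "interrupting a thread on the JVM" concrete, here is a rough standard-library-only sketch of the signaler idea. It is not ScalaTest's implementation: the `Signaler`, `ThreadSignaler` and `DoNotSignal` definitions below are simplified stand-ins that only borrow the names, and `runWithTimeout` is a hypothetical helper invented for this illustration.

```scala
object SignalerSketch {
  // Simplified stand-ins: a strategy invoked with the test thread once the time limit expires.
  trait Signaler { def apply(testThread: Thread): Unit }
  object ThreadSignaler extends Signaler {
    def apply(testThread: Thread): Unit = testThread.interrupt()
  }
  object DoNotSignal extends Signaler {
    def apply(testThread: Thread): Unit = ()
  }

  // Hypothetical helper: runs `body` on the calling thread and fires the signaler
  // from a daemon timer thread after `timeoutMs` milliseconds.
  // Returns true if the body ran to completion, false if it was interrupted.
  def runWithTimeout(timeoutMs: Long, signaler: Signaler)(body: => Unit): Boolean = {
    val testThread = Thread.currentThread()
    val timer = new java.util.Timer(true)
    timer.schedule(new java.util.TimerTask {
      def run(): Unit = signaler(testThread)
    }, timeoutMs)
    try {
      body
      true
    } catch {
      case _: InterruptedException => false
    } finally {
      timer.cancel()
      Thread.interrupted() // clear any leftover interrupt flag
    }
  }
}
```

With the interrupting signaler, a body that blocks in `Thread.sleep` is cut short as soon as the limit expires. With the do-nothing signaler, the same body keeps running until it finishes on its own, which is the kind of hang an explicit `ThreadSignaler` is meant to avoid in time-limited tests.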
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84004/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19370 **[Test build #84004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84004/testReport)** for PR 19370 at commit [`c0138a9`](https://github.com/apache/spark/commit/c0138a9c2542d045f0419345bd4fc171baf4c107). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84003/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19783 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19783 **[Test build #84003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84003/testReport)** for PR 19783 at commit [`dd5b975`](https://github.com/apache/spark/commit/dd5b975dafdf9fc4edd94cf6e369f5e899db74e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org