[GitHub] spark issue #18892: [SPARK-21520][SQL]Improvement a special case for non-det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18892 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18892: [SPARK-21520][SQL]Improvement a special case for non-det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18892 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80764/ Test PASSed.
[GitHub] spark issue #18892: [SPARK-21520][SQL]Improvement a special case for non-det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18892 **[Test build #80764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80764/testReport)** for PR 18892 at commit [`72e0252`](https://github.com/apache/spark/commit/72e0252bb2d3a9c7d43ed8756d8d7ea34fb80ca5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18965: [SPARK-21749][DOC] Add comments for MessageEncoder to ex...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18965 I think the more important question is: is it a protocol that we are guaranteeing? I'm not sure it is. If not, then I don't think it should be in user-facing docs.
[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18648 @cloud-fan
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133631784 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala --- @@ -68,4 +68,25 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl sql("DROP TABLE IF EXISTS createAndInsertTest") } } + + test("SPARK-21739: Cast expression should initialize timezoneId " + --- End diff -- Oh, it should select the TimestampType column. Thanks for the reminder, I will fix it.
[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 @cloud-fan . The PR is updated. Now, it's minimized as +493 and −247 lines.
[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80771/testReport)** for PR 18953 at commit [`80c80f3`](https://github.com/apache/spark/commit/80c80f34eb4dfb7c94d7875438effab52c71575d).
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18959 **[Test build #80770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80770/testReport)** for PR 18959 at commit [`973402b`](https://github.com/apache/spark/commit/973402bd822c05f8895405fbcaf918edbaad9d23).
[GitHub] spark issue #18965: [SPARK-21749][DOC] Add comments for MessageEncoder to ex...
Github user neoremind commented on the issue: https://github.com/apache/spark/pull/18965 I see. Anyway, this is what I found when I dug into the wire protocol of Spark RPC, since the wire format is a big part of understanding the message structure. If someone thinks this is not necessary, I can close the PR.
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80763/ Test PASSed.
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18959 Merged build finished. Test PASSed.
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18959 **[Test build #80763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80763/testReport)** for PR 18959 at commit [`b33fde8`](https://github.com/apache/spark/commit/b33fde86ecd6a0be5f4a55c408ab10e0ac44101a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18959#discussion_r133629453 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala --- @@ -162,7 +162,13 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter { }.head assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch") - assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch") + + // Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method + // in `SparkPlanner` that internally uses `AttributeSet.toSeq`. + // Since we change an output order of `AttributeSet.toSeq` in SPARK-18394, + // we need to sort column names for a test below. --- End diff -- Looks good, I'll update soon.
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18968 Merged build finished. Test FAILed.
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18968 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80765/ Test FAILed.
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80765/testReport)** for PR 18968 at commit [`4604a08`](https://github.com/apache/spark/commit/4604a08e390019f7c3952774dd1b9086be9f2680). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq out...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18959#discussion_r133628398 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala --- @@ -162,7 +162,13 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter { }.head assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch") - assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch") + + // Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method + // in `SparkPlanner` that internally uses `AttributeSet.toSeq`. + // Since we change an output order of `AttributeSet.toSeq` in SPARK-18394, + // we need to sort column names for a test below. --- End diff -- How about? > Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method in `SparkPlanner`. This method internally uses `AttributeSet.toSeq`, in which the returned output columns are sorted by the names and expression ids.
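The ordering contract under discussion can be modeled with a tiny sketch (plain Scala, not Spark's actual `AttributeSet`; the `Attr` case class and `AttrOrdering` object here are hypothetical): sorting attributes by name first, then by expression id, gives `toSeq` a deterministic output regardless of the underlying set's iteration order.

```scala
// Hypothetical simplified model of the deterministic ordering discussed above;
// Spark's real AttributeSet is considerably more involved.
case class Attr(name: String, exprId: Long)

object AttrOrdering {
  // Sort by name, then by expression id, so the resulting sequence is
  // stable even though Set iteration order is unspecified.
  def toOrderedSeq(attrs: Set[Attr]): Seq[Attr] =
    attrs.toSeq.sortBy(a => (a.name, a.exprId))
}
```

With such an ordering in place, a test comparing scanned columns no longer depends on hash-set iteration order, which is the fragility the `PruningSuite` comment is working around.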
[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...
Github user neoremind commented on the issue: https://github.com/apache/spark/pull/18964 Not yet, since it is OK to keep the buffer size at the default system value; but to keep it consistent when a user would like to specify it, this makes sense. I also noticed that Spark RPC by default uses Java native serialization: even a request verifying whether an endpoint exists costs about 1 KB of payload, not to mention real endpoint logic, so in the real world it might be useful to profile this. I suggest that providing more RPC monitoring logs or hooks would be beneficial; anyway, that should be discussed in another thread.
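For readers unfamiliar with the option the PR title refers to: `SO_RCVBUF` is an OS-level socket receive-buffer hint. Spark's RPC layer would set it through Netty channel options; the sketch below uses plain `java.net.Socket` from the standard library purely for illustration, and the requested size is an arbitrary example value.

```scala
// Minimal illustration of the SO_RCVBUF socket option (not Spark's actual
// configuration path, which goes through Netty). The OS treats the requested
// size as a hint and may round it up or cap it.
import java.net.Socket

object RcvBufSketch {
  def main(args: Array[String]): Unit = {
    val socket = new Socket()
    try {
      socket.setReceiveBufferSize(256 * 1024) // request a 256 KB receive buffer
      val effective = socket.getReceiveBufferSize // what the OS actually granted
      println(s"requested=262144 effective=$effective")
    } finally {
      socket.close()
    }
  }
}
```

Because the kernel may silently adjust the value, code that cares about the effective size should read it back rather than trust the requested number.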
[GitHub] spark issue #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarchy to ma...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18958 also cc @kiszk for another column vector pr.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133626997 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --- @@ -505,18 +500,12 @@ public void filterNullsInColumn(int ordinal) { nullFilteredColumns.add(ordinal); } - private ColumnarBatch(StructType schema, int maxRows, MemoryMode memMode) { + public ColumnarBatch(StructType schema, ColumnVector[] columns, int capacity) { this.schema = schema; -this.capacity = maxRows; -this.columns = new ColumnVector[schema.size()]; +this.columns = columns; +this.capacity = capacity; --- End diff -- I found some places referring to `ColumnarBatch.capacity()`, so I'd rather be a little conservative about doing that for now.
[GitHub] spark pull request #18955: [SPARK-21743][SQL] top-most limit should not caus...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18955
[GitHub] spark issue #18967: [SQL] [MINOR] [TEST] Set spark.unsafe.exceptionOnMemoryL...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18967 +1 for updating pom and SparkBuild.scala.
[GitHub] spark issue #18955: [SPARK-21743][SQL] top-most limit should not cause memor...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18955 Thanks! Merging to master.
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80769/testReport)** for PR 18968 at commit [`f5d8ebb`](https://github.com/apache/spark/commit/f5d8ebb20ef73c115118b062c57f0f1372f672a2).
[GitHub] spark issue #18936: [SPARK-21688][ML][MLLIB] make native BLAS the first choi...
Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/18936 Thanks, Sean and Nick. To @srowen: I think the difference is the finding from our previous investigation that the thread setting in native BLAS impacts the overall performance of a method/algorithm. To @MLnick: Agreed. We know this PR demands a certain amount of benchmark work; since the changes are at a low level of the stack, they will impact several methods, and there are also other native BLAS implementations besides MKL. So we took SVM as an example to show what we might gain from it. Also, given that mllib is only in maintenance mode, please let us know if such a change is not worth the work required. Thanks!
[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18887 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80757/ Test PASSed.
[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18887 Merged build finished. Test PASSed.
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18956 The PR that fixes the issue in `PullupCorrelatedPredicates` is submitted at #18968.
[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18887 **[Test build #80757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80757/testReport)** for PR 18887 at commit [`dc642bd`](https://github.com/apache/spark/commit/dc642bd70042da965387916656747ae78acdc192). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80768/testReport)** for PR 18930 at commit [`5191ed4`](https://github.com/apache/spark/commit/5191ed48a57017b3eeb3336e7ffa4a823dca5c28).
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80767/testReport)** for PR 18968 at commit [`4a47393`](https://github.com/apache/spark/commit/4a47393e4605790c4cdf1a33639cd6595cd35ba8).
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18930 retest this please.
[GitHub] spark issue #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarchy to ma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18958 **[Test build #80766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80766/testReport)** for PR 18958 at commit [`b6ab633`](https://github.com/apache/spark/commit/b6ab63359e00d7fe0175204f191ff1baa10b789f).
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80758/ Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Merged build finished. Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80758/testReport)** for PR 18930 at commit [`5191ed4`](https://github.com/apache/spark/commit/5191ed48a57017b3eeb3336e7ffa4a823dca5c28). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133625043

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -68,4 +68,25 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
       sql("DROP TABLE IF EXISTS createAndInsertTest")
     }
   }
+
+  test("SPARK-21739: Cast expression should initialize timezoneId " +
+    "when it is called statically to convert something into TimestampType") {
+    // create table for test
+    sql("CREATE TABLE table_with_timestamp_partition(value int) PARTITIONED by (ts timestamp)")
+    sql("INSERT OVERWRITE TABLE table_with_timestamp_partition " +
+      "partition (ts = '2010-01-01 00:00:00.000') VALUES (1)")
+    sql("INSERT OVERWRITE TABLE table_with_timestamp_partition " +
+      "partition (ts = '2010-01-02 00:00:00.000') VALUES (2)")
+
+    // test for Cast expression in TableReader
+    checkAnswer(sql("select value from table_with_timestamp_partition"),
+      Seq(Row(1), Row(2)))
+
+    // test for Cast expression in HiveTableScanExec
+    checkAnswer(sql("select value from table_with_timestamp_partition " +
+      "where ts = '2010-01-02 00:00:00.000'"), Row(2))
+
+    sql("DROP TABLE IF EXISTS table_with_timestamp_partition")
--- End diff --

Use `withTable`. You can check how we do it in the other test cases.
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133625007

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -68,4 +68,25 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
       sql("DROP TABLE IF EXISTS createAndInsertTest")
     }
   }
+
+  test("SPARK-21739: Cast expression should initialize timezoneId " +
--- End diff --

This test can pass without the change in `TableReader.scala`. We need another test case.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80759/ Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #80759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80759/testReport)** for PR 15435 at commit [`a041ea2`](https://github.com/apache/spark/commit/a041ea22d403b0befb6cade619ebfa5251658aba).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18960 Merged build finished. Test PASSed.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18960 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80760/ Test PASSed.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18960 **[Test build #80760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80760/testReport)** for PR 18960 at commit [`a264e3a`](https://github.com/apache/spark/commit/a264e3aa166d2e83832a82489669893f41ff9749).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133622586

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
@@ -89,14 +91,23 @@ class VectorizedHashMapGenerator(
       |$generatedAggBufferSchema
       |
       |  public $generatedClassName() {
-      |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
-      |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
-      |    // TODO: Possibly generate this projection in HashAggregate directly
-      |    aggregateBufferBatch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(
-      |      aggregateBufferSchema, org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
-      |    for (int i = 0 ; i < aggregateBufferBatch.numCols(); i++) {
-      |      aggregateBufferBatch.setColumn(i, batch.column(i+${groupingKeys.length}));
+      |    batchVectors = new org.apache.spark.sql.execution.vectorized
+      |      .OnHeapColumnVector[schema.fields().length];
+      |    for (int i = 0; i < schema.fields().length; i++) {
+      |      batchVectors[i] = new org.apache.spark.sql.execution.vectorized.OnHeapColumnVector(
+      |        capacity, schema.fields()[i].dataType());
+      |    }
+      |    batch = new org.apache.spark.sql.execution.vectorized.ColumnarBatch(
+      |      schema, batchVectors, capacity);
+      |
+      |    bufferVectors = new org.apache.spark.sql.execution.vectorized
+      |      .OnHeapColumnVector[aggregateBufferSchema.fields().length];
+      |    for (int i = 0; i < aggregateBufferSchema.fields().length; i++) {
+      |      bufferVectors[i] = batchVectors[i + ${groupingKeys.length}];
       |    }
+      |    // TODO: Possibly generate this projection in HashAggregate directly
--- End diff --

I'm sorry, but I'm not sure because this is from the original code.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133622350

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
@@ -89,14 +91,23 @@ class VectorizedHashMapGenerator(
       |$generatedAggBufferSchema
       |
       |  public $generatedClassName() {
-      |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
-      |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
-      |    // TODO: Possibly generate this projection in HashAggregate directly
-      |    aggregateBufferBatch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(
-      |      aggregateBufferSchema, org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
-      |    for (int i = 0 ; i < aggregateBufferBatch.numCols(); i++) {
-      |      aggregateBufferBatch.setColumn(i, batch.column(i+${groupingKeys.length}));
+      |    batchVectors = new org.apache.spark.sql.execution.vectorized
--- End diff --

Sure, I'll try it.
[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18929 thanks @srowen @jerryshao @vanzin
[GitHub] spark pull request #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUT...
Github user heary-cao closed the pull request at: https://github.com/apache/spark/pull/18929
[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80762/ Test PASSed.
[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17373 **[Test build #80762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80762/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17373 Merged build finished. Test PASSed.
[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80765/testReport)** for PR 18968 at commit [`4604a08`](https://github.com/apache/spark/commit/4604a08e390019f7c3952774dd1b9086be9f2680).
[GitHub] spark pull request #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates sho...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18968

[SPARK-21759][SQL] PullupCorrelatedPredicates should not produce unresolved plans

## What changes were proposed in this pull request?

With the check for structural integrity proposed in SPARK-21726, I found that the optimization rule `PullupCorrelatedPredicates` can produce unresolved plans. For a correlated IN query like:

    Project [a#0]
    +- Filter a#0 IN (list#4 [b#1])
       :  +- Project [c#2]
       :     +- Filter (outer(b#1) < d#3)
       :        +- LocalRelation <empty>, [c#2, d#3]
       +- LocalRelation <empty>, [a#0, b#1]

`PullupCorrelatedPredicates` produces a query plan like:

    'Project [a#0]
    +- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
       :  +- Project [c#2, d#3]
       :     +- LocalRelation <empty>, [c#2, d#3]
       +- LocalRelation <empty>, [a#0, b#1]

Because the correlated predicate involves another attribute `d#3` in the subquery, it has been pulled out and added to the `Project` on top of the subquery. When `list` in `In` contains just one `ListQuery`, `In.checkInputDataTypes` checks whether the number of `value` expressions matches the output size of the subquery. In the example above, there is only one `value` expression while the subquery output has two attributes `c#2, d#3`, so it fails the check and `In.resolved` returns `false`. We should not let `PullupCorrelatedPredicates` produce unresolved plans that fail the structural integrity check.

## How was this patch tested?

Added a test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21759

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18968.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18968

commit 4604a08e390019f7c3952774dd1b9086be9f2680
Author: Liang-Chi Hsieh
Date: 2017-08-17T04:16:39Z

    PullupCorrelatedPredicates should not produce unresolved plans.
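The arity check the PR description refers to can be modeled with a minimal Python sketch. This is an illustration of the rule being described, not Spark's actual API; the function name and list-of-attribute-names representation are invented for the example:

```python
# Hypothetical model of the resolution check described above: an IN-subquery
# expression resolves only when the number of value expressions equals the
# number of attributes the subquery outputs (In.checkInputDataTypes).

def in_subquery_resolved(num_values: int, subquery_output: list) -> bool:
    return num_values == len(subquery_output)

# Before PullupCorrelatedPredicates: a#0 IN (subquery outputting [c#2])
print(in_subquery_resolved(1, ["c#2"]))         # True: plan stays resolved

# After the rule pulls (b#1 < d#3) up, the subquery projects [c#2, d#3],
# but there is still a single value expression a#0 -> unresolved plan.
print(in_subquery_resolved(1, ["c#2", "d#3"]))  # False
```

This is why the extra attribute added to the subquery's `Project` trips `In.resolved` even though the rewrite is semantically intentional.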
[GitHub] spark issue #18892: [SPARK-21520][SQL]Improvement a special case for non-det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18892 **[Test build #80764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80764/testReport)** for PR 18892 at commit [`72e0252`](https://github.com/apache/spark/commit/72e0252bb2d3a9c7d43ed8756d8d7ea34fb80ca5).
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133618935

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
@@ -40,8 +39,43 @@
   private long lengthData;
   private long offsetData;

-  protected OffHeapColumnVector(int capacity, DataType type) {
-    super(capacity, type, MemoryMode.OFF_HEAP);
+  public OffHeapColumnVector(int capacity, DataType type) {
+    super(capacity, type);
+
+    if (type instanceof ArrayType || type instanceof BinaryType || type instanceof StringType
--- End diff --

Sure, I'll try it.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133618947

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
@@ -491,6 +525,22 @@ public void loadBytes(ColumnVector.Array array) {
     array.byteArrayOffset = 0;
   }

+  /**
+   * Reserve a integer column for ids of dictionary.
+   */
+  @Override
+  public OffHeapColumnVector reserveDictionaryIds(int capacity) {
--- End diff --

Sure, I'll try it.
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133618679

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -227,7 +228,8 @@ class HadoopTableReader(
     def fillPartitionKeys(rawPartValues: Array[String], row: InternalRow): Unit = {
       partitionKeyAttrs.foreach { case (attr, ordinal) =>
         val partOrdinal = partitionKeys.indexOf(attr)
-        row(ordinal) = Cast(Literal(rawPartValues(partOrdinal)), attr.dataType).eval(null)
+        row(ordinal) = Cast(Literal(rawPartValues(partOrdinal)), attr.dataType,
+          Option(SQLConf.get.sessionLocalTimeZone)).eval(null)
--- End diff --

Do you mean a test case for `HadoopTableReader`? That's a little confusing.
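For background on why the diff threads a time zone into the `Cast`: the same partition-key timestamp string denotes different instants under different session time zones, so a statically evaluated cast to `TimestampType` must be told which zone to use. A small Python illustration of that ambiguity (not Spark code; requires Python 3.9+ for `zoneinfo`):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# The partition value arrives as a plain string, like rawPartValues above.
s = "2010-01-02 00:00:00"
naive = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

# Interpreting the same wall-clock string in two different session time
# zones yields two different epoch instants, off by the full UTC offset.
utc_epoch = naive.replace(tzinfo=ZoneInfo("UTC")).timestamp()
tokyo_epoch = naive.replace(tzinfo=ZoneInfo("Asia/Tokyo")).timestamp()
assert utc_epoch - tokyo_epoch == 9 * 3600  # Asia/Tokyo is UTC+9, no DST
```

Without an explicit `timeZoneId`, the `Cast` would fall back to whatever default happens to be in scope, which is the bug SPARK-21739 fixes.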
[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Hi, @cloud-fan. As you advised, I will replace the old ORC in the current namespace and try to move it to `sql/core` later. Although we cannot switch between the old and new ORC, we can bring back the old ORC from the code if needed. Thanks.
[GitHub] spark issue #18955: [SPARK-21743][SQL] top-most limit should not cause memor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18955 Merged build finished. Test PASSed.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133617360

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
@@ -307,64 +293,73 @@ public void update(int ordinal, Object value) {

     @Override
     public void setNullAt(int ordinal) {
+      assert (columns[ordinal] instanceof MutableColumnVector);
       assert (!columns[ordinal].isConstant);
-      columns[ordinal].putNull(rowId);
+      ((MutableColumnVector) columns[ordinal]).putNull(rowId);
     }

     @Override
     public void setBoolean(int ordinal, boolean value) {
+      assert (columns[ordinal] instanceof MutableColumnVector);
       assert (!columns[ordinal].isConstant);
-      columns[ordinal].putNotNull(rowId);
-      columns[ordinal].putBoolean(rowId, value);
+      ((MutableColumnVector) columns[ordinal]).putNotNull(rowId);
--- End diff --

Sure, I'll add a private getter and update these.
[GitHub] spark issue #18955: [SPARK-21743][SQL] top-most limit should not cause memor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80756/ Test PASSed.
[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r133617361

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnVector.java ---
@@ -0,0 +1,599 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.vectorized;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import org.apache.spark.sql.internal.SQLConf;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * This class adds write APIs to ColumnVector.
+ * It supports all the types and contains put APIs as well as their batched versions.
+ * The batched versions are preferable whenever possible.
+ *
+ * Capacity: The data stored is dense but the arrays are not fixed capacity. It is the
+ * responsibility of the caller to call reserve() to ensure there is enough room before adding
+ * elements. This means that the put() APIs do not check as in common cases (i.e. flat schemas),
+ * the lengths are known up front.
+ *
+ * A ColumnVector should be considered immutable once originally created. In other words, it is not
+ * valid to call put APIs after reads until reset() is called.
+ */
+public abstract class MutableColumnVector extends ColumnVector {
+
+  /**
+   * Resets this column for writing. The currently stored values are no longer accessible.
+   */
+  @Override
+  public void reset() {
+    if (isConstant) return;
+
+    if (childColumns != null) {
+      for (ColumnVector c: childColumns) {
+        c.reset();
+      }
+    }
+    numNulls = 0;
+    elementsAppended = 0;
+    if (anyNullsSet) {
+      putNotNulls(0, capacity);
+      anyNullsSet = false;
+    }
+  }
+
+  public void reserve(int requiredCapacity) {
+    if (requiredCapacity > capacity) {
+      int newCapacity = (int) Math.min(MAX_CAPACITY, requiredCapacity * 2L);
+      if (requiredCapacity <= newCapacity) {
+        try {
+          reserveInternal(newCapacity);
+        } catch (OutOfMemoryError outOfMemoryError) {
+          throwUnsupportedException(requiredCapacity, outOfMemoryError);
+        }
+      } else {
+        throwUnsupportedException(requiredCapacity, null);
+      }
+    }
+  }
+
+  private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
+    String message = "Cannot reserve additional contiguous bytes in the vectorized reader " +
+        "(requested = " + requiredCapacity + " bytes). As a workaround, you can disable the " +
+        "vectorized reader by setting " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() +
+        " to false.";
+
+    if (cause != null) {
+      throw new RuntimeException(message, cause);
--- End diff --

Thanks. I'll update it.
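The `reserve` method in the diff above grows capacity by doubling the requested size, clamped to a hard cap. A minimal Python sketch of that growth policy, separated from the Spark class for clarity (the cap constant is an assumption standing in for the vector's `MAX_CAPACITY`, not the actual value):

```python
MAX_CAPACITY = (1 << 31) - 8  # assumed cap, mirroring a near-Integer.MAX_VALUE limit

def next_capacity(capacity: int, required: int) -> int:
    """Model of the doubling-with-cap policy in MutableColumnVector.reserve."""
    if required <= capacity:
        return capacity              # enough room already, no reallocation
    new_capacity = min(MAX_CAPACITY, required * 2)
    if required <= new_capacity:
        return new_capacity          # grow to 2x the request, or to the cap
    raise RuntimeError(f"Cannot reserve {required} elements")

print(next_capacity(16, 10))    # 16: request fits, capacity unchanged
print(next_capacity(16, 100))   # 200: doubled request
```

Doubling keeps the amortized cost of repeated appends linear, while the cap turns an impossible reservation into the explicit error (with the vectorized-reader workaround hint) rather than an `OutOfMemoryError` deep inside allocation.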
[GitHub] spark issue #18955: [SPARK-21743][SQL] top-most limit should not cause memor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18955 **[Test build #80756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80756/testReport)** for PR 18955 at commit [`4462778`](https://github.com/apache/spark/commit/44627788b9af15e84ec951543a56c7c9970ef247).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18959 Jenkins, retest this please.
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18959 **[Test build #80763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80763/testReport)** for PR 18959 at commit [`b33fde8`](https://github.com/apache/spark/commit/b33fde86ecd6a0be5f4a55c408ab10e0ac44101a).
[GitHub] spark issue #18949: [SPARK-12961][CORE][FOLLOW-UP] Remove wrapper code for S...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18949 ping
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18959 Merged build finished. Test FAILed.
[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17373 **[Test build #80762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80762/testReport)** for PR 17373 at commit [`5369b08`](https://github.com/apache/spark/commit/5369b088e7fcb0fa35b0e4c840772cf60515c882).
[GitHub] spark issue #18959: [SPARK-18394][SQL] Make an AttributeSet.toSeq output ord...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80761/ Test FAILed.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18960 **[Test build #80760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80760/testReport)** for PR 18960 at commit [`a264e3a`](https://github.com/apache/spark/commit/a264e3aa166d2e83832a82489669893f41ff9749).
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133614757
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ---
@@ -104,7 +105,7 @@ case class HiveTableScanExec(
     hadoopConf)

   private def castFromString(value: String, dataType: DataType) = {
-    Cast(Literal(value), dataType).eval(null)
+    Cast(Literal(value), dataType, Option(SQLConf.get.sessionLocalTimeZone)).eval(null)
--- End diff --
Here, we can obtain SQLConf directly with `sparkSession.sessionState.conf`
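For context on why the diff threads a session-local timezone into `Cast`: parsing the same timestamp string under different zones yields different instants, so relying on the JVM default timezone makes the cast result depend on where the driver runs. A plain-Java illustration of that sensitivity — hypothetical helper, not Spark code:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimeZoneCast {
    // Interpret a wall-clock timestamp string in an explicit zone.
    // The same string maps to a different epoch instant per zone, which is
    // why a string-to-timestamp cast needs the session timezone passed in
    // rather than silently using the JVM default.
    public static long toEpochSecond(String ts, String zone) {
        return LocalDateTime.parse(ts).atZone(ZoneId.of(zone)).toEpochSecond();
    }
}
```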
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #80759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80759/testReport)** for PR 15435 at commit [`a041ea2`](https://github.com/apache/spark/commit/a041ea22d403b0befb6cade619ebfa5251658aba).
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user jmchung commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r133614456
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql ---
@@ -20,3 +20,9 @@ select from_json('{"a":1}', 'a InvalidType');
 select from_json('{"a":1}', 'a INT', named_struct('mode', 'PERMISSIVE'));
 select from_json('{"a":1}', 'a INT', map('mode', 1));
 select from_json();
+-- json_tuple
+describe function json_tuple;
+describe function extended json_tuple;
+select json_tuple('{"a" : 1, "b" : 2}', cast(NULL AS STRING), 'b', cast(NULL AS STRING), 'a')
+create temporary view jsonTable(jsonField, a, b) as select * from values '{"a": 1, "b": 2}', 'a', 'b';
+SELECT json_tuple(jsonField, b, cast(NULL AS STRING), 'a') FROM jsonTable
--- End diff --
@gatorsmile @viirya Thank you for taking the time to review the code. The SQL statements are now consistent in style, and the golden file for `json-functions.sql` has also been committed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80758/testReport)** for PR 18930 at commit [`5191ed4`](https://github.com/apache/spark/commit/5191ed48a57017b3eeb3336e7ffa4a823dca5c28).
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133612759
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ---
@@ -104,7 +105,7 @@ case class HiveTableScanExec(
     hadoopConf)

   private def castFromString(value: String, dataType: DataType) = {
-    Cast(Literal(value), dataType).eval(null)
+    Cast(Literal(value), dataType, Option(SQLConf.get.sessionLocalTimeZone)).eval(null)
--- End diff --
BTW, is it elegant to initialize a `CastSupport` (`DataSourceAnalysis` rule or `DataSourceStrategy`) here, in which we still need to pass `SQLConf`?
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18902 I tested on dataframes containing `null`: both `avg` and `stat.approxQuantile` ignore `null`. If a column contains only `null`, they return `null` and `Array.empty[Double]`, respectively. Agreed that we should add more tests for this dependency.
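The behavior described — aggregates skipping `null` values, and an all-`null` column yielding no result — can be illustrated outside Spark with plain Java (hypothetical helper, not Spark's actual `avg`):

```java
import java.util.List;
import java.util.OptionalDouble;

public class NullSkippingMean {
    // Mean over a column, ignoring nulls. An all-null column produces an
    // empty OptionalDouble, mirroring how the comment says avg returns null
    // and approxQuantile returns an empty array in that case.
    public static OptionalDouble mean(List<Double> column) {
        return column.stream()
                .filter(v -> v != null)
                .mapToDouble(Double::doubleValue)
                .average();
    }
}
```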
[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18887 **[Test build #80757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80757/testReport)** for PR 18887 at commit [`dc642bd`](https://github.com/apache/spark/commit/dc642bd70042da965387916656747ae78acdc192).
[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18887 retest this please
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r133610560
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql ---
@@ -20,3 +20,9 @@ select from_json('{"a":1}', 'a InvalidType');
 select from_json('{"a":1}', 'a INT', named_struct('mode', 'PERMISSIVE'));
 select from_json('{"a":1}', 'a INT', map('mode', 1));
 select from_json();
+-- json_tuple
+describe function json_tuple;
+describe function extended json_tuple;
+select json_tuple('{"a" : 1, "b" : 2}', cast(NULL AS STRING), 'b', cast(NULL AS STRING), 'a')
+create temporary view jsonTable(jsonField, a, b) as select * from values '{"a": 1, "b": 2}', 'a', 'b';
+SELECT json_tuple(jsonField, b, cast(NULL AS STRING), 'a') FROM jsonTable
--- End diff --
Remember to drop the created view: `DROP VIEW IF EXISTS jsonTable;`
[GitHub] spark pull request #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading r...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18962#discussion_r133610467
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -330,18 +330,45 @@ object SparkSubmit extends CommandLineUtils {
     args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull

     // In client mode, download remote files.
+    var localPrimaryResource: String = null
+    var localJars: String = null
+    var localPyFiles: String = null
     if (deployMode == CLIENT) {
-      args.primaryResource = Option(args.primaryResource).map {
+      localPrimaryResource = Option(args.primaryResource).map {
         downloadFile(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
-      args.jars = Option(args.jars).map {
+      localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
-      args.pyFiles = Option(args.pyFiles).map {
+      localPyFiles = Option(args.pyFiles).map {
         downloadFileList(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
     }

+    if (clusterManager == YARN) {
+      def isNoneFsFileExist(paths: String): Boolean = {
+        Option(paths).exists { p =>
+          p.split(",").map(_.trim).filter(_.nonEmpty).exists { path =>
+            val url = Utils.resolveURI(path)
+            url.getScheme match {
+              case "http" | "https" | "ftp" => true
+              case _ => false
+            }
+          }
+        }
+      }
+
+      // Spark on YARN doesn't support upload remote resources from http, https or ftp server
+      // directly to distributed cache, so print a warning and exit the process.
+      if (isNoneFsFileExist(args.jars) ||
--- End diff --
That kinda looks like a bug. Spark shouldn't be trying to upload files that the distributed cache can handle itself; not sure if there's a programmatic way of figuring out the list of schemes that it supports, though. At worst, Spark shouldn't do anything for those URLs; executors should be able to download directly from http / https if needed.
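The `isNoneFsFileExist` helper in the diff classifies comma-separated resource paths by URI scheme to flag the ones the YARN distributed cache cannot fetch directly (http, https, ftp). A simplified standalone sketch of that predicate — hypothetical class name; the real Scala also resolves bare paths via `Utils.resolveURI` before inspecting the scheme:

```java
import java.net.URI;
import java.util.Arrays;

public class ResourceSchemes {
    // True if any comma-separated path uses a scheme that would have to be
    // downloaded first rather than handed to the YARN distributed cache.
    public static boolean hasNonFsPath(String paths) {
        if (paths == null) return false;
        return Arrays.stream(paths.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .anyMatch(p -> {
                    String scheme = URI.create(p).getScheme();
                    return "http".equals(scheme)
                        || "https".equals(scheme)
                        || "ftp".equals(scheme);
                });
    }
}
```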
[GitHub] spark pull request #18948: Add the validation of spark.cores.max under Strea...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/18948#discussion_r133610439
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -144,6 +144,13 @@ class StreamingContext private[streaming] (
     }
   }

+  if (sc.conf.contains("spark.cores.max")) {
+    val totalCores = sc.conf.getInt("spark.cores.max", 1)
--- End diff --
@jiangxb1987 "spark.cores.max" is a per-application configuration that limits the number of cores this application can request; it is not a per-executor limitation.
> The config spark.cores.max is used to limit the max number of cores that a single executor can require
So if we have 2 receivers in one streaming application, the minimum number should still be > 2; checking against "1" here is not sufficient. And since the receiver count is only known at run time, checking the configuration will not work as expected.
[GitHub] spark issue #18967: [SQL] [MINOR] [TEST] Set spark.unsafe.exceptionOnMemoryL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80753/ Test PASSed.
[GitHub] spark issue #18967: [SQL] [MINOR] [TEST] Set spark.unsafe.exceptionOnMemoryL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18967 Merged build finished. Test PASSed.
[GitHub] spark issue #18967: [SQL] [MINOR] [TEST] Set spark.unsafe.exceptionOnMemoryL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18967 **[Test build #80753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80753/testReport)** for PR 18967 at commit [`c6cca96`](https://github.com/apache/spark/commit/c6cca96eb2c05fa69b0725cae5002323a5d12589).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18734#discussion_r133609212
--- Diff: python/pyspark/cloudpickle.py ---
@@ -241,11 +338,32 @@ def save_function(self, obj, name=None):
         if getattr(themodule, name, None) is obj:
             return self.save_global(obj, name)

+        # a builtin_function_or_method which comes in as an attribute of some
+        # object (e.g., object.__new__, itertools.chain.from_iterable) will end
+        # up with modname "__main__" and so end up here. But these functions
+        # have no __code__ attribute in CPython, so the handling for
+        # user-defined functions below will fail.
+        # So we pickle them here using save_reduce; have to do it differently
+        # for different python versions.
+        if not hasattr(obj, '__code__'):
+            if PY3:
+                if sys.version_info < (3, 4):
+                    raise pickle.PicklingError("Can't pickle %r" % obj)
+                else:
+                    rv = obj.__reduce_ex__(self.proto)
+            else:
+                if hasattr(obj, '__self__'):
+                    rv = (getattr, (obj.__self__, name))
+                else:
+                    raise pickle.PicklingError("Can't pickle %r" % obj)
+            return Pickler.save_reduce(self, obj=obj, *rv)
+
         # if func is lambda, def'ed at prompt, is in main, or is nested, then
         # we'll pickle the actual function object rather than simply saving a
         # reference (as is done in default pickler), via save_function_tuple.
-        if islambda(obj) or obj.__code__.co_filename == '' or themodule is None:
-            #print("save global", islambda(obj), obj.__code__.co_filename, modname, themodule)
+        if (islambda(obj)
--- End diff --
Just as a side note, it looks like this PR includes https://github.com/cloudpipe/cloudpickle/pull/51 too (SPARK-21753).
[GitHub] spark pull request #18962: [SPARK-21714][CORE][YARN] Avoiding re-uploading r...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/18962#discussion_r133609050
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -330,18 +330,45 @@ object SparkSubmit extends CommandLineUtils {
     args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull

     // In client mode, download remote files.
+    var localPrimaryResource: String = null
+    var localJars: String = null
+    var localPyFiles: String = null
     if (deployMode == CLIENT) {
-      args.primaryResource = Option(args.primaryResource).map {
+      localPrimaryResource = Option(args.primaryResource).map {
         downloadFile(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
-      args.jars = Option(args.jars).map {
+      localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
-      args.pyFiles = Option(args.pyFiles).map {
+      localPyFiles = Option(args.pyFiles).map {
         downloadFileList(_, targetDir, args.sparkProperties, hadoopConf)
       }.orNull
     }

+    if (clusterManager == YARN) {
+      def isNoneFsFileExist(paths: String): Boolean = {
+        Option(paths).exists { p =>
+          p.split(",").map(_.trim).filter(_.nonEmpty).exists { path =>
+            val url = Utils.resolveURI(path)
+            url.getScheme match {
+              case "http" | "https" | "ftp" => true
+              case _ => false
+            }
+          }
+        }
+      }
+
+      // Spark on YARN doesn't support upload remote resources from http, https or ftp server
+      // directly to distributed cache, so print a warning and exit the process.
+      if (isNoneFsFileExist(args.jars) ||
--- End diff --
The code [here](https://github.com/apache/spark/blob/b8ffb51055108fd606b86f034747006962cd2df3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L346) compares the two FSs and copies src to dest if the FS is different. AFAIK there's no http scheme in Hadoop, so `val srcFs = srcPath.getFileSystem(hadoopConf)` will probably throw an exception.
[GitHub] spark issue #18955: [SPARK-21743][SQL] top-most limit should not cause memor...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18955 LGTM
[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18935 Thanks @zsxwing. I was thinking of exposing the details of Netty's memory allocation for users to monitor and tune; users could filter out unrelated metrics. But maybe you're right that exposing each arena's detail is too verbose and too fine-grained — let me change the code.
[GitHub] spark pull request #18960: [SPARK-21739][SQL]Cast expression should initiali...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/18960#discussion_r133607798
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -227,7 +228,8 @@ class HadoopTableReader(
     def fillPartitionKeys(rawPartValues: Array[String], row: InternalRow): Unit = {
       partitionKeyAttrs.foreach { case (attr, ordinal) =>
         val partOrdinal = partitionKeys.indexOf(attr)
-        row(ordinal) = Cast(Literal(rawPartValues(partOrdinal)), attr.dataType).eval(null)
+        row(ordinal) = Cast(Literal(rawPartValues(partOrdinal)), attr.dataType,
+          Option(SQLConf.get.sessionLocalTimeZone)).eval(null)
--- End diff --
OK, I will work on it
[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18964 The change looks OK to me. Did you hit an issue in which you had to change the buffer size on the client side?
[GitHub] spark issue #18965: [SPARK-21749][DOC] Add comments for MessageEncoder to ex...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18965 We usually don't have PRs that only add comments to explain something, so I'm neutral on this change.
[GitHub] spark issue #18957: [SPARK-21744][CORE] Add retry logic for new broadcast in...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18957 If we implemented the retry logic in `DiskBlockManager`, every high-level caller of `DiskBlockManager` would be affected, so I only implemented it in `BroadcastManager`.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18960 Merged build finished. Test PASSed.
[GitHub] spark issue #18957: [SPARK-21744][CORE] Add retry logic for new broadcast in...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18957 @jiangxb1987 From the unit test we can see that if "spark.local.dir" contains both a good disk and a bad disk, retrying the given number of times can skip the bad disk, so the driver will not exit with an exception. And this change has no side effect in the normal scenario.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18960 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80754/ Test PASSed.
[GitHub] spark issue #18960: [SPARK-21739][SQL]Cast expression should initialize time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18960 **[Test build #80754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80754/testReport)** for PR 18960 at commit [`492b756`](https://github.com/apache/spark/commit/492b756fde5008854d1351ed423c3897c683c662).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18855
[GitHub] spark pull request #18963: [SPARK-18464][SQL][backport] support old table wh...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/18963
[GitHub] spark issue #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes fails fo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18855 thanks, merging to master!