[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649991405 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124525/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649991400
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649991400 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649912241 **[Test build #124525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)** for PR 28898 at commit [`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649990865 **[Test build #124525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)** for PR 28898 at commit [`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow
HyukjinKwon commented on a change in pull request #28928: URL: https://github.com/apache/spark/pull/28928#discussion_r445977049

```diff
## File path: python/pyspark/sql/pandas/conversion.py
@@ -413,7 +413,7 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
         # Slice the DataFrame to be batched
         step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round int up
-        pdf_slices = (pdf[start:start + step] for start in xrange(0, len(pdf), step))
+        pdf_slices = (pdf.iloc[start:start + step] for start in xrange(0, len(pdf), step))
```

Review comment: As far as I can tell, yes.
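The `# round int up` comment in the diff above refers to a common integer-only ceiling-division trick; a minimal standalone illustration (hypothetical helper name, not from the PR):

```python
# Ceiling division using only floor division:
# -(-n // d) == ceil(n / d) for positive d, because floor(-x) == -ceil(x).
def ceil_div(n: int, d: int) -> int:
    return -(-n // d)

print(ceil_div(10, 3))  # 4
print(ceil_div(9, 3))   # 3
```

This avoids importing `math` and avoids float rounding issues for large integers, which is presumably why the Spark code uses it to compute the slice step.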
[GitHub] [spark] gatorsmile commented on a change in pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow
gatorsmile commented on a change in pull request #28928: URL: https://github.com/apache/spark/pull/28928#discussion_r445976618

```diff
## File path: python/pyspark/sql/pandas/conversion.py
@@ -413,7 +413,7 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
         # Slice the DataFrame to be batched
         step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round int up
-        pdf_slices = (pdf[start:start + step] for start in xrange(0, len(pdf), step))
+        pdf_slices = (pdf.iloc[start:start + step] for start in xrange(0, len(pdf), step))
```

Review comment: Thank you for fixing this!

> While standard Python / Numpy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code, we recommend the optimized pandas data access methods, .at, .iat, .loc and .iloc.

Is it the only place?
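The distinction the pandas documentation quote draws matters most when a DataFrame's integer index is not the default `0..n-1`: plain `pdf[start:stop]` can then be interpreted as a label slice, while `.iloc` is always positional. A small sketch with hypothetical data (not from the PR):

```python
import pandas as pd

# Hypothetical DataFrame whose integer index is not the default RangeIndex,
# e.g. after filtering or reordering rows.
pdf = pd.DataFrame({"x": list("abcdef")}, index=[5, 4, 3, 2, 1, 0])

# .iloc slices strictly by position, which is what batching code wants:
step = 2
slices = [pdf.iloc[s:s + step] for s in range(0, len(pdf), step)]
print([len(s) for s in slices])   # [2, 2, 2]
print(slices[0]["x"].tolist())    # ['a', 'b']
```

With the default index the two spellings happen to agree, which is why the original code worked in the common case.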
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649976751 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124527/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649976746 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649976746
[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649936846 **[Test build #124527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)** for PR 28898 at commit [`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649976637 **[Test build #124527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)** for PR 28898 at commit [`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649968228 In the latest update, there are three changes:
1. `failedEpoch` and `fileLostEpoch` are renamed and the comments explaining what they are are expanded, largely based on suggestions from @squito.
2. A call to `clearCacheLocs` is moved into the correct if block in `removeExecutorAndUnregisterOutputs`.
3. In `DAGSchedulerSuite`, `mapOutputTracker` and `blockManagerMaster` are wrapped by `Mockito.spy` and the spies are used to verify how many times each is called. This verification is added to some existing tests, which pass without my change to `DAGScheduler`. The verification is also added to the new test case for this bug. Thanks to @attilapiros for his illustrative example using `Mockito.spy`.
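The spy-and-verify pattern described in point 3 uses Mockito in the Scala test suite; the same idea can be sketched in Python with `unittest.mock.Mock(wraps=...)` (the class and method names below are hypothetical stand-ins, not Spark's actual API):

```python
from unittest.mock import Mock

class FakeMapOutputTracker:
    """Hypothetical stand-in for the object being spied on."""
    def unregister_all_map_output(self, shuffle_id):
        return f"unregistered shuffle {shuffle_id}"

tracker = FakeMapOutputTracker()
# wraps= delegates every call to the real object while recording it,
# comparable to Mockito.spy: real behavior plus call verification.
spy = Mock(wraps=tracker)

assert spy.unregister_all_map_output(0) == "unregistered shuffle 0"
spy.unregister_all_map_output(1)
# Verify how many times the method was invoked, as the tests described above do.
assert spy.unregister_all_map_output.call_count == 2
```

The appeal of a spy over a plain mock is that the test exercises the real code path and only adds observation on top.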
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649963283
[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649963283
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF
AmplabJenkins removed a comment on pull request #28805: URL: https://github.com/apache/spark/pull/28805#issuecomment-649963207
[GitHub] [spark] AmplabJenkins commented on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF
AmplabJenkins commented on pull request #28805: URL: https://github.com/apache/spark/pull/28805#issuecomment-649963207
[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on a change in pull request #28848: URL: https://github.com/apache/spark/pull/28848#discussion_r445965785

```diff
## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -177,6 +177,8 @@ private[spark] class DAGScheduler(
   // TODO: Garbage collect information about failure epochs when we know there are no more
   // stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]
+  // In addition, track epoch for failed executors that result in lost file output
```

Review comment: I changed `fileLostEpoch` to `shuffleFileLostEpoch` and more or less adopted your suggestion for the comment explaining it.
[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649962628 **[Test build #124530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124530/testReport)** for PR 28848 at commit [`d09ef93`](https://github.com/apache/spark/commit/d09ef9335e5d3657b830497155abb7a0c2bb0cde).
[GitHub] [spark] SparkQA commented on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF
SparkQA commented on pull request #28805: URL: https://github.com/apache/spark/pull/28805#issuecomment-649962618 **[Test build #124531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124531/testReport)** for PR 28805 at commit [`270324e`](https://github.com/apache/spark/commit/270324ee306f035352b58e77718d73810f1ffa1f).
[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on a change in pull request #28848: URL: https://github.com/apache/spark/pull/28848#discussion_r445965319

```diff
## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -177,6 +177,8 @@ private[spark] class DAGScheduler(
   // TODO: Garbage collect information about failure epochs when we know there are no more
   // stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]
```

Review comment: I changed `failedEpoch` to `executorFailureEpoch` and more or less adopted your suggestion for the comment explaining it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649954884
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649954884
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649954378 **[Test build #124529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124529/testReport)** for PR 28898 at commit [`652c77f`](https://github.com/apache/spark/commit/652c77fdbbfa468271e783e1492f72f4785c9880).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
AmplabJenkins removed a comment on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-649944669
[GitHub] [spark] AmplabJenkins commented on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
AmplabJenkins commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-649944669
[GitHub] [spark] SparkQA commented on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
SparkQA commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-649944172 **[Test build #124528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124528/testReport)** for PR 28676 at commit [`488e051`](https://github.com/apache/spark/commit/488e051e1a7c21c57b646d9f68df8c48e4717126).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649942950 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124524/
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649942879 **[Test build #124524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)** for PR 28898 at commit [`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649942946 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649942946
[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649907459 **[Test build #124524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)** for PR 28898 at commit [`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649937189
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649937189
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649936846 **[Test build #124527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)** for PR 28898 at commit [`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).
[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
frankyin-factual commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445946556

```diff
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
     comparePlans(optimized3, expected3)
   }

+  test("Nested field pruning for window functions") {
+    val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, UnspecifiedFrame)
+    val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+    val query1 = contact.select($"name.first", winExpr.as('window))
+      .where($"window" === 1 && $"name.first" === "a")
+      .analyze
+    val optimized1 = Optimize.execute(query1)
+    val aliases1 = collectGeneratedAliases(optimized1)
+    val expected1 = contact
+      .select($"name.first", $"address", $"id", $"name.first".as(aliases1(1)))
+      .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+      .select($"first", $"${aliases1(1)}".as(aliases1(0)), $"window")
+      .where($"window" === 1 && $"${aliases1(0)}" === "a")
+      .select($"first", $"window")
+      .analyze
+    comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for orderBy") {
+    val query1 = contact.select($"name.first", $"name.last")
+      .orderBy($"name.first".asc, $"name.last".asc)
+      .analyze
+    val optimized1 = Optimize.execute(query1)
+    val aliases1 = collectGeneratedAliases(optimized1)
+    val expected1 = contact
+      .select($"name.first",
+        $"name.last",
+        $"name.first".as(aliases1(0)),
+        $"name.last".as(aliases1(1)))
+      .orderBy($"${aliases1(0)}".asc, $"${aliases1(1)}".asc)
+      .select($"first", $"last")
+      .analyze
+    comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for sirtBy") {
```

Review comment: Yeah
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
AmplabJenkins removed a comment on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649930413
[GitHub] [spark] SparkQA removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
SparkQA removed a comment on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649924942 **[Test build #124526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)** for PR 28927 at commit [`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).
[GitHub] [spark] AmplabJenkins commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
AmplabJenkins commented on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649930413
[GitHub] [spark] SparkQA commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
SparkQA commented on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649930269 **[Test build #124526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)** for PR 28927 at commit [`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
AmplabJenkins removed a comment on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649925471
[GitHub] [spark] AmplabJenkins commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
AmplabJenkins commented on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649925471
[GitHub] [spark] SparkQA commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
SparkQA commented on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649924942 **[Test build #124526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)** for PR 28927 at commit [`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).
[GitHub] [spark] HyukjinKwon commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
HyukjinKwon commented on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649923516 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation
AmplabJenkins removed a comment on pull request #28927: URL: https://github.com/apache/spark/pull/28927#issuecomment-649467834 Can one of the admins verify this patch?
[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
viirya commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445935640

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
## @@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
     comparePlans(optimized3, expected3)
   }

+  test("Nested field pruning for window functions") {
+    val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, UnspecifiedFrame)
+    val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+    val query1 = contact.select($"name.first", winExpr.as('window))
+      .where($"window" === 1 && $"name.first" === "a")
+      .analyze
+    val optimized1 = Optimize.execute(query1)
+    val aliases1 = collectGeneratedAliases(optimized1)
+    val expected1 = contact

Review comment: If there is only one query, we don't need to name it as `query1`, `optimized1`...
[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
viirya commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445935343

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
## @@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
     comparePlans(optimized3, expected3)
   }

+  test("Nested field pruning for window functions") {
+    val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, UnspecifiedFrame)
+    val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+    val query1 = contact.select($"name.first", winExpr.as('window))
+      .where($"window" === 1 && $"name.first" === "a")
+      .analyze
+    val optimized1 = Optimize.execute(query1)
+    val aliases1 = collectGeneratedAliases(optimized1)
+    val expected1 = contact
+      .select($"name.first", $"address", $"id", $"name.first".as(aliases1(1)))
+      .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+      .select($"first", $"${aliases1(1)}".as(aliases1(0)), $"window")
+      .where($"window" === 1 && $"${aliases1(0)}" === "a")
+      .select($"first", $"window")
+      .analyze
+    comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for orderBy") {
+    val query1 = contact.select($"name.first", $"name.last")
+      .orderBy($"name.first".asc, $"name.last".asc)
+      .analyze
+    val optimized1 = Optimize.execute(query1)
+    val aliases1 = collectGeneratedAliases(optimized1)
+    val expected1 = contact
+      .select($"name.first",
+        $"name.last",
+        $"name.first".as(aliases1(0)),
+        $"name.last".as(aliases1(1)))
+      .orderBy($"${aliases1(0)}".asc, $"${aliases1(1)}".asc)
+      .select($"first", $"last")
+      .analyze
+    comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for sirtBy") {

Review comment: Do you mean sortBy?
[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
viirya commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445935219

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
## @@ -39,6 +39,14 @@ object NestedColumnAliasing {
       NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, attrToAliases)
   }

+    case Project(projectList, Filter(condition, child))

Review comment: I think we had better leave a few comments explaining this case.
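The `Project(projectList, Filter(condition, child))` case under review collects the nested field accesses appearing in either the project list or the filter condition and replaces each with a generated alias, so the scan only has to produce the accessed leaf fields. A toy sketch of that bookkeeping in plain Python — the `_gen_alias_N` naming mirrors Catalyst's generated aliases, but every function and data shape here is hypothetical, not the actual rule:

```python
# Toy model (NOT the real Catalyst NestedColumnAliasing rule): expressions are
# tuples of dotted attribute paths, e.g. ("name.first", "id").

def collect_nested_accesses(exprs):
    """Return the distinct nested (dotted) paths used across all expressions."""
    seen = []
    for expr in exprs:
        for path in expr:
            # only struct-field accesses (containing a dot) are candidates
            if "." in path and path not in seen:
                seen.append(path)
    return seen

def alias_nested_accesses(project_list, filter_condition):
    """Map each nested access from both the project list and the filter
    condition to a generated alias, mimicking collectGeneratedAliases."""
    accesses = collect_nested_accesses(project_list + [filter_condition])
    return {path: f"_gen_alias_{i}" for i, path in enumerate(accesses)}

# SELECT name.first ... WHERE name.first = 'a' AND id = 1
aliases = alias_nested_accesses(
    project_list=[("name.first",)],
    filter_condition=("name.first", "id"),
)
# → {"name.first": "_gen_alias_0"}; "id" is a top-level column, so not aliased
```

The point of matching the `Filter` explicitly is that accesses in the condition must share aliases with the project list, otherwise the same struct would still be read twice.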
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649912241 **[Test build #124525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)** for PR 28898 at commit [`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649909806
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649909806
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649907836
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649907836
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649907459 **[Test build #124524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)** for PR 28898 at commit [`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
AmplabJenkins removed a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877
[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
AmplabJenkins commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877
[GitHub] [spark] dilipbiswal commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-649906157 @maropu Resolved the conflicts. Thank you.
[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
SparkQA commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649906173 **[Test build #124523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
SparkQA removed a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649865700 **[Test build #124523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).
[GitHub] [spark] HyukjinKwon closed pull request #28896: [SPARK-32025][SQL] Csv schema inference problems with different types in the same column
HyukjinKwon closed pull request #28896: URL: https://github.com/apache/spark/pull/28896
[GitHub] [spark] HyukjinKwon commented on pull request #28896: [SPARK-32025][SQL] Csv schema inference problems with different types in the same column
HyukjinKwon commented on pull request #28896: URL: https://github.com/apache/spark/pull/28896#issuecomment-649901085 Merged to master.
[GitHub] [spark] AmplabJenkins commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
AmplabJenkins commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-649898619
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
AmplabJenkins removed a comment on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-649898619
[GitHub] [spark] SparkQA removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
SparkQA removed a comment on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-649795495 **[Test build #124520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124520/testReport)** for PR 28425 at commit [`7ce28a2`](https://github.com/apache/spark/commit/7ce28a2cd4345f7911d0ef4f681aa8421af22547).
[GitHub] [spark] SparkQA commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
SparkQA commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-649898054 **[Test build #124520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124520/testReport)** for PR 28425 at commit [`7ce28a2`](https://github.com/apache/spark/commit/7ce28a2cd4345f7911d0ef4f681aa8421af22547).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] TJX2014 edited a comment on pull request #28918: [SPARK-32068][WEBUI] Task lauchtime in stage tab not correct
TJX2014 edited a comment on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-649862045

> According to the following documents, this change seems to work with recent browsers.
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse
> https://tc39.es/ecma262/#sec-date-time-string-format

Thanks, @sarutak. I find this change also works with ES6: [https://www.tutorialspoint.com/es6/es6_date.htm](https://www.tutorialspoint.com/es6/es6_date.htm)
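The behavior the linked documents describe — `Date.parse` on an ECMA-262 date-time string (ISO 8601) — can be checked quickly in any recent browser or Node.js. This snippet is a hypothetical illustration, not part of the patch:

```javascript
// ECMA-262 requires conforming engines to parse its date-time string format
// (ISO 8601), which is what makes this usage portable across recent browsers.
const iso = "2020-06-26T12:00:00Z";
const ms = Date.parse(iso);           // milliseconds since the epoch
console.log(Number.isNaN(ms));        // false: the string parsed
console.log(new Date(ms).toISOString()); // "2020-06-26T12:00:00.000Z"
// Formats outside the specified one (e.g. "26/06/2020") are
// implementation-dependent and may yield NaN on some engines.
```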
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
HyukjinKwon commented on a change in pull request #27331: URL: https://github.com/apache/spark/pull/27331#discussion_r445916122

## File path: python/pyspark/sql/readwriter.py
## @@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)

+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment: @rdblue, I don't mean that we should do that here. I mean to suggest/discuss making the separation on the Scala side first, because that confusion propagates to the PySpark API side as well. They are different things, so I am suggesting we keep them different. I hope we can focus more on the discussion itself.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
HyukjinKwon commented on a change in pull request #27331: URL: https://github.com/apache/spark/pull/27331#discussion_r445910093

## File path: python/pyspark/sql/readwriter.py
## @@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)

+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment: @rdblue, I don't mean that we should do that here - this comment doesn't block this PR. I mean to suggest/discuss making the separation on the Scala side first, because that confusion propagates to the PySpark API side as well. They are different things, so I am suggesting we keep them different. I hope we can focus more on the discussion itself.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF
AngersZh commented on a change in pull request #28805: URL: https://github.com/apache/spark/pull/28805#discussion_r445914791

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
## @@ -108,4 +109,54 @@ class PruneFileSourcePartitionsSuite extends QueryTest with SQLTestUtils with Te
   }
 }

+  test("SPARK-28169: Convert scan predicate condition to CNF") {

Review comment:
> I'm thinking about adding a base test `PartitionPruningSuiteBase` with some common test cases. Then we can have a `FileSourcePartitionPruningSuite` with file-source specific tests, and `HiveTablePartitionPruningSuite` with hive-table specific tests.

The current tests in `FileSourcePartitionPruningSuite` and `HiveTablePartitionPruningSuite` don't seem to share common cases; can you point me at how to do this, and I will work on it.
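The core idea behind SPARK-28169 — rewriting a scan predicate into conjunctive normal form (CNF) so that conjuncts touching only partition columns can be pushed into partition pruning — can be sketched in plain Python. This is a simplified toy model under assumed tuple-based expression trees, not Spark's Catalyst implementation:

```python
# Toy CNF conversion (NOT Spark's rule). Predicates are either a string leaf
# or a tuple ("and", l, r) / ("or", l, r).

def to_cnf(expr):
    """Push ANDs to the top by distributing OR over AND."""
    if isinstance(expr, str):
        return expr
    op, l, r = expr
    l, r = to_cnf(l), to_cnf(r)
    if op == "and":
        return ("and", l, r)
    # op == "or": distribute over an AND child, if any
    if isinstance(l, tuple) and l[0] == "and":
        return ("and", to_cnf(("or", l[1], r)), to_cnf(("or", l[2], r)))
    if isinstance(r, tuple) and r[0] == "and":
        return ("and", to_cnf(("or", l, r[1])), to_cnf(("or", l, r[2])))
    return ("or", l, r)

def conjuncts(expr):
    """Split a CNF expression into its top-level AND-ed parts."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]

# (p = 1 AND a > 0) OR p = 2  becomes  (p=1 OR p=2) AND (a>0 OR p=2):
# the first conjunct mentions only the partition column p, so it is
# usable for partition pruning even though the original OR was not.
cnf = to_cnf(("or", ("and", "p=1", "a>0"), "p=2"))
```

Note the classic caveat of this distribution: it can blow up exponentially, which is why a real implementation bounds the rewrite.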
[GitHub] [spark] HyukjinKwon edited a comment on pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
HyukjinKwon edited a comment on pull request #27331: URL: https://github.com/apache/spark/pull/27331#issuecomment-649888157

> I haven't replied because I don't see how it is an important concern.

@rdblue, I explained multiple times why I think this is relevant and important - once you add them, you have to fix them on the Python and R sides too. I don't believe all dev people are familiar with the Python and R sides, given my interactions over many years in Spark dev. I support adding it for 3.1, but not now in the early stage if it's unstable. As I explained earlier, I take this DSv2 case as an exceptional one. See the concern in https://github.com/apache/spark/pull/27331#discussion_r445268946 too. Ignoring a point because you don't think it's important or relevant isn't a great way to discuss. I just wanted to know the rough picture rather than asking you to assert the stability here, because you are the one who drove DSv2 in the community, and I do believe you're the right one to ask. I fully understand that things can change. I am here to help and make progress rather than nitpick or blame something not done. I fully understand the pain we had with DSv2. It would be nicer if we could be more cooperative next time.
[GitHub] [spark] HyukjinKwon commented on pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
HyukjinKwon commented on pull request #27331: URL: https://github.com/apache/spark/pull/27331#issuecomment-649888157

> I haven't replied because I don't see how it is an important concern.

@rdblue, I explained multiple times why I think this is relevant and important - once you add them, you have to fix them on the Python and R sides too. I don't believe all dev people are familiar with the Python and R sides, given my interactions over many years in Spark dev. I support adding it for 3.1, but not now in the early stage. As I explained earlier, I take this DSv2 case as an exceptional one. See the concern in https://github.com/apache/spark/pull/27331#discussion_r445268946 too. Ignoring a point because you don't think it's important or relevant isn't a great way to discuss. I just wanted to know the rough picture rather than asking you to assert the stability here, because you are the one who drove DSv2 in the community, and I do believe you're the right one to ask. I fully understand that things can change. I am here to help and make progress rather than nitpick or blame something not done. I fully understand the pain we had with DSv2. It would be nicer if we could be more cooperative next time.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
HyukjinKwon commented on a change in pull request #27331: URL: https://github.com/apache/spark/pull/27331#discussion_r445910093

## File path: python/pyspark/sql/readwriter.py ##

@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)

+class DataFrameWriterV2(object):
+    """
+    Interface used to write a :class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment: @rdblue, I don't mean that we should do that here. I am suggesting making the separation in Scala first, because otherwise the confusion propagates to the PySpark API side as well. They are different things, so I am suggesting making them different. I hope we can focus more on the discussion itself.
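The class under review is a fluent, self-returning builder: each configuration method records state on the writer and returns `self` so calls can be chained. A minimal sketch of that pattern in plain Python (illustrative only; the JVM-backed `_jwriter` calls of the real PySpark class are replaced here with a hypothetical in-memory dict):

```python
class ToyWriterV2:
    """Minimal sketch of a v2-style fluent writer. Each method
    records state and returns self so calls can be chained."""

    def __init__(self, table):
        self._table = table
        self._provider = None
        self._options = {}

    def using(self, provider):
        # In the real API this forwards to the JVM writer.
        self._provider = provider
        return self

    def option(self, key, value):
        # Values are stringified, mirroring to_str() in the diff above.
        self._options[key] = str(value)
        return self

    def options(self, **opts):
        self._options.update({k: str(v) for k, v in opts.items()})
        return self

# Chained usage, analogous to df.writeTo("db.events").using("parquet")...
w = ToyWriterV2("db.events").using("parquet").option("mergeSchema", True)
```

The table name `db.events` and the option key are made up for the example; the point is only the chaining shape of the API.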
[GitHub] [spark] maropu commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions
maropu commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649884335 This is not a bugfix, so we will merge this commit only into master (v3.1.0).
[GitHub] [spark] frankyin-factual commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions
frankyin-factual commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-649883695 Also, how likely is it that this will get backported to the 2.4.x versions?
[GitHub] [spark] HyukjinKwon commented on pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow
HyukjinKwon commented on pull request #28928: URL: https://github.com/apache/spark/pull/28928#issuecomment-649883425 Thank you @BryanCutler and @ueshin!
[GitHub] [spark] github-actions[bot] closed pull request #26816: [SPARK-30191][YARN] optimize yarn allocator
github-actions[bot] closed pull request #26816: URL: https://github.com/apache/spark/pull/26816
[GitHub] [spark] github-actions[bot] commented on pull request #27377: [SPARK-30666][Core][WIP] Reliable single-stage accumulators
github-actions[bot] commented on pull request #27377: URL: https://github.com/apache/spark/pull/27377#issuecomment-649881487 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #25721: [WIP][SPARK-29018][SQL] Implement Spark Thrift Server with it's own code base on PROTOCOL_VERSION_V9
github-actions[bot] commented on pull request #25721: URL: https://github.com/apache/spark/pull/25721#issuecomment-649881504 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to PythonUDF.
github-actions[bot] closed pull request #18906: URL: https://github.com/apache/spark/pull/18906
[GitHub] [spark] github-actions[bot] closed pull request #26711: [SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exists on YARN
github-actions[bot] closed pull request #26711: URL: https://github.com/apache/spark/pull/26711
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445904606

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ##

@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment: Yea, `withSQLConf` is used only for runtime configs, so we cannot use it for static configs. That's a question of how to write the tests.
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445903970

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ##

@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment: It's okay to use `buildStaticConf`: https://github.com/apache/spark/pull/28852#discussion_r445893610
[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions
frankyin-factual commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445903857

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ##

@@ -32,7 +32,9 @@ object NestedColumnAliasing {
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-      if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+      if SQLConf.get.nestedSchemaPruningEnabled &&
+        (canProjectPushThrough(child) ||
+          getChild(child).exists(canProjectPushThrough)) =>

Review comment: Yeah, I will update this PR later tonight.
[GitHub] [spark] maropu commented on pull request #28912: [SPARK-32057][SQL] ExecuteStatement: cancel and close should not transiently ERROR
maropu commented on pull request #28912: URL: https://github.com/apache/spark/pull/28912#issuecomment-649877864 @alismess-db These look like valid test failures.
[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions
maropu commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445903069

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ##

@@ -32,7 +32,9 @@ object NestedColumnAliasing {
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-      if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+      if SQLConf.get.nestedSchemaPruningEnabled &&
+        (canProjectPushThrough(child) ||
+          getChild(child).exists(canProjectPushThrough)) =>

Review comment: > How about using my proposal at #28898 (review)?

If we cannot, then yes, I think we need special handling for `Filter`, as @viirya suggested above.
[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions
maropu commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r445902694

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ##

@@ -32,7 +32,9 @@ object NestedColumnAliasing {
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-      if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+      if SQLConf.get.nestedSchemaPruningEnabled &&
+        (canProjectPushThrough(child) ||
+          getChild(child).exists(canProjectPushThrough)) =>

Review comment: > That won't work because it seems to cause an infinite loop in the optimizer. It gives me error messages like running out of max iterations.
>> I see, it is due to the predicate pushdown rule.

I haven't looked into it, but can't we fix the infinite loop caused by the predicate pushdown rule? If we can put `Filter` in `canProjectPushThrough`, that looks best.
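The optimization this thread is extending — pruning a nested schema so only the referenced fields are materialized below a window/sort operator — can be illustrated with a toy example. This is a conceptual sketch in plain Python with made-up data, not Spark's actual `NestedColumnAliasing` rule:

```python
# Rows with a nested struct; downstream only needs user.id.
rows = [
    {"user": {"id": 1, "name": "a", "bio": "very long text ..."}},
    {"user": {"id": 2, "name": "b", "bio": "very long text ..."}},
]

def prune(rows, fields):
    """Keep only the nested fields a downstream operator references,
    mimicking what schema pruning achieves before a window/sort:
    the unused 'bio' column is never materialized."""
    return [{f: r["user"][f] for f in fields} for r in rows]

pruned = prune(rows, ["id"])
```

In Spark the same idea is applied at plan-rewrite time, so a columnar source like Parquet never even reads the pruned fields from disk.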
[GitHub] [spark] holdenk commented on pull request #28864: [SPARK-32004][ALL] Drop references to slave
holdenk commented on pull request #28864: URL: https://github.com/apache/spark/pull/28864#issuecomment-649877003 If there are no more comments by EOW, I'll merge this.
[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649874411

> @wypoon if you have not started extending the test with the multiple fetch failures case, you can use this if you agree with it:
> [attilapiros@be14a51](https://github.com/attilapiros/spark/commit/be14a51ca766711d793d9a7314a2cf030e2acdc7)

@attilapiros thanks for the code; that is very helpful. I had an offline chat with @squito, and he had a different test in mind, but in a similar spirit. He was thinking of a test to verify that in `DAGScheduler`, `blockManagerMaster.removeExecutor` is not called more than once after the executor is lost. I can use your approach (using a Mockito spy) there as well.
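The spy-based check described above — verifying that `removeExecutor` is invoked at most once for a lost executor — is done with a Mockito spy in Spark's Scala tests. An analogous pattern in Python, with a hypothetical scheduler stand-in (not Spark's `DAGScheduler`), looks like this:

```python
from unittest.mock import MagicMock

class ToyScheduler:
    """Hypothetical stand-in for the scheduler interaction under test."""

    def __init__(self, block_manager_master):
        self._bmm = block_manager_master
        self._removed = set()

    def executor_lost(self, executor_id):
        # Guard so repeated loss events for the same executor
        # don't trigger duplicate removals.
        if executor_id not in self._removed:
            self._removed.add(executor_id)
            self._bmm.removeExecutor(executor_id)

bmm = MagicMock()
sched = ToyScheduler(bmm)
sched.executor_lost("exec-1")
sched.executor_lost("exec-1")  # duplicate event, should be ignored
bmm.removeExecutor.assert_called_once_with("exec-1")
```

`MagicMock` plays the role of the Mockito spy here: it records every call so the test can assert on the exact call count afterwards.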
[GitHub] [spark] sap1ens commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
sap1ens commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445898815

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ##

@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment: @maropu hmm, how do I use `withSQLConf` with `StaticSQLConf`? It doesn't allow it: https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala#L50
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445885603

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ##

@@ -126,4 +131,40 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("expire cached metadata if TTL is configured") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>
+      withTable("test_ttl") {
+        withTempDir { dir =>
+          spark.sql(s"""
+            |CREATE EXTERNAL TABLE test_ttl (id long)
+            |PARTITIONED BY (f1 int, f2 int)
+            |STORED AS PARQUET
+            |LOCATION "${dir.toURI}"""".stripMargin)

Review comment: nit format:

```
spark.sql(
  s"""
    |CREATE EXTERNAL TABLE test_ttl (id long)
    |PARTITIONED BY (f1 int, f2 int)
    |STORED AS PARQUET
    |LOCATION "${dir.toURI}"
  """.stripMargin)
```
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite
AmplabJenkins removed a comment on pull request #28929: URL: https://github.com/apache/spark/pull/28929#issuecomment-649870481
[GitHub] [spark] AmplabJenkins commented on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite
AmplabJenkins commented on pull request #28929: URL: https://github.com/apache/spark/pull/28929#issuecomment-649870481
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445894796

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2656,6 +2656,16 @@ object SQLConf {
     .checkValue(_ > 0, "The difference must be positive.")
     .createWithDefault(4)
+
+  val METADATA_CACHE_TTL = buildConf("spark.sql.metadataCacheTTL")
+    .doc("Time-to-live (TTL) value for the metadata caches: partition file metadata cache and " +
+      "session catalog cache. This configuration only has an effect when this value having " +
+      "a positive value. It also requires setting `hive` to " +
+      s"${StaticSQLConf.CATALOG_IMPLEMENTATION} to be applied to the partition file " +

Review comment: More conditions for this option to be enabled? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L43-L44 Since the user-facing documents are generated from this description, I think it should be as clear as possible.
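The behavior this config controls — cached metadata entries expiring once they are older than a time-to-live — can be sketched in a few lines. This is an illustrative toy with an injectable clock so expiry is deterministic, not Spark's actual Guava-backed cache:

```python
import time

class TTLCache:
    """Evict an entry on read once it is older than ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = {}  # key -> (value, inserted_at)

    def put(self, key, value):
        self._data[key] = (value, self._clock())

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if self._clock() - inserted_at > self._ttl:
            del self._data[key]  # expired: force a reload from the source
            return None
        return value

# Fake clock so the example is deterministic.
now = [0.0]
cache = TTLCache(ttl=1, clock=lambda: now[0])
cache.put("partitions", ["f1=1/f2=1", "f1=1/f2=2"])
fresh = cache.get("partitions")   # within TTL: served from cache
now[0] += 2.0                     # advance past the TTL
stale = cache.get("partitions")   # expired: None, caller must refetch
```

The key name and partition strings are invented for the example; in Spark the expired lookup would fall through to re-listing files or re-querying the metastore.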
[GitHub] [spark] SparkQA removed a comment on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite
SparkQA removed a comment on pull request #28929: URL: https://github.com/apache/spark/pull/28929#issuecomment-649809047 **[Test build #124522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124522/testReport)** for PR 28929 at commit [`3da70ec`](https://github.com/apache/spark/commit/3da70eca7b64938dfdf9dc90198465b3e9103b9c).
[GitHub] [spark] SparkQA commented on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite
SparkQA commented on pull request #28929: URL: https://github.com/apache/spark/pull/28929#issuecomment-649869826 **[Test build #124522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124522/testReport)** for PR 28929 at commit [`3da70ec`](https://github.com/apache/spark/commit/3da70eca7b64938dfdf9dc90198465b3e9103b9c).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
dongjoon-hyun commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649868767 Hi, @srowen, @HyukjinKwon, @gatorsmile, @holdenk, @dbtsai. Based on your comments and advice, I have updated the PR description to be clearer and focused only on the Apache side. Can we move Apache Spark 3.1 forward? Thank you in advance.
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445893610

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2656,6 +2656,16 @@ object SQLConf {
     .checkValue(_ > 0, "The difference must be positive.")
     .createWithDefault(4)
+
+  val METADATA_CACHE_TTL = buildConf("spark.sql.metadataCacheTTL")

Review comment: `buildConf` -> `buildStaticConf`.
[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
maropu commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r445884598

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ##

@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment: Ah, is this not a runtime config? If so, `SQLConf` -> `StaticSQLConf`?
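The distinction driving this review thread is that runtime configs can be changed on a live session (which is why `withSQLConf` works for them), while static configs are fixed when the session is built. A sketch of that split (hypothetical names and a toy session class, not Spark's `SQLConf` implementation):

```python
# Configs that may only be set at session build time.
STATIC_CONFS = {"spark.sql.metadataCacheTTL"}

class ToySession:
    """Toy session: static confs are frozen once the session exists."""

    def __init__(self, conf=None):
        self._conf = dict(conf or {})

    def set(self, key, value):
        if key in STATIC_CONFS:
            raise ValueError(f"Cannot modify static config: {key}")
        self._conf[key] = value

# Static conf supplied at build time, as the test under review does
# via SparkConf + SparkSession.builder.config(...).
session = ToySession({"spark.sql.metadataCacheTTL": "1"})
session.set("spark.sql.shuffle.partitions", "10")   # runtime conf: allowed
try:
    session.set("spark.sql.metadataCacheTTL", "5")  # static conf: rejected
    rejected = False
except ValueError:
    rejected = True
```

This is why the test builds a fresh session with the TTL already set and wraps it in `withSparkSession`, rather than toggling the value with `withSQLConf`.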
[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
AmplabJenkins commented on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-649866080