[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649991405


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124525/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649991400










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649991400


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649912241


   **[Test build #124525 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)**
 for PR 28898 at commit 
[`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).






[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649990865


   **[Test build #124525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)**
 for PR 28898 at commit 
[`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow

2020-06-25 Thread GitBox


HyukjinKwon commented on a change in pull request #28928:
URL: https://github.com/apache/spark/pull/28928#discussion_r445977049



##
File path: python/pyspark/sql/pandas/conversion.py
##
@@ -413,7 +413,7 @@ def _create_from_pandas_with_arrow(self, pdf, schema, 
timezone):
 
 # Slice the DataFrame to be batched
 step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round 
int up
-pdf_slices = (pdf[start:start + step] for start in xrange(0, len(pdf), 
step))
+pdf_slices = (pdf.iloc[start:start + step] for start in xrange(0, 
len(pdf), step))

Review comment:
   As far as I can tell, yes.








[GitHub] [spark] gatorsmile commented on a change in pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow

2020-06-25 Thread GitBox


gatorsmile commented on a change in pull request #28928:
URL: https://github.com/apache/spark/pull/28928#discussion_r445976618



##
File path: python/pyspark/sql/pandas/conversion.py
##
@@ -413,7 +413,7 @@ def _create_from_pandas_with_arrow(self, pdf, schema, 
timezone):
 
 # Slice the DataFrame to be batched
 step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round 
int up
-pdf_slices = (pdf[start:start + step] for start in xrange(0, len(pdf), 
step))
+pdf_slices = (pdf.iloc[start:start + step] for start in xrange(0, 
len(pdf), step))

Review comment:
   Thank you for fixing this! 
   
   > While standard Python / Numpy expressions for selecting and setting are 
intuitive and come in handy for interactive work, for production code, we 
recommend the optimized pandas data access methods, .at, .iat, .loc and .iloc.
   
   Is it the only place? 
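   For context, a minimal pandas sketch of why the positional `.iloc` slice matters here (the float index below is an illustrative input, and the label-based `[]` slicing shown is the behaviour of pandas releases contemporary with this PR):

```python
import pandas as pd

# A DataFrame whose index is not the default RangeIndex.
pdf = pd.DataFrame({"a": [1, 2, 3]}, index=[2.0, 3.0, 4.0])

# Direct slicing on a float index is label-based, so this keeps only the
# rows whose index label falls within [0, 2] -- a single row, not two.
print(len(pdf[0:2]))       # 1

# .iloc always slices by position, which is what the batching code needs.
print(len(pdf.iloc[0:2]))  # 2
```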








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649976751


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124527/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649976746


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649976746










[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649936846


   **[Test build #124527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)**
 for PR 28898 at commit 
[`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).






[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649976637


   **[Test build #124527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)**
 for PR 28898 at commit 
[`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


wypoon commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-649968228


   In the latest update, there are three changes:
   1. `failedEpoch` and `fileLostEpoch` are renamed, and the comments explaining them are expanded, largely based on suggestions from @squito.
   2. A call to `clearCacheLocs` is moved into the correct if block in 
`removeExecutorAndUnregisterOutputs`.
   3. In `DAGSchedulerSuite`, `mapOutputTracker` and `blockManagerMaster` are 
wrapped by `Mockito.spy` and the spies are used to verify how many times each 
is called. This verification is added to some existing tests, which pass 
without my change to `DAGScheduler`. The verification is also added to the new 
test case for this bug. Thanks to @attilapiros for his illustrative example 
using `Mockito.spy`.
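
   For readers unfamiliar with the spy pattern described in point 3, here is a minimal Python sketch of the same verify-call-count idea using `unittest.mock` (the `MapOutputTracker` stub below is a hypothetical stand-in, not the Scala object wrapped with `Mockito.spy` in `DAGSchedulerSuite`):

```python
from unittest.mock import MagicMock

class MapOutputTracker:                # hypothetical stand-in for the real tracker
    def unregister_all_map_output(self, shuffle_id):
        pass                           # real bookkeeping elided

tracker = MapOutputTracker()
spy = MagicMock(wraps=tracker)         # delegates calls to the real object and records them

spy.unregister_all_map_output(0)
spy.unregister_all_map_output(0)

# A test can now assert how many times the code under test hit the tracker.
assert spy.unregister_all_map_output.call_count == 2
```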






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-649963283










[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-649963283










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28805:
URL: https://github.com/apache/spark/pull/28805#issuecomment-649963207










[GitHub] [spark] AmplabJenkins commented on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28805:
URL: https://github.com/apache/spark/pull/28805#issuecomment-649963207










[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


wypoon commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r445965785



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -177,6 +177,8 @@ private[spark] class DAGScheduler(
   // TODO: Garbage collect information about failure epochs when we know there 
are no more
   //   stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]
+  // In addition, track epoch for failed executors that result in lost file 
output

Review comment:
   I changed `fileLostEpoch` to `shuffleFileLostEpoch` and more or less 
adopted your suggestion for the comment explaining it.








[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-649962628


   **[Test build #124530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124530/testReport)**
 for PR 28848 at commit 
[`d09ef93`](https://github.com/apache/spark/commit/d09ef9335e5d3657b830497155abb7a0c2bb0cde).






[GitHub] [spark] SparkQA commented on pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF

2020-06-25 Thread GitBox


SparkQA commented on pull request #28805:
URL: https://github.com/apache/spark/pull/28805#issuecomment-649962618


   **[Test build #124531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124531/testReport)**
 for PR 28805 at commit 
[`270324e`](https://github.com/apache/spark/commit/270324ee306f035352b58e77718d73810f1ffa1f).






[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


wypoon commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r445965319



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -177,6 +177,8 @@ private[spark] class DAGScheduler(
   // TODO: Garbage collect information about failure epochs when we know there 
are no more
   //   stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]

Review comment:
   I changed `failedEpoch` to `executorFailureEpoch` and more or less 
adopted your suggestion for the comment explaining it.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649954884










[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649954884










[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649954378


   **[Test build #124529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124529/testReport)**
 for PR 28898 at commit 
[`652c77f`](https://github.com/apache/spark/commit/652c77fdbbfa468271e783e1492f72f4785c9880).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-649944669










[GitHub] [spark] AmplabJenkins commented on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-649944669










[GitHub] [spark] SparkQA commented on pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-06-25 Thread GitBox


SparkQA commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-649944172


   **[Test build #124528 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124528/testReport)**
 for PR 28676 at commit 
[`488e051`](https://github.com/apache/spark/commit/488e051e1a7c21c57b646d9f68df8c48e4717126).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649942950


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124524/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649942879


   **[Test build #124524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)**
 for PR 28898 at commit 
[`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649942946


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649942946










[GitHub] [spark] SparkQA removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649907459


   **[Test build #124524 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)**
 for PR 28898 at commit 
[`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649937189










[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649937189










[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649936846


   **[Test build #124527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124527/testReport)**
 for PR 28898 at commit 
[`ab39d24`](https://github.com/apache/spark/commit/ab39d245660c16c0c11d0a37f73f84f74afd7951).






[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


frankyin-factual commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445946556



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query1 = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val aliases1 = collectGeneratedAliases(optimized1)
+val expected1 = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases1(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases1(1)}".as(aliases1(0)), $"window")
+  .where($"window" === 1 && $"${aliases1(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for orderBy") {
+val query1 = contact.select($"name.first", $"name.last")
+  .orderBy($"name.first".asc, $"name.last".asc)
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val aliases1 = collectGeneratedAliases(optimized1)
+val expected1 = contact
+  .select($"name.first",
+$"name.last",
+$"name.first".as(aliases1(0)),
+$"name.last".as(aliases1(1)))
+  .orderBy($"${aliases1(0)}".asc, $"${aliases1(1)}".asc)
+  .select($"first", $"last")
+  .analyze
+comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for sirtBy") {

Review comment:
   Yeah








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649930413










[GitHub] [spark] SparkQA removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649924942


   **[Test build #124526 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)**
 for PR 28927 at commit 
[`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).






[GitHub] [spark] AmplabJenkins commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649930413










[GitHub] [spark] SparkQA commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


SparkQA commented on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649930269


   **[Test build #124526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)**
 for PR 28927 at commit 
[`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649925471










[GitHub] [spark] AmplabJenkins commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649925471










[GitHub] [spark] SparkQA commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


SparkQA commented on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649924942


   **[Test build #124526 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124526/testReport)**
 for PR 28927 at commit 
[`a0756db`](https://github.com/apache/spark/commit/a0756db9b61e17a2c4cacca90943022d60bcb64a).






[GitHub] [spark] HyukjinKwon commented on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


HyukjinKwon commented on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649923516


   ok to test






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28927: [SPARK-32099][DOCS] Remove broken link in cloud integration documentation

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28927:
URL: https://github.com/apache/spark/pull/28927#issuecomment-649467834


   Can one of the admins verify this patch?






[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


viirya commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445935640



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query1 = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val aliases1 = collectGeneratedAliases(optimized1)
+val expected1 = contact

Review comment:
   If there is only one query, we don't need to name it as `query1`, 
`optimized1`...








[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


viirya commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445935343



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query1 = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val aliases1 = collectGeneratedAliases(optimized1)
+val expected1 = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases1(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases1(1)}".as(aliases1(0)), $"window")
+  .where($"window" === 1 && $"${aliases1(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for orderBy") {
+val query1 = contact.select($"name.first", $"name.last")
+  .orderBy($"name.first".asc, $"name.last".asc)
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val aliases1 = collectGeneratedAliases(optimized1)
+val expected1 = contact
+  .select($"name.first",
+$"name.last",
+$"name.first".as(aliases1(0)),
+$"name.last".as(aliases1(1)))
+  .orderBy($"${aliases1(0)}".asc, $"${aliases1(1)}".asc)
+  .select($"first", $"last")
+  .analyze
+comparePlans(optimized1, expected1)
+  }
+
+  test("Nested field pruning for sirtBy") {

Review comment:
   Do you mean sortBy?








[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


viirya commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445935219



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,14 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+case Project(projectList, Filter(condition, child))

Review comment:
   I think we'd better leave a comment explaining this case.








[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649912241


   **[Test build #124525 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124525/testReport)**
 for PR 28898 at commit 
[`4c705bd`](https://github.com/apache/spark/commit/4c705bd5e7cbeae2603afe799a338e068c35923c).






[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649909806










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649909806










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649907836










[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649907836










[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window/sort functions

2020-06-25 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649907459


   **[Test build #124524 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124524/testReport)**
 for PR 28898 at commit 
[`3c8cf11`](https://github.com/apache/spark/commit/3c8cf110b19bc5d0c9e89a8a031e6e4a557aa1b3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877










[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877










[GitHub] [spark] dilipbiswal commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-06-25 Thread GitBox


dilipbiswal commented on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-649906157


   @maropu Resolved the conflicts. Thank you.






[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906173


   **[Test build #124523 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)**
 for PR 28897 at commit 
[`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649865700


   **[Test build #124523 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)**
 for PR 28897 at commit 
[`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).






[GitHub] [spark] HyukjinKwon closed pull request #28896: [SPARK-32025][SQL] Csv schema inference problems with different types in the same column

2020-06-25 Thread GitBox


HyukjinKwon closed pull request #28896:
URL: https://github.com/apache/spark/pull/28896


   






[GitHub] [spark] HyukjinKwon commented on pull request #28896: [SPARK-32025][SQL] Csv schema inference problems with different types in the same column

2020-06-25 Thread GitBox


HyukjinKwon commented on pull request #28896:
URL: https://github.com/apache/spark/pull/28896#issuecomment-649901085


   Merged to master.






[GitHub] [spark] AmplabJenkins commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-649898619










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-649898619










[GitHub] [spark] SparkQA removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-649795495


   **[Test build #124520 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124520/testReport)**
 for PR 28425 at commit 
[`7ce28a2`](https://github.com/apache/spark/commit/7ce28a2cd4345f7911d0ef4f681aa8421af22547).






[GitHub] [spark] SparkQA commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-06-25 Thread GitBox


SparkQA commented on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-649898054


   **[Test build #124520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124520/testReport)**
 for PR 28425 at commit 
[`7ce28a2`](https://github.com/apache/spark/commit/7ce28a2cd4345f7911d0ef4f681aa8421af22547).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] TJX2014 edited a comment on pull request #28918: [SPARK-32068][WEBUI] Task lauchtime in stage tab not correct

2020-06-25 Thread GitBox


TJX2014 edited a comment on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-649862045


   > According to the following documents, this change seems work with recent 
browsers.
   > 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse
   > https://tc39.es/ecma262/#sec-date-time-string-format
   
   Thanks, @sarutak. I find this change also works with ES6: 
[https://www.tutorialspoint.com/es6/es6_date.htm](https://www.tutorialspoint.com/es6/es6_date.htm)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

2020-06-25 Thread GitBox


HyukjinKwon commented on a change in pull request #27331:
URL: https://github.com/apache/spark/pull/27331#discussion_r445916122



##
File path: python/pyspark/sql/readwriter.py
##
@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)
 
 
+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment:
   @rdblue, I don't mean that we should do that here. I mean to 
suggest/discuss making the separation on the Scala side first, because the 
confusion propagates to the PySpark API side as well.
   
   They are different things, so I am suggesting we keep them different. I hope we 
can focus more on the discussion itself.
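   
   For readers following along, a minimal sketch of how the underlying v2 writer chain 
is driven from the Scala side (the catalog, table, and column names below are invented 
for illustration, and a v2 catalog such as `testcat` has to be configured separately):
   
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(10).withColumn("bucket", col("id") % 2)

// DataFrameWriterV2 chain: pick a provider, add options, declare the
// partitioning, then issue a create/replace/append against a v2 catalog.
df.writeTo("testcat.db.events")
  .using("parquet")
  .option("compression", "snappy")
  .partitionedBy(col("bucket"))
  .createOrReplace()
```
   The Python class in the diff simply forwards each of these calls to the JVM 
writer via `self._jwriter`.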





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

2020-06-25 Thread GitBox


HyukjinKwon commented on a change in pull request #27331:
URL: https://github.com/apache/spark/pull/27331#discussion_r445910093



##
File path: python/pyspark/sql/readwriter.py
##
@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)
 
 
+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment:
   @rdblue, I don't mean that we should do that here - this comment doesn't 
block this PR. I mean to suggest/discuss making the separation on the Scala 
side first, because the confusion propagates to the PySpark API side as well.
   
   They are different things, so I am suggesting we keep them different. I hope we 
can focus more on the discussion itself.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28805: [SPARK-28169][SQL] Convert scan predicate condition to CNF

2020-06-25 Thread GitBox


AngersZhuuuu commented on a change in pull request #28805:
URL: https://github.com/apache/spark/pull/28805#discussion_r445914791



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
##
@@ -108,4 +109,54 @@ class PruneFileSourcePartitionsSuite extends QueryTest 
with SQLTestUtils with Te
   }
 }
   }
+
+  test("SPARK-28169: Convert scan predicate condition to CNF") {

Review comment:
   > I'm thinking about adding a base test `PartitionPruningSuiteBase` with 
some common test cases. Then we can have a `FileSourcePartitionPruningSuite` 
with file-source specific tests, and `HiveTablePartitionPruningSuite` with 
hive-table specific tests.
   
   The current tests in `FileSourcePartitionPruningSuite` and 
`HiveTablePartitionPruningSuite` don't seem to have any common cases. Can you show 
me some pointers on how to do this? I will work on it.
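   
   One possible shape of the proposed layout, as a sketch only (the class names are the 
ones mentioned above, the test body is illustrative, and the Hive-side subclass, which 
would extend the Hive test base, is omitted):
   
```scala
import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSparkSession

// Shared cases live in the abstract base; each subclass fixes the format it exercises.
abstract class PartitionPruningSuiteBase extends QueryTest with SharedSparkSession {
  protected def tableFormat: String

  test("predicate on partition column prunes partitions") {
    withTable("t") {
      spark.sql(s"CREATE TABLE t (id INT, p INT) USING $tableFormat PARTITIONED BY (p)")
      spark.sql("INSERT INTO t VALUES (1, 1), (2, 2)")
      checkAnswer(spark.sql("SELECT id FROM t WHERE p = 1"), Row(1))
    }
  }
}

class FileSourcePartitionPruningSuite extends PartitionPruningSuiteBase {
  override protected def tableFormat: String = "parquet"
}
```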





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

2020-06-25 Thread GitBox


HyukjinKwon edited a comment on pull request #27331:
URL: https://github.com/apache/spark/pull/27331#issuecomment-649888157


   > I haven't replied because I don't see how it is an important concern.
   
   @rdblue, I explained multiple times why I think this is relevant and 
important - once you add them, you have to fix them on the Python and R side too. I 
don't believe all devs are familiar with the Python and R side, given my 
interactions over many years in Spark dev.
   I support adding it for 3.1, but not now in the early stage if it's unstable. 
As I explained earlier, I take this DSv2 case as an exceptional case. See the 
concern about https://github.com/apache/spark/pull/27331#discussion_r445268946 
too.
   
   Ignoring a point because you don't think it's important or relevant isn't a 
great way to have this discussion.
   
   I just wanted to know the rough picture rather than asking you to assert the 
stability here, because you are the one who drove DSv2 in the community, and I 
do believe you're the right one to ask. I fully understand that things can 
change.
   
   I am here to help and make progress rather than nitpicking or blaming 
anyone for something not done. I fully understand the pain we had with DSv2. It 
would be nicer if we can be more cooperative next time.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

2020-06-25 Thread GitBox


HyukjinKwon commented on pull request #27331:
URL: https://github.com/apache/spark/pull/27331#issuecomment-649888157


   > I haven't replied because I don't see how it is an important concern.
   
   @rdblue, I explained multiple times why I think this is relevant and 
important - once you add them, you have to fix them on the Python and R side too. I 
don't believe all devs are familiar with the Python and R side, given my 
interactions over many years in Spark dev.
   I support adding it for 3.1, but not now in the early stage. As I explained 
earlier, I take this DSv2 case as an exceptional case. See the concern about 
https://github.com/apache/spark/pull/27331#discussion_r445268946 too.
   
   Ignoring a point because you don't think it's important or relevant isn't a 
great way to have this discussion.
   
   I just wanted to know the rough picture rather than asking you to assert the 
stability here, because you are the one who drove DSv2 in the community, and I 
do believe you're the right one to ask. I fully understand that things can 
change.
   
   I am here to help and make progress rather than nitpicking or blaming 
anyone for something not done. I fully understand the pain we had with DSv2. It 
would be nicer if we can be more cooperative next time.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

2020-06-25 Thread GitBox


HyukjinKwon commented on a change in pull request #27331:
URL: https://github.com/apache/spark/pull/27331#discussion_r445910093



##
File path: python/pyspark/sql/readwriter.py
##
@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)
 
 
+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment:
   @rdblue, I don't mean that we should do that here. I mean to suggest 
making the separation on the Scala side first, because the confusion propagates to 
the PySpark API side as well.
   
   They are different things, so I am suggesting we keep them different. I hope we 
can focus more on the discussion itself.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions

2020-06-25 Thread GitBox


maropu commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649884335


   This is not a bugfix, so we will merge this commit only into master (v3.1.0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] frankyin-factual commented on pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions

2020-06-25 Thread GitBox


frankyin-factual commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-649883695


   Also, how likely is it that this will get backported to the 2.4.x versions?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28928: [SPARK-32098][PYTHON] Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow

2020-06-25 Thread GitBox


HyukjinKwon commented on pull request #28928:
URL: https://github.com/apache/spark/pull/28928#issuecomment-649883425


   Thank you @BryanCutler and @ueshin!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #26816: [SPARK-30191][YARN] optimize yarn allocator

2020-06-25 Thread GitBox


github-actions[bot] closed pull request #26816:
URL: https://github.com/apache/spark/pull/26816


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] commented on pull request #27377: [SPARK-30666][Core][WIP] Reliable single-stage accumulators

2020-06-25 Thread GitBox


github-actions[bot] commented on pull request #27377:
URL: https://github.com/apache/spark/pull/27377#issuecomment-649881487


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] commented on pull request #25721: [WIP][SPARK-29018][SQL] Implement Spark Thrift Server with it's own code base on PROTOCOL_VERSION_V9

2020-06-25 Thread GitBox


github-actions[bot] commented on pull request #25721:
URL: https://github.com/apache/spark/pull/25721#issuecomment-649881504


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to PythonUDF.

2020-06-25 Thread GitBox


github-actions[bot] closed pull request #18906:
URL: https://github.com/apache/spark/pull/18906


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #26711: [SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exists on YARN

2020-06-25 Thread GitBox


github-actions[bot] closed pull request #26711:
URL: https://github.com/apache/spark/pull/26711


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445904606



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala
##
@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment:
   Yea, `withSQLConf` is used only for runtime configs, so we cannot use 
it for static configs. That's an issue of how to write the tests.
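   
   For illustration, a sketch of the runtime/static split being discussed, assuming 
Spark's test utilities (`QueryTest`, `SharedSparkSession`) are on the classpath; the 
suite name is made up:
   
```scala
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession

class ConfScopingSuite extends QueryTest with SharedSparkSession {
  test("runtime confs can be scoped, static confs cannot") {
    // Runtime conf: withSQLConf sets it for the block and restores it afterwards.
    withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "5") {
      assert(spark.conf.get(SQLConf.SHUFFLE_PARTITIONS.key) == "5")
    }
    // A static conf (like the TTL in this PR) has to be supplied when the
    // SparkSession is built, which is why the test above clones a session
    // instead of using withSQLConf.
  }
}
```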





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445903970



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala
##
@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment:
   It's okay to use `buildStaticConf`: 
https://github.com/apache/spark/pull/28852#discussion_r445893610





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions

2020-06-25 Thread GitBox


frankyin-factual commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445903857



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -32,7 +32,9 @@ object NestedColumnAliasing {
 
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-        if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+        if SQLConf.get.nestedSchemaPruningEnabled &&
+          (canProjectPushThrough(child) ||
+            getChild(child).exists(canProjectPushThrough)) =>

Review comment:
   Yeah, I will update this PR later tonight. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28912: [SPARK-32057][SQL] ExecuteStatement: cancel and close should not transiently ERROR

2020-06-25 Thread GitBox


maropu commented on pull request #28912:
URL: https://github.com/apache/spark/pull/28912#issuecomment-649877864


   @alismess-db These look like valid test failures.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445903069



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -32,7 +32,9 @@ object NestedColumnAliasing {
 
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-        if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+        if SQLConf.get.nestedSchemaPruningEnabled &&
+          (canProjectPushThrough(child) ||
+            getChild(child).exists(canProjectPushThrough)) =>

Review comment:
   > How about use my proposal at #28898 (review)?
   
   If we cannot, yea, I think we need special handling for `Filter` as @viirya 
suggested above.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow schema pruning thru window functions

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r445902694



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -32,7 +32,9 @@ object NestedColumnAliasing {
 
   def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
     case Project(projectList, child)
-        if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
+        if SQLConf.get.nestedSchemaPruningEnabled &&
+          (canProjectPushThrough(child) ||
+            getChild(child).exists(canProjectPushThrough)) =>

Review comment:
   > That won’t work because it seems causing an infinite loop in 
optimizer. It gives me error messages like running out of max iterations.
   >> I see, it is due to predicate pushdown rule.
   
   I haven't looked into it, but can't we fix the infinite loop caused by the 
predicate pushdown rule? If we can put `Filter` in `canProjectPushThrough`, that 
looks best.
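   
   To make the target query shape concrete, a sketch of the kind of plan this rule is 
meant to prune (the Parquet path and the `user`/`ts`/`item` schema are invented for 
illustration):
   
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical input: `item` is a wide nested struct stored in Parquet.
val events = spark.read.parquet("/tmp/events")

val w = Window.partitionBy($"user").orderBy($"ts")
events
  .select($"item.id".as("itemId"), row_number().over(w).as("rn"))
  .where($"rn" === 1)
  .explain()   // with pruning through the window, ReadSchema should list only item.id
```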





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-25 Thread GitBox


holdenk commented on pull request #28864:
URL: https://github.com/apache/spark/pull/28864#issuecomment-649877003


   If there are no more comments by EOW I'll merge this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox


wypoon commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-649874411


   > @wypoon if you have not started extending the test with the multiple fetch 
failures case, you can use this if you agree with it:
   > 
[attilapiros@be14a51](https://github.com/attilapiros/spark/commit/be14a51ca766711d793d9a7314a2cf030e2acdc7)
   
   @attilapiros thanks for the code; that is very helpful. I had an offline 
chat with @squito, and he had a different test in mind, but in a similar 
spirit. He was thinking of a test to verify that in `DAGScheduler`, 
`blockManagerMaster.removeExecutor` is not called more than once after the 
executor is lost. I can use your approach (using Mockito spy) there as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sap1ens commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


sap1ens commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445898815



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala
##
@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment:
   @maropu hmm, how do I use `withSQLConf` with `StaticSQLConf`? It 
doesn't allow it: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala#L50





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445885603



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala
##
@@ -126,4 +131,40 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("expire cached metadata if TTL is configured") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>
+      withTable("test_ttl") {
+        withTempDir { dir =>
+          spark.sql(s"""
+            |CREATE EXTERNAL TABLE test_ttl (id long)
+            |PARTITIONED BY (f1 int, f2 int)
+            |STORED AS PARQUET
+            |LOCATION "${dir.toURI}"""".stripMargin)

Review comment:
   nit format:
   ```
 spark.sql(
   s"""
  |CREATE EXTERNAL TABLE test_ttl (id long)
  |PARTITIONED BY (f1 int, f2 int)
  |STORED AS PARQUET
  |LOCATION "${dir.toURI}"
""".stripMargin)
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite

2020-06-25 Thread GitBox


AmplabJenkins removed a comment on pull request #28929:
URL: https://github.com/apache/spark/pull/28929#issuecomment-649870481







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28929:
URL: https://github.com/apache/spark/pull/28929#issuecomment-649870481







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445894796



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2656,6 +2656,16 @@ object SQLConf {
     .checkValue(_ > 0, "The difference must be positive.")
     .createWithDefault(4)
 
+  val METADATA_CACHE_TTL = buildConf("spark.sql.metadataCacheTTL")
+    .doc("Time-to-live (TTL) value for the metadata caches: partition file metadata cache and " +
+      "session catalog cache. This configuration only has an effect when this value having " +
+      "a positive value. It also requires setting `hive` to " +
+      s"${StaticSQLConf.CATALOG_IMPLEMENTATION} to be applied to the partition file " +

Review comment:
   Are there more conditions for this option to be enabled? 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L43-L44
   Since the user documentation is generated from this description, I think 
it should be as clear as possible.
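   
   For reference, once this is a static conf, enabling it would look roughly like the 
sketch below; the key name is taken from the diff above and the unit (seconds) is an 
assumption while the PR is still in review:
   
```scala
import org.apache.spark.sql.SparkSession

// Static SQL confs must be set before the session starts; setting the TTL
// later through spark.conf.set(...) would be rejected.
val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()                          // the TTL also covers the session catalog cache
  .config("spark.sql.metadataCacheTTL", "300")  // key from this PR; assumed to be in seconds
  .getOrCreate()
```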





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite

2020-06-25 Thread GitBox


SparkQA removed a comment on pull request #28929:
URL: https://github.com/apache/spark/pull/28929#issuecomment-649809047


   **[Test build #124522 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124522/testReport)**
 for PR 28929 at commit 
[`3da70ec`](https://github.com/apache/spark/commit/3da70eca7b64938dfdf9dc90198465b3e9103b9c).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28929: [SPARK-32100][CORE][TESTS] Add WorkerDecommissionExtendedSuite

2020-06-25 Thread GitBox


SparkQA commented on pull request #28929:
URL: https://github.com/apache/spark/pull/28929#issuecomment-649869826


   **[Test build #124522 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124522/testReport)**
 for PR 28929 at commit 
[`3da70ec`](https://github.com/apache/spark/commit/3da70eca7b64938dfdf9dc90198465b3e9103b9c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445894796



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2656,6 +2656,16 @@ object SQLConf {
     .checkValue(_ > 0, "The difference must be positive.")
     .createWithDefault(4)
 
+  val METADATA_CACHE_TTL = buildConf("spark.sql.metadataCacheTTL")
+    .doc("Time-to-live (TTL) value for the metadata caches: partition file metadata cache and " +
+      "session catalog cache. This configuration only has an effect when this value having " +
+      "a positive value. It also requires setting `hive` to " +
+      s"${StaticSQLConf.CATALOG_IMPLEMENTATION} to be applied to the partition file " +

Review comment:
   Are there more conditions for this option to be enabled? 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L43-L44





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649868767


   Hi, @srowen , @HyukjinKwon , @gatorsmile , @holdenk , @dbtsai .
   According to your comments and advice, I updated the PR description to be clearer 
and focused it only on the Apache side. Can we move Apache Spark 3.1 forward? 
Thank you in advance.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445893610



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2656,6 +2656,16 @@ object SQLConf {
   .checkValue(_ > 0, "The difference must be positive.")
   .createWithDefault(4)
 
+  val METADATA_CACHE_TTL = buildConf("spark.sql.metadataCacheTTL")

Review comment:
   `buildConf` -> `buildStaticConf`.
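   
   i.e. roughly the following shape inside `object SQLConf` (a sketch: the diff above 
only shows the first line, so the `.timeConf`/default chain here is an assumption):
   
```scala
import java.util.concurrent.TimeUnit

// Inside object SQLConf, where buildStaticConf is in scope:
val METADATA_CACHE_TTL = buildStaticConf("spark.sql.metadataCacheTTL")
  .doc("Time-to-live (TTL) in seconds for the partition file metadata cache " +
    "and session catalog cache; disabled unless set to a positive value.")
  .timeConf(TimeUnit.SECONDS)
  .createWithDefault(-1L)
```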





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-06-25 Thread GitBox


maropu commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r445884598



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala
##
@@ -126,4 +129,39 @@ class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSi
   for (pruningEnabled <- Seq(true, false)) {
     testCaching(pruningEnabled)
   }
+
+  test("cache TTL") {
+    val sparkConfWithTTl = new SparkConf().set(SQLConf.METADATA_CACHE_TTL.key, "1")
+    val newSession = SparkSession.builder.config(sparkConfWithTTl).getOrCreate().cloneSession()
+
+    withSparkSession(newSession) { implicit spark =>

Review comment:
   Ah, is this not a runtime config? If so, `SQLConf` -> `StaticSQLConf`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

2020-06-25 Thread GitBox


AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649866080







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


