[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721198104


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35169/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721196324










[GitHub] [spark] AmplabJenkins commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721196324










[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721196293


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35168/
   






[GitHub] [spark] imback82 commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier

2020-11-03 Thread GitBox


imback82 commented on a change in pull request #30229:
URL: https://github.com/apache/spark/pull/30229#discussion_r516749544



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala
##
@@ -75,6 +75,9 @@ case class AnalyzePartitionCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val sessionState = sparkSession.sessionState
+    if (sessionState.catalog.getTempView(tableIdent.identifier).isDefined) {
+      throw new AnalysisException("ANALYZE TABLE is not supported on a temporary view.")
+    }

Review comment:
   OK. What if we want to support the scenario where a temp view is allowed 
but not a permanent view (not this PR)? Do we want something like the following?
   ```scala
   case class UnresolvedTableOrView(
       multipartIdentifier: Seq[String],
       allowTempView: Boolean = true,
       allowPermanentView: Boolean = true) extends LeafNode {
     require(allowTempView || allowPermanentView)
   }
   ```
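   A hypothetical call site for a command that allows temp views but not 
permanent views would then look like this (illustrative only, assuming the 
flags sketched above; `nameParts` is a placeholder):
   ```scala
   UnresolvedTableOrView(nameParts, allowTempView = true, allowPermanentView = false)
   ```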








[GitHub] [spark] jkleckner commented on a change in pull request #29496: [SPARK-24266][K8S][2.4] Restart the watcher when we receive a version changed from k8s

2020-11-03 Thread GitBox


jkleckner commented on a change in pull request #29496:
URL: https://github.com/apache/spark/pull/29496#discussion_r516741734



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala
##
@@ -177,4 +188,27 @@ private[k8s] class LoggingPodStatusWatcherImpl(
   private def formatTime(time: String): String = {
     if (time != null || time != "") time else "N/A"
   }
+
+  override def watchOrStop(sId: String): Boolean = if (hasCompleted()) {

Review comment:
   @shockdm The 3.0 patch #29533 has now been merged. Even though we never 
got an answer as to whether it was a prerequisite, that blockage is now gone.
   
   Would you be willing to rebase and clean up your patch for submission?








[GitHub] [spark] HyukjinKwon edited a comment on pull request #30232: [SPARK-33156][INFRA][2.4] Upgrade GithubAction image from 18.04 to 20.04

2020-11-03 Thread GitBox


HyukjinKwon edited a comment on pull request #30232:
URL: https://github.com/apache/spark/pull/30232#issuecomment-720965482


   BTW, @dongjoon-hyun, it seems like Ubuntu 20.04 changed its default Python 
version to Python 3 (?). I think that's why the error message looks a bit 
weird, like `b'..`. I saw this symptom before. We deprecated Python 2 in Spark 
3 and added Python 3 support in the dev scripts too (at SPARK-27889), but 
`branch-2.4` does not have the fix yet IIRC.
   
   Can you try to remove this line 
https://github.com/apache/spark/pull/30232/files#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2R126:
   
   ```diff
     # Yarn has a Python specific test too, for example, YarnClusterSuite.
   - if: contains(matrix.modules, 'yarn') || contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
     with:
   ```
   
   so the builds always install Python 2? As far as I remember, the last 
installed Python becomes the default `python`.






[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721182558


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35168/
   






[GitHub] [spark] srowen commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

2020-11-03 Thread GitBox


srowen commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516730762



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
   Indeed, the example in the PR should ideally have c and d equal no matter 
what, as the UDF shouldn't be evaluated multiple times. But then being 
deterministic doesn't matter; how would changing the default be the right fix? 
I don't doubt it should be 'fixed' so that even non-deterministic UDFs aren't 
surprisingly reevaluated. Is the point that the current handling for 
non-deterministic UDFs is slower, even though it evaluates them just once by 
design? That seems weird.








[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30229:
URL: https://github.com/apache/spark/pull/30229#discussion_r516728706



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
##
@@ -175,7 +175,10 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
       }.getMessage
       assert(e2.contains("SHOW CREATE TABLE is not supported on a temporary view"))
       assertNoSuchTable(s"SHOW PARTITIONS $viewName")
-      assertNoSuchTable(s"ANALYZE TABLE $viewName COMPUTE STATISTICS")
+      val e3 = intercept[AnalysisException] {
+        sql(s"ANALYZE TABLE $viewName COMPUTE STATISTICS")
+      }.getMessage
+      assert(e3.contains("ANALYZE TABLE is not supported on a temporary view"))
       assertNoSuchTable(s"ANALYZE TABLE $viewName COMPUTE STATISTICS FOR COLUMNS id")

Review comment:
   We can create a separate PR to fix it. I think it should be an 
`AnalysisException` with a clear message mentioning that it's not cached.








[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30229:
URL: https://github.com/apache/spark/pull/30229#discussion_r516727702



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -280,6 +280,9 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     case r @ ShowTableProperties(rt: ResolvedTable, propertyKey) =>
       ShowTablePropertiesExec(r.output, rt.table, propertyKey) :: Nil
 
+    case AnalyzeTable(_: ResolvedTable, _, _) | AnalyzeColumn(_: ResolvedTable, _, _) =>
+      throw new AnalysisException("ANALYZE TABLE is not supported for v2 tables.")

Review comment:
   Shall we fail it earlier, in the analyzer rules?








[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30229:
URL: https://github.com/apache/spark/pull/30229#discussion_r516726862



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala
##
@@ -75,6 +75,9 @@ case class AnalyzePartitionCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val sessionState = sparkSession.sessionState
+    if (sessionState.catalog.getTempView(tableIdent.identifier).isDefined) {
+      throw new AnalysisException("ANALYZE TABLE is not supported on a temporary view.")
+    }

Review comment:
   It seems this isn't the first time we've hit this requirement. How about 
adding `allowTemp: Boolean` to `UnresolvedTableOrView`?








[GitHub] [spark] srowen commented on pull request #30185: [SPARK-33152][SQL] This PR proposes a new logic to maintain & track constraints which solves the OOM or performance issues in query compilatio

2020-11-03 Thread GitBox


srowen commented on pull request #30185:
URL: https://github.com/apache/spark/pull/30185#issuecomment-721169846


   This still doesn't look right. I'd squash your branch changes and rebase on 
master.






[GitHub] [spark] srowen commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


srowen commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721165959


   Hoo boy, that is a lot of code to touch. It sounds like it eventually has 
to happen for Scala 3, and it avoids some deprecation noise. I'm not against 
it, though as usual there is a minor concern about making patches harder to 
back-port from 3.1 to 3.0 because of merge conflicts.






[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30229:
URL: https://github.com/apache/spark/pull/30229#discussion_r516725745



##
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -419,17 +410,16 @@ class ResolveSessionCatalog(
       }
       ShowTablesCommand(db, Some(pattern), true, partitionsSpec)
 
-    case AnalyzeTableStatement(tbl, partitionSpec, noScan) =>
-      val v1TableName = parseV1Table(tbl, "ANALYZE TABLE")
+    // ANALYZE TABLE works on views if the views are cached.

Review comment:
   So it works for temp views as well?








[GitHub] [spark] srowen closed pull request #30182: [SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort the table will cause the header content to be lost

2020-11-03 Thread GitBox


srowen closed pull request #30182:
URL: https://github.com/apache/spark/pull/30182


   






[GitHub] [spark] srowen commented on pull request #30182: [SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort the table will cause the header content to be lost

2020-11-03 Thread GitBox


srowen commented on pull request #30182:
URL: https://github.com/apache/spark/pull/30182#issuecomment-721162096


   Merged to master/3.0






[GitHub] [spark] cloud-fan commented on a change in pull request #30193: [SPARK-33293][SQL] Refactor WriteToDataSourceV2Exec and reduce code duplication

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30193:
URL: https://github.com/apache/spark/pull/30193#discussion_r516722182



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -479,15 +432,16 @@ object DataWritingSparkTask extends Logging {
   }
 }
 
-private[v2] trait AtomicTableWriteExec extends V2TableWriteExec with SupportsV1Write {
+private[v2] trait TableWriteExec extends V2TableWriteExec with SupportsV1Write {

Review comment:
   Both work for me.








[GitHub] [spark] cloud-fan commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516720258



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
   > actually, shouldn't we set it as false by default? 
   
   The problem is the performance regression caused by changing the default 
value. What we can do now is educate users more explicitly, e.g. by mentioning 
this in the Scala/Python UDF doc page.








[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721156123


   **[Test build #130568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130568/testReport)**
 for PR 30234 at commit 
[`dfe1a98`](https://github.com/apache/spark/commit/dfe1a9840128faa8857a3aa39bfa96b45aa9f158).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721154544










[GitHub] [spark] akiyamaneko edited a comment on pull request #30182: [SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort the table will cause the header content to be lost

2020-11-03 Thread GitBox


akiyamaneko edited a comment on pull request #30182:
URL: https://github.com/apache/spark/pull/30182#issuecomment-721154044


   Jenkins test this please
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721154544










[GitHub] [spark] SparkQA removed a comment on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721027459


   **[Test build #130563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130563/testReport)**
 for PR 30151 at commit 
[`291cf8a`](https://github.com/apache/spark/commit/291cf8aadb30edfe391d4a716a3abea01f309dde).






[GitHub] [spark] akiyamaneko commented on pull request #30182: [SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort the table will cause the header content to be lost

2020-11-03 Thread GitBox


akiyamaneko commented on pull request #30182:
URL: https://github.com/apache/spark/pull/30182#issuecomment-721154044


   retest this please






[GitHub] [spark] SparkQA commented on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


SparkQA commented on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721153265


   **[Test build #130563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130563/testReport)**
 for PR 30151 at commit 
[`291cf8a`](https://github.com/apache/spark/commit/291cf8aadb30edfe391d4a716a3abea01f309dde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon closed pull request #30227: [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-03 Thread GitBox


HyukjinKwon closed pull request #30227:
URL: https://github.com/apache/spark/pull/30227


   






[GitHub] [spark] AmplabJenkins commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721144076










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721144076










[GitHub] [spark] LuciferYang opened a new pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


LuciferYang opened a new pull request #30234:
URL: https://github.com/apache/spark/pull/30234


   ### What changes were proposed in this pull request?
   The purpose of this pr is to partially resolve SPARK-33285.
   
   `Auto-application` is dropped in Scala 3 and deprecated in 2.13 
(https://github.com/scala/scala/pull/8833), so we should add the parens when 
calling an empty-paren method.
   
   For example, given a definition as follows:
   
   ```scala
   class Foo {
     def bar(): Unit = {}
   }
   val foo = new Foo
   ```
   
   `foo.bar()` is recommended, not `foo.bar`. So the main change of this pr is 
only to add the parens where needed.
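   As a minimal illustration (reusing the hypothetical `Foo`/`bar` above):
   
   ```scala
   foo.bar   // Scala 2.13 warns: "Auto-application to () is deprecated"
   foo.bar() // explicit parens match the empty-paren definition; no warning
   ```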
   
   ### Why are the changes needed?
   Eliminate compilation warnings in Scala 2.13.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass the Jenkins or GitHub Action
   






[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721142854










[GitHub] [spark] LuciferYang commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


LuciferYang commented on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721140762


   cc @srowen this is the first part of the fix for these compilation 
warnings; if it's too big, I can split it into multiple small PRs.






[GitHub] [spark] SparkQA removed a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #30234:
URL: https://github.com/apache/spark/pull/30234#issuecomment-721142854


   **[Test build #130567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130567/testReport)**
 for PR 30234 at commit 
[`eb84d5e`](https://github.com/apache/spark/commit/eb84d5ea90967f6dd0fd32047ef1ec9259d12913).






[GitHub] [spark] zero323 commented on pull request #30227: [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-03 Thread GitBox


zero323 commented on pull request #30227:
URL: https://github.com/apache/spark/pull/30227#issuecomment-721137978


   Thanks @HyukjinKwon!






[GitHub] [spark] SparkQA removed a comment on pull request #30038: [SPARK-33130][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer dialect)

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #30038:
URL: https://github.com/apache/spark/pull/30038#issuecomment-721069806


   **[Test build #130566 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130566/testReport)**
 for PR 30038 at commit 
[`449a7ad`](https://github.com/apache/spark/commit/449a7ad2f695d5422a12740b65ab67d75a397551).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721055509










[GitHub] [spark] ScrapCodes commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-03 Thread GitBox


ScrapCodes commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-721070338


   Hi @holdenk and @dongjoon-hyun, 
   
   Hope you are doing well!
   
   Do you have a view on this, one way or the other?
   
   Thanks !






[GitHub] [spark] gengliangwang commented on pull request #30221: [SPARK-33314][SQL] Avoid dropping rows in Avro reader

2020-11-03 Thread GitBox


gengliangwang commented on pull request #30221:
URL: https://github.com/apache/spark/pull/30221#issuecomment-720397753










[GitHub] [spark] tgravescs commented on a change in pull request #30186: [SPARK-23432][UI] Add executor peak jvm memory metrics in executors page

2020-11-03 Thread GitBox


tgravescs commented on a change in pull request #30186:
URL: https://github.com/apache/spark/pull/30186#discussion_r516080247



##
File path: core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html
##
@@ -86,6 +86,22 @@ Executors
 
   Off Heap Storage Memory
+  Peak JVM Memory OnHeap / OffHeap
+  Peak Execution Memory OnHeap / OffHeap

Review comment:
   I think we should expand this to be  "Peak onHeap/ OffHeap memory used 
for storage of data like RDD partitions cached in memory"

##
File path: core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html
##
@@ -86,6 +86,22 @@ Executors
 
   Off Heap Storage Memory
+  Peak JVM Memory OnHeap / OffHeap

Review comment:
   Peak OnHeap/OffHeap memory used for execution. This refers to memory 
used for computation in shuffles, joins, user data structures, etc. See the 
Memory Management Overview documentation for more details.  

##
File path: core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html
##
@@ -86,6 +86,22 @@ Executors
 
   Off Heap Storage Memory
+  Peak JVM Memory OnHeap / OffHeap
+  Peak Execution Memory OnHeap / OffHeap
+  Peak Storage Memory OnHeap / OffHeap

Review comment:
   Similarly, I think we should explain more or point to the Java docs. At 
least say these are direct byte buffers and that "mapped" means memory-mapped 
buffers, or perhaps better, point to `java.nio:type=BufferPool,name=direct` 
and `java.nio:type=BufferPool,name=mapped`.
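   For reference, a minimal sketch (illustrative only, standard JDK API) of 
reading those buffer pools via the platform MBeans:
   
   ```scala
   import java.lang.management.{BufferPoolMXBean, ManagementFactory}
   import scala.collection.JavaConverters._
   
   // Lists the java.nio:type=BufferPool pools ("direct" and "mapped")
   // that the suggested tooltip text could point to.
   val pools = ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
   pools.foreach(p => println(s"${p.getName}: used=${p.getMemoryUsed} bytes"))
   ```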








[GitHub] [spark] dongjoon-hyun closed pull request #30226: [SPARK-33299][SQL][DOCS] Don't mention schemas in JSON format in docs for `from_json`

2020-11-03 Thread GitBox


dongjoon-hyun closed pull request #30226:
URL: https://github.com/apache/spark/pull/30226


   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

2020-11-03 Thread GitBox


HyukjinKwon commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516423241



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
   The fix itself makes sense logically. But my concern is that we set this 
deterministic flag to true by default for Python UDFs (and Scala UDFs too). 
When users just use UDFs without knowing about this flag (which I believe is 
arguably pretty common), after this fix they will get the same answer from 
UDFs they intended to behave non-deterministically.
   
   @gatorsmile and @cloud-fan, actually, shouldn't we set it as `false` by 
default? - I am reading 
https://github.com/apache/spark/commit/ebc24a9b7fde273ee4912f9bc1c5059703f7b31e.
 It's an arbitrary user-defined function so I think it makes sense to have the 
most loose condition by default.
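   For context, a minimal sketch of the existing opt-out (illustrative; 
assumes Spark >= 2.3 and a SparkSession `spark` in scope):
   
   ```scala
   import org.apache.spark.sql.functions.udf
   
   // UDFs are treated as deterministic by default, so the optimizer may
   // deduplicate or re-order their invocations; asNondeterministic() opts
   // this UDF out of that assumption.
   val randUdf = udf(() => scala.util.Random.nextDouble()).asNondeterministic()
   val df = spark.range(1).select(randUdf().as("c"), randUdf().as("d"))
   ```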








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30227: [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30227:
URL: https://github.com/apache/spark/pull/30227#issuecomment-720494098










[GitHub] [spark] ScrapCodes commented on pull request #30038: [SPARK-33130][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer dialect)

2020-11-03 Thread GitBox


ScrapCodes commented on pull request #30038:
URL: https://github.com/apache/spark/pull/30038#issuecomment-720278520


   Hi @huaxingao and @cloud-fan, this is ready for review!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30223: [SPARK-33306][SQL][FOLLOWUP] Group DateType and TimestampType together in `needsTimeZone()`

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30223:
URL: https://github.com/apache/spark/pull/30223#issuecomment-720382265










[GitHub] [spark] HeartSaVioR commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-11-03 Thread GitBox


HeartSaVioR commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-720970588


   @viirya Kind reminder: @gaborgsomogyi and @xuanyuanking gave +1 - would 
that be enough to sway your call?






[GitHub] [spark] dongjoon-hyun closed pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-11-03 Thread GitBox


dongjoon-hyun closed pull request #29533:
URL: https://github.com/apache/spark/pull/29533


   






[GitHub] [spark] AmplabJenkins commented on pull request #30230: [SPARK-33323][SQL] Add query resolved check before convert hive relation

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30230:
URL: https://github.com/apache/spark/pull/30230#issuecomment-720890323










[GitHub] [spark] AmplabJenkins commented on pull request #30216: [SPARK-33304][R][SQL] Add from_avro and to_avro functions to SparkR

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30216:
URL: https://github.com/apache/spark/pull/30216#issuecomment-720328716










[GitHub] [spark] ulysses-you opened a new pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-03 Thread GitBox


ulysses-you opened a new pull request #28647:
URL: https://github.com/apache/spark/pull/28647


   
   
   ### What changes were proposed in this pull request?
   
   In CreateTableLikeCommand, merge the source table's tblproperties into the 
new table's tblproperties.
   
   ### Why are the changes needed?
   
   We should retain the useful tblproperties, e.g. `parquet.compression`. Hive 
also retains the tblproperties.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Add UT.
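   As a minimal sketch of the intended behavior (table names are hypothetical):
   
   ```scala
   // The source table carries a useful property.
   spark.sql("CREATE TABLE src (id INT) USING PARQUET " +
     "TBLPROPERTIES ('parquet.compression'='GZIP')")
   // With this change, the new table retains src's tblproperties,
   // merged with any properties specified for the new table itself.
   spark.sql("CREATE TABLE dst LIKE src")
   ```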






[GitHub] [spark] cloud-fan commented on a change in pull request #30202: [SPARK-33248][SQL][FOLLOWUP] Update migration guide to make clear what behavior changed and make variable names and configurati

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30202:
URL: https://github.com/apache/spark/pull/30202#discussion_r515753866



##
File path: docs/sql-migration-guide.md
##
@@ -52,7 +52,7 @@ license: |
   
   - In Spark 3.1, the `schema_of_json` and `schema_of_csv` functions return the schema in the SQL format in which field names are quoted. In Spark 3.0, the function returns a catalog string without field quoting and in lower case. 
  
-  - In Spark 3.1, when `spark.sql.legacy.transformationPadNullWhenValueLessThenSchema` is true, Spark will pad NULL value when script transformation's output value size less then schema size in default-serde mode(script transformation with row format of `ROW FORMAT DELIMITED`). If false, Spark will keep original behavior to throw `ArrayIndexOutOfBoundsException`.
+  - In Spark 3.1, when script transformation output's value size is less then schema size in default-serde mode(script transformation with row format of `ROW FORMAT DELIMITED`), Spark will pad NUll value to supplement data. In Spark 3.0 or earlier, Spark will do nothing and throw `ArrayIndexOutOfBoundsException`. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.transformationNotPadNullToSupplementData.enabled` to `true`.

Review comment:
   Yea, I'm just saying that we usually don't add a migration guide entry when 
we make something that used to fail start working, as it's not a breaking 
change and users have nothing to do for migration.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30224: [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30224:
URL: https://github.com/apache/spark/pull/30224#issuecomment-720385125










[GitHub] [spark] dongjoon-hyun commented on pull request #30231: [SPARK-33156][INFRA][3.0] Upgrade GithubAction image from 18.04 to 20.04

2020-11-03 Thread GitBox


dongjoon-hyun commented on pull request #30231:
URL: https://github.com/apache/spark/pull/30231#issuecomment-720885674










[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging partitions

2020-11-03 Thread GitBox


SparkQA commented on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-720725184










[GitHub] [spark] cloud-fan commented on a change in pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP

2020-11-03 Thread GitBox


cloud-fan commented on a change in pull request #30144:
URL: https://github.com/apache/spark/pull/30144#discussion_r515765822



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3691,6 +3691,32 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       checkAnswer(sql("SELECT id FROM t WHERE (SELECT true)"), Row(0L))
     }
   }
+
+  test("SPARK-33229: Support GROUP BY use Separate columns and CUBE/ROLLUP") {
+    withTable("t") {
+      sql("CREATE TABLE t USING PARQUET AS SELECT id AS a, id AS b, id AS c FROM range(1)")
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY CUBE(a, b, c)"),
+        Row(0, 0, 0, 1) :: Row(0, 0, null, 1) ::
+          Row(0, null, 0, 1) :: Row(0, null, null, 1) ::
+          Row(null, 0, 0, 1) :: Row(null, 0, null, 1) ::
+          Row(null, null, 0, 1) :: Row(null, null, null, 1) :: Nil)
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY a, CUBE(b, c)"),

Review comment:
   What are the semantics of it?
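   For reference, under standard SQL semantics (an illustration, not taken 
from this thread), `GROUP BY a, CUBE(b, c)` expands to grouping sets that all 
contain `a`:
   
   ```scala
   // Equivalent expansion of GROUP BY a, CUBE(b, c) under standard SQL semantics
   sql("""SELECT a, b, c, count(*) FROM t
     GROUP BY GROUPING SETS ((a, b, c), (a, b), (a, c), (a))""")
   ```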








[GitHub] [spark] sunchao removed a comment on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-11-03 Thread GitBox


sunchao removed a comment on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-720671002


   @HyukjinKwon @Obbay2 is there a JIRA for this? 






[GitHub] [spark] HeartSaVioR commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-03 Thread GitBox


HeartSaVioR commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-720900095


   Sigh. Now it has conflicts, so I can't go ahead with giving +1 and merging.
   
   @cchighman Could you please fix the conflicts? It would be totally OK if 
you'd like to let me take this over - I just wanted to give you full credit 
and accelerate the process of merging. If you can deal with fixing the 
conflicts and hopefully also address my minor comments, I'll give +1 once 
tests pass.






[GitHub] [spark] AmplabJenkins commented on pull request #26319: [SPARK-29594][SQL] Provide better error message when creating a Dataset from a Sequence of Case class where a field name started with a

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #26319:
URL: https://github.com/apache/spark/pull/26319#issuecomment-720287298










[GitHub] [spark] SparkQA removed a comment on pull request #26319: [SPARK-29594][SQL] Provide better error message when creating a Dataset from a Sequence of Case class where a field name started with

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #26319:
URL: https://github.com/apache/spark/pull/26319#issuecomment-720206210










[GitHub] [spark] xkrogen commented on pull request #29906: [SPARK-32037][CORE] Rename blacklisting feature

2020-11-03 Thread GitBox


xkrogen commented on pull request #29906:
URL: https://github.com/apache/spark/pull/29906#issuecomment-720575364


   Huge thanks for pushing this through @tgravescs ! It was no small effort!






[GitHub] [spark] SparkQA removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-720570503










[GitHub] [spark] SparkQA removed a comment on pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging par

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-720725184


   **[Test build #130533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130533/testReport)**
 for PR 30164 at commit 
[`0423970`](https://github.com/apache/spark/commit/0423970aaeb992201942ff4e82e64476283ca568).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702872843










[GitHub] [spark] HeartSaVioR commented on a change in pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


HeartSaVioR commented on a change in pull request #30151:
URL: https://github.com/apache/spark/pull/30151#discussion_r515757520



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##
@@ -126,6 +126,123 @@ private[ui] class StreamingQueryStatisticsPage(parent: 
StreamingQueryTab)
 
   }
 
+  def generateAggregatedStateOperators(
+  query: StreamingQueryUIData,
+  minBatchTime: Long,
+  maxBatchTime: Long,
+  jsCollector: JsCollector
+): NodeBuffer = {

Review comment:
   nit: we tend to append this to the last parameter line when that line is 
not too long.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##
@@ -126,6 +126,123 @@ private[ui] class StreamingQueryStatisticsPage(parent: 
StreamingQueryTab)
 
   }
 
+  def generateAggregatedStateOperators(
+  query: StreamingQueryUIData,
+  minBatchTime: Long,
+  maxBatchTime: Long,
+  jsCollector: JsCollector
+): NodeBuffer = {
+// This is made sure on caller side but put it here to be defensive
+require(query.lastProgress != null)
+if (query.lastProgress.stateOperators.nonEmpty) {
+  val numRowsTotalData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.numRowsTotal).sum.toDouble))
+  val maxNumRowsTotal = numRowsTotalData.maxBy(_._2)._2
+
+  val numRowsUpdatedData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.numRowsUpdated).sum.toDouble))
+  val maxNumRowsUpdated = numRowsUpdatedData.maxBy(_._2)._2
+
+  val memoryUsedBytesData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.memoryUsedBytes).sum.toDouble))
+  val maxMemoryUsedBytes = memoryUsedBytesData.maxBy(_._2)._2
+
+  val numRowsDroppedByWatermarkData = query.recentProgress
+.map(p => (parseProgressTimestamp(p.timestamp),
+  p.stateOperators.map(_.numRowsDroppedByWatermark).sum.toDouble))
+  val maxNumRowsDroppedByWatermark = 
numRowsDroppedByWatermarkData.maxBy(_._2)._2
+
+  val graphUIDataForNumberTotalRows =
+new GraphUIData(
+  "aggregated-num-total-rows-timeline",
+  "aggregated-num-total-rows-histogram",
+  numRowsTotalData,
+  minBatchTime,
+  maxBatchTime,
+  0,
+  maxNumRowsTotal,
+  "records")
+  graphUIDataForNumberTotalRows.generateDataJs(jsCollector)
+
+  val graphUIDataForNumberUpdatedRows =
+new GraphUIData(
+  "aggregated-num-updated-rows-timeline",

Review comment:
   ditto

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##
@@ -126,6 +126,123 @@ private[ui] class StreamingQueryStatisticsPage(parent: 
StreamingQueryTab)
 
   }
 
+  def generateAggregatedStateOperators(
+  query: StreamingQueryUIData,
+  minBatchTime: Long,
+  maxBatchTime: Long,
+  jsCollector: JsCollector
+): NodeBuffer = {
+// This is made sure on caller side but put it here to be defensive
+require(query.lastProgress != null)
+if (query.lastProgress.stateOperators.nonEmpty) {
+  val numRowsTotalData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.numRowsTotal).sum.toDouble))
+  val maxNumRowsTotal = numRowsTotalData.maxBy(_._2)._2
+
+  val numRowsUpdatedData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.numRowsUpdated).sum.toDouble))
+  val maxNumRowsUpdated = numRowsUpdatedData.maxBy(_._2)._2
+
+  val memoryUsedBytesData = query.recentProgress.map(p => 
(parseProgressTimestamp(p.timestamp),
+p.stateOperators.map(_.memoryUsedBytes).sum.toDouble))
+  val maxMemoryUsedBytes = memoryUsedBytesData.maxBy(_._2)._2
+
+  val numRowsDroppedByWatermarkData = query.recentProgress
+.map(p => (parseProgressTimestamp(p.timestamp),
+  p.stateOperators.map(_.numRowsDroppedByWatermark).sum.toDouble))
+  val maxNumRowsDroppedByWatermark = 
numRowsDroppedByWatermarkData.maxBy(_._2)._2
+
+  val graphUIDataForNumberTotalRows =
+new GraphUIData(
+  "aggregated-num-total-rows-timeline",

Review comment:
   Let's add `state` to the IDs of these div tags explicitly, so that they 
can be clearly separated from the other div tags.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##
@@ -126,6 +126,123 @@ private[ui] class StreamingQueryStatisticsPage(parent: 
StreamingQueryTab)
 
   }
 
+  def generateAggregatedStateOperators(
+  query: StreamingQueryUIData,
+  minBatchTime: Long,
+  maxBatchTime: Long,
+  

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30009:
URL: https://github.com/apache/spark/pull/30009#issuecomment-720876636










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-720292747










[GitHub] [spark] otterc commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-03 Thread GitBox


otterc commented on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-720702382


   With the latest commit, all the review comments have been addressed. 
   @Ngone51 @tgravescs @attilapiros @mridulm @Victsm @jiangxb1987 Please take 
another look at the changes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30230: [SPARK-33323][SQL] Add query resolved check before convert hive relation

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30230:
URL: https://github.com/apache/spark/pull/30230#issuecomment-720890323










[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-03 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r516467454



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,966 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  @VisibleForTesting
+  static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. If not present and the 
corresponding merged
+   * shuffle does not 

[GitHub] [spark] SparkQA removed a comment on pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #29082:
URL: https://github.com/apache/spark/pull/29082#issuecomment-720852047










[GitHub] [spark] HyukjinKwon commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-11-03 Thread GitBox


HyukjinKwon commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-720243217


   Let me monitor a bit more before reverting it. I just saw it once.






[GitHub] [spark] Victsm commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-03 Thread GitBox


Victsm commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r516167838



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/PushBlockStream.java
##
@@ -23,24 +23,31 @@
 import org.apache.spark.network.protocol.Encoders;
 
 // Needed by ScalaDoc. See SPARK-7726
-import static 
org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
 
 
 /**
  * Request to push a block to a remote shuffle service to be merged in push 
based shuffle.
  * The remote shuffle service will also include this message when responding 
the push requests.
  */
 public class PushBlockStream extends BlockTransferMessage {
+  public static final String SHUFFLE_PUSH_BLOCK_PREFIX = "shufflePush";
   public final String appId;
-  public final String blockId;
+  public final int shuffleId;
+  public final int mapIndex;
+  public final int reduceId;
   // Similar to the chunkIndex in StreamChunkId, indicating the index of a 
block in a batch of
   // blocks to be pushed.
   public final int index;
+  public final String streamId;

Review comment:
   Where's this streamId used?
   ShuffleId is used later as streamId in toString and decode.
   That seems a bit confusing.

##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
##
@@ -363,4 +363,38 @@ public boolean useOldFetchProtocol() {
 return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false);
   }
 
+  /**
+   * Class name of the implementation of MergedShuffleFileManager that merges 
the blocks
+   * pushed to it when push-based shuffle is enabled. Default implementation 
for merging the blocks
+   * remotely is 'org.apache.spark.network.shuffle.RemoteBlockPushResolver'.
+   * To turn off push-based shuffle at a cluster level, set the configuration 
to
+   * 
'org.apache.spark.network.shuffle.ExternalBlockHandler$NoOpMergedShuffleFileManager'.
+   */
+  public String mergeShuffleFileManagerImpl() {
+return conf.get("spark.shuffle.push.based.mergedShuffleFileManagerImpl",
+  "org.apache.spark.network.shuffle.RemoteBlockPushResolver");

Review comment:
   Should the default be NoOpMergedShuffleFileManager instead?








[GitHub] [spark] HyukjinKwon commented on pull request #30227: [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-03 Thread GitBox


HyukjinKwon commented on pull request #30227:
URL: https://github.com/apache/spark/pull/30227#issuecomment-720813038










[GitHub] [spark] AmplabJenkins commented on pull request #30232: [SPARK-33156][INFRA][2.4] Upgrade GithubAction image from 18.04 to 20.04

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30232:
URL: https://github.com/apache/spark/pull/30232#issuecomment-720895587










[GitHub] [spark] bozhang2820 opened a new pull request #30224: [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing

2020-11-03 Thread GitBox


bozhang2820 opened a new pull request #30224:
URL: https://github.com/apache/spark/pull/30224


   ### What changes were proposed in this pull request?
   This change is to support a user-provided nullable Avro schema for data 
with a non-nullable catalyst schema in Avro writing. 
   
   Without this change, when users try to use a nullable Avro schema to write 
data with a non-nullable catalyst schema, it will throw an 
`IncompatibleSchemaException` with a message like `Cannot convert Catalyst type 
StringType to Avro type ["null","string"]`. With this change it will assume 
that the data is non-nullable, log a warning message for the nullability 
difference and serialize the data to Avro format with the nullable Avro schema 
provided.
   
   ### Why are the changes needed?
   This change is needed because sometimes our users do not have full control 
over the nullability of the Avro schemas they use, and this change provides 
them with that flexibility.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Users are allowed to use nullable Avro schemas for data with 
non-nullable catalyst schemas in Avro writing after the change.
   
   ### How was this patch tested?
   Added unit tests.
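   
   A minimal PySpark sketch of the behavior described above (the output path 
is hypothetical, and it assumes the spark-avro package is available together 
with the existing `avroSchema` write option that this change builds on):
   
   ```python
   # Write a DataFrame whose catalyst schema is non-nullable using a
   # user-provided *nullable* Avro schema. Before this change this raised
   # IncompatibleSchemaException; with it, the write succeeds with a warning.
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.getOrCreate()
   
   nullable_avro_schema = (
       '{"type": "record", "name": "Rec", "fields":'
       ' [{"name": "id", "type": ["null", "string"]}]}'
   )
   
   df = spark.range(1).selectExpr("CAST(id AS STRING) AS id")  # non-nullable column
   (df.write
       .format("avro")
       .option("avroSchema", nullable_avro_schema)
       .save("/tmp/avro-nullable-out"))  # hypothetical output path
   ```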
   






[GitHub] [spark] SparkQA commented on pull request #30221: [SPARK-33314][SQL] Avoid dropping rows in Avro reader

2020-11-03 Thread GitBox


SparkQA commented on pull request #30221:
URL: https://github.com/apache/spark/pull/30221#issuecomment-720879566










[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-03 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r516175694



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/PushBlockStream.java
##
@@ -23,24 +23,31 @@
 import org.apache.spark.network.protocol.Encoders;
 
 // Needed by ScalaDoc. See SPARK-7726
-import static 
org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
 
 
 /**
  * Request to push a block to a remote shuffle service to be merged in push 
based shuffle.
  * The remote shuffle service will also include this message when responding 
the push requests.
  */
 public class PushBlockStream extends BlockTransferMessage {
+  public static final String SHUFFLE_PUSH_BLOCK_PREFIX = "shufflePush";
   public final String appId;
-  public final String blockId;
+  public final int shuffleId;
+  public final int mapIndex;
+  public final int reduceId;
   // Similar to the chunkIndex in StreamChunkId, indicating the index of a 
block in a batch of
   // blocks to be pushed.
   public final int index;
+  public final String streamId;

Review comment:
   The `streamId` is used for `StreamCallbackWithId.getID()` . The usage of 
the name `streamId` in `toString` and `decode` is a typo. 

##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
##
@@ -363,4 +363,38 @@ public boolean useOldFetchProtocol() {
 return conf.getBoolean("spark.shuffle.useOldFetchProtocol", false);
   }
 
+  /**
+   * Class name of the implementation of MergedShuffleFileManager that merges 
the blocks
+   * pushed to it when push-based shuffle is enabled. Default implementation 
for merging the blocks
+   * remotely is 'org.apache.spark.network.shuffle.RemoteBlockPushResolver'.
+   * To turn off push-based shuffle at a cluster level, set the configuration 
to
+   * 
'org.apache.spark.network.shuffle.ExternalBlockHandler$NoOpMergedShuffleFileManager'.
+   */
+  public String mergeShuffleFileManagerImpl() {
+return conf.get("spark.shuffle.push.based.mergedShuffleFileManagerImpl",
+  "org.apache.spark.network.shuffle.RemoteBlockPushResolver");

Review comment:
   Sure. I will change the default to be `NoOpMergedShuffleFileManager`. 
   cc. @Ngone51 @tgravescs 

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/PushBlockStream.java
##
@@ -23,24 +23,31 @@
 import org.apache.spark.network.protocol.Encoders;
 
 // Needed by ScalaDoc. See SPARK-7726
-import static 
org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;
 
 
 /**
  * Request to push a block to a remote shuffle service to be merged in push 
based shuffle.
  * The remote shuffle service will also include this message when responding 
the push requests.
  */
 public class PushBlockStream extends BlockTransferMessage {
+  public static final String SHUFFLE_PUSH_BLOCK_PREFIX = "shufflePush";
   public final String appId;
-  public final String blockId;
+  public final int shuffleId;
+  public final int mapIndex;
+  public final int reduceId;
   // Similar to the chunkIndex in StreamChunkId, indicating the index of a 
block in a batch of
   // blocks to be pushed.
   public final int index;
+  public final String streamId;

Review comment:
   This is fixed.

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,883 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import 

[GitHub] [spark] zero323 commented on a change in pull request #30216: [SPARK-33304][R][SQL] Add from_avro and to_avro functions to SparkR

2020-11-03 Thread GitBox


zero323 commented on a change in pull request #30216:
URL: https://github.com/apache/spark/pull/30216#discussion_r515789891



##
File path: R/pkg/DESCRIPTION
##
@@ -42,6 +42,7 @@ Collate:
 'context.R'
 'deserialize.R'
 'functions.R'
+'functions_avro.R'

Review comment:
   I thought it might be better to have it in a separate script, as it is 
external after all. 
   
   Not to mention `functions.R` is already huge and painful to navigate, even 
without adding a new `family` there. Honestly, I'd prefer to split the whole 
`functions.R` into families (`functions_aggregate.R`, `functions_datetime.R`, 
...).
   
   

##
File path: R/pkg/R/functions_avro.R
##
@@ -0,0 +1,117 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#' Avro processing functions for Column operations
+#'
+#' Avro processing functions defined for \code{Column}.
+#'
+#' @param x Column to compute on.
+#' @param jsonFormatSchema character Avro schema in JSON string format
+#' @param ... additional argument(s) passed as parser options.
+#' @name column_avro_functions
+#' @rdname column_avro_functions
+#' @family avro functions
+#' @note Avro is built-in but external data source module since Spark 2.4.
+#'   Please deploy the application as per the deployment section of "Apache 
Avro Data Source Guide".
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(iris)
+#' schema <- paste(
+#'   c(
+#' '{"type": "record", "namespace": "example.avro", "name": "Iris", 
"fields": [',
+#' '{"type": ["double", "null"], "name": "Sepal_Length"},',
+#' '{"type": ["double", "null"], "name": "Sepal_Width"},',
+#' '{"type": ["double", "null"], "name": "Petal_Length"},',
+#' '{"type": ["double", "null"], "name": "Petal_Width"},',
+#' '{"type": ["string", "null"], "name": "Species"}]}'
+#'   ),
+#'   collapse="\\n"
+#' )
+#'
+#' df_serialized <- select(
+#'   df,
+#'   alias(to_avro(alias(struct(column("*")), "fields")), "payload")
+#' )
+#'
+#' df_deserialized <- select(
+#'   df_serialized,
+#'   from_avro(df_serialized$payload, schema)
+#' )
+#'
+#' head(df_deserialized)
+#' }
+NULL
+
+#' @include generics.R column.R
+NULL
+
+#' @details
+#' \code{from_avro} Converts a binary column of Avro format into its 
corresponding catalyst value.
+#' The specified schema must match the read data, otherwise the behavior is 
undefined:
+#' it may fail or return arbitrary result.
+#' To deserialize the data with a compatible and evolved schema, the expected 
Avro schema can be
+#' set via the option avroSchema.

Review comment:
   There's a note in the `family` section:
   
   
https://github.com/apache/spark/blob/1e9545f69100394b038c502c643624d235fd085e/R/pkg/R/functions_avro.R#L28-L29
   
   and I'll add an actual link to the docs.
   
   
   Do you think we should duplicate this note for each function?

##
File path: R/pkg/R/functions_avro.R
##
@@ -0,0 +1,117 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#' Avro processing functions for Column operations
+#'
+#' Avro processing functions defined for \code{Column}.
+#'
+#' @param x Column to compute on.
+#' @param jsonFormatSchema character Avro schema in JSON string format
+#' @param ... additional argument(s) passed as parser options.
+#' @name column_avro_functions
+#' @rdname column_avro_functions
+#' @family avro functions
+#' @note Avro is built-in but external data source module since Spark 2.4.
+#'   Please deploy the application as per 

[GitHub] [spark] zero323 opened a new pull request #30227: [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-03 Thread GitBox


zero323 opened a new pull request #30227:
URL: https://github.com/apache/spark/pull/30227


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR adds support for passing `Column`s as input to PySpark sorting 
functions.
   
   ### Why are the changes needed?
   
   
   According to SPARK-26979, PySpark functions should support both Column and 
str arguments, when possible.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   PySpark users can now provide both `Column` and `str` as an argument for 
`asc*` and `desc*` functions.
   
   ### How was this patch tested?
   
   
   New unit tests.
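   
   A minimal sketch of the change from the user's side (assuming an active 
SparkSession; `asc` stands in for any of the affected `asc*`/`desc*` 
functions):
   
   ```python
   from pyspark.sql import SparkSession
   from pyspark.sql.functions import asc, col
   
   spark = SparkSession.builder.getOrCreate()
   df = spark.range(5)
   
   df.orderBy(asc("id")).show()       # str argument (previously the only option)
   df.orderBy(asc(col("id"))).show()  # Column argument (enabled by this PR)
   ```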






[GitHub] [spark] zhengruifeng commented on pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

2020-11-03 Thread GitBox


zhengruifeng commented on pull request #30009:
URL: https://github.com/apache/spark/pull/30009#issuecomment-720851384


   retest this please






[GitHub] [spark] AmplabJenkins commented on pull request #30038: [SPARK-33130][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer dialect)

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30038:
URL: https://github.com/apache/spark/pull/30038#issuecomment-720294931










[GitHub] [spark] AmplabJenkins removed a comment on pull request #26319: [SPARK-29594][SQL] Provide better error message when creating a Dataset from a Sequence of Case class where a field name starte

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #26319:
URL: https://github.com/apache/spark/pull/26319#issuecomment-720287298










[GitHub] [spark] dongjoon-hyun commented on pull request #30233: [SPARK-33324][K8S][BUILD] Upgrade kubernetes-client to 4.11.1

2020-11-03 Thread GitBox


dongjoon-hyun commented on pull request #30233:
URL: https://github.com/apache/spark/pull/30233#issuecomment-720928756


   Thank you, @HyukjinKwon . Merged to master for Apache Spark 3.1.






[GitHub] [spark] attilapiros commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-03 Thread GitBox


attilapiros commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r516643629



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,966 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  @VisibleForTesting
+  static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. If not present and the 
corresponding merged
+   * shuffle does not 

[GitHub] [spark] HeartSaVioR commented on a change in pull request #30221: [SPARK-33314][SQL] Avoid dropping rows in Avro reader

2020-11-03 Thread GitBox


HeartSaVioR commented on a change in pull request #30221:
URL: https://github.com/apache/spark/pull/30221#discussion_r515822838



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
 protected val stopPosition: Long
 
 private[this] var completed = false
+private[this] var interveningNext = true
+private[this] var prevHasNextRow = false
 private[this] var currentRow: Option[InternalRow] = None
 
 def hasNextRow: Boolean = {
+  if (!interveningNext) {
+// until a row is consumed, return previous result of hasNextRow
+return prevHasNextRow
+  }

Review comment:
   I also feel @viirya 's suggestion would be simpler.
   
   In addition, it looks like the implementation doesn't respect the Iterator 
contract - calling `hasNextRow` shouldn't be a prerequisite for calling `nextRow`.
   
   The code below would fix the original issue (it passes the new test), and 
it would also work when only `nextRow` is called, by handling 
NoSuchElementException.
   
   ```
   def hasNextRow: Boolean = {
 if (currentRow.isDefined) {
   return true
 }
   
 while (!completed && currentRow.isEmpty) {
   val r = fileReader.hasNext && !fileReader.pastSync(stopPosition)
   if (!r) {
 fileReader.close()
 completed = true
 currentRow = None
   } else {
 val record = fileReader.next()
 currentRow = 
deserializer.deserialize(record).asInstanceOf[Option[InternalRow]]
   }
 }
   
 currentRow.isDefined
   }
   
   def nextRow: InternalRow = {
 if (currentRow.isEmpty) {
   if (!hasNextRow) {
 throw new NoSuchElementException("next on empty iterator")
   }
 }
   
 val row = currentRow.get
 currentRow = None
 row
   }
   ```
   

##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##
@@ -171,9 +171,15 @@ private[sql] object AvroUtils extends Logging {
 protected val stopPosition: Long
 
 private[this] var completed = false
+private[this] var interveningNext = true
+private[this] var prevHasNextRow = false
 private[this] var currentRow: Option[InternalRow] = None
 
 def hasNextRow: Boolean = {
+  if (!interveningNext) {
+// until a row is consumed, return previous result of hasNextRow
+return prevHasNextRow
+  }

Review comment:
   I also feel @viirya 's suggestion would be simpler.
   
   In addition, it looks like the implementation doesn't respect the Iterator 
contract - calling `hasNextRow` shouldn't be a prerequisite for calling `nextRow`.
   
   The code below would fix the original issue (it passes the new test), and 
it would also work for the case which only calls `nextRow`, by handling 
`NoSuchElementException`.
   
   ```
   def hasNextRow: Boolean = 

[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-720292747










[GitHub] [spark] dongjoon-hyun opened a new pull request #30232: [SPARK-33156][INFRA][2.4] Upgrade GithubAction image from 18.04 to 20.04

2020-11-03 Thread GitBox


dongjoon-hyun opened a new pull request #30232:
URL: https://github.com/apache/spark/pull/30232


   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade `Github Action` runner image from `Ubuntu 18.04 
(LTS)` to `Ubuntu 20.04 (LTS)`.
   
   ### Why are the changes needed?
   
   `ubuntu-latest` in `GitHub Action` is still `Ubuntu 18.04 (LTS)`.
   - https://github.com/actions/virtual-environments#available-environments
   
   This upgrade will prepare for the AmpLab Jenkins upgrade.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the `Github Action` in this PR.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-03 Thread GitBox


HeartSaVioR edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-720876379


   No, I don't have a real case where one would knowingly take the risk. I 
could probably create some query that evades the issue, but I agree that's 
more of a theoretical case than a real one.
   
   Saying it again, I don't object to the change. If you look back at my 
proposal, you'll find that blocking the query is also one of the options I 
proposed. My point was that such a change warrants discussion, ideally on the 
dev@ mailing list instead of in a PR. We should avoid making important 
decisions in a closed group.






[GitHub] [spark] SparkQA commented on pull request #30151: [SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-11-03 Thread GitBox


SparkQA commented on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-721027459










[GitHub] [spark] AmplabJenkins commented on pull request #30225: [SPARK-33187][SQL] Add a check on the number of returned metastore pa…

2020-11-03 Thread GitBox


AmplabJenkins commented on pull request #30225:
URL: https://github.com/apache/spark/pull/30225#issuecomment-720385116










[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-03 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-720876379


   No, I don't have a real case where one would knowingly take the risk. I 
could probably create some query that evades the issue, but I agree that's 
more of a theoretical case than a real one.
   
   Saying it again, I don't object to the change. If you look back at my 
proposal, you'll find that blocking the query is also one of the options I 
proposed. My point was that such a change warrants discussion, ideally on the 
dev@ mailing list instead of in a PR.






[GitHub] [spark] cloud-fan closed pull request #30200: [SPARK-33027][SQL] Add DisableUnnecessaryBucketedScan rule to AQE

2020-11-03 Thread GitBox


cloud-fan closed pull request #30200:
URL: https://github.com/apache/spark/pull/30200


   






[GitHub] [spark] tgravescs commented on pull request #30204: [SPARK-33288][SPARK-32661][K8S] Stage level scheduling support for Kubernetes

2020-11-03 Thread GitBox


tgravescs commented on pull request #30204:
URL: https://github.com/apache/spark/pull/30204#issuecomment-720557444


   Seems like random test failures are happening. I tried the recent 
YarnClusterSuite locally and it works fine. 






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP

2020-11-03 Thread GitBox


AngersZhuuuu commented on a change in pull request #30144:
URL: https://github.com/apache/spark/pull/30144#discussion_r515926187



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3691,6 +3691,32 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   checkAnswer(sql("SELECT id FROM t WHERE (SELECT true)"), Row(0L))
 }
   }
+
+  test("SPARK-33229: Support GROUP BY use Separate columns and CUBE/ROLLUP") {
+withTable("t") {
+  sql("CREATE TABLE t USING PARQUET AS SELECT id AS a, id AS b, id AS c 
FROM range(1)")
+  checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY CUBE(a, b, 
c)"),
+Row(0, 0, 0, 1) :: Row(0, 0, null, 1) ::
+  Row(0, null, 0, 1) :: Row(0, null, null, 1) ::
+  Row(null, 0, 0, 1) :: Row(null, 0, null, 1) ::
+  Row(null, null, 0, 1) :: Row(null, null, null, 1) :: Nil)
+  checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY a, CUBE(b, 
c)"),

Review comment:
   > what's the semantic of it?
   
   If we want a dimensional analysis that always groups by `a` while varying 
the combinations of `b` and `c`, currently we need to write 
`group by cube(a, b, c)` plus `where a IS NOT NULL` to remove the interfering 
rows; with this patch we can just write 
   
   ```
   group by a, cube(b, c)
   ```
   (the equivalent expansion is sketched below). This set of PRs makes 
Grouping Analytics as flexible as PostgreSQL, and we do have this need for 
analysis.
   
   

##
File path: sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql
##
@@ -59,4 +59,12 @@ SELECT course, year FROM courseSales GROUP BY CUBE(course, year) ORDER BY groupi
 -- Aliases in SELECT could be used in ROLLUP/CUBE/GROUPING SETS
 SELECT a + b AS k1, b AS k2, SUM(a - b) FROM testData GROUP BY CUBE(k1, k2);
 SELECT a + b AS k, b, SUM(a - b) FROM testData GROUP BY ROLLUP(k, b);
-SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k)
+SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k);
+
+-- GROUP BY use mixed Separate columns and CUBE/ROLLUP
+SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b) GROUPING SETS(a, b);

Review comment:
   This is not supported for now, but it can be fixed easily after refactoring 
GROUPING ANALYTICS.
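   
   For what it's worth, under PostgreSQL-style semantics concatenated grouping 
items combine by cross product: `CUBE(a, b)` contributes 4 grouping sets and 
`ROLLUP(a, b)` contributes 3, so `GROUP BY CUBE(a, b), ROLLUP(a, b)` would 
denote their 12-way product. A hypothetical sketch of that expansion (plain 
Scala, not Spark's current behavior):
   
   ```
   // Enumerate the cross product of the grouping sets; PostgreSQL keeps
   // the duplicates rather than collapsing them.
   val cube   = Seq(Set("a", "b"), Set("a"), Set("b"), Set.empty[String])
   val rollup = Seq(Set("a", "b"), Set("a"), Set.empty[String])
   val combined = for (c <- cube; r <- rollup) yield c union r
   combined.foreach(s => println(s.mkString("(", ", ", ")")))  // 12 grouping sets
   ```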








[GitHub] [spark] kiszk commented on pull request #26319: [SPARK-29594][SQL] Provide better error message when creating a Dataset from a Sequence of Case class where a field name started with a number

2020-11-03 Thread GitBox


kiszk commented on pull request #26319:
URL: https://github.com/apache/spark/pull/26319#issuecomment-720634523










[GitHub] [spark] SparkQA removed a comment on pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP

2020-11-03 Thread GitBox


SparkQA removed a comment on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-720433951


   **[Test build #130526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130526/testReport)** for PR 30144 at commit [`c1c551c`](https://github.com/apache/spark/commit/c1c551c6f9656ec71d6da03fb8fd4c4119d66c3a).






[GitHub] [spark] SparkQA commented on pull request #30230: [SPARK-33323][SQL] Add query resolved check before convert hive relation

2020-11-03 Thread GitBox


SparkQA commented on pull request #30230:
URL: https://github.com/apache/spark/pull/30230#issuecomment-720875095










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30038: [SPARK-33130][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer diale

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30038:
URL: https://github.com/apache/spark/pull/30038#issuecomment-720294931










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30186: [SPARK-23432][UI] Add executor peak jvm memory metrics in executors page

2020-11-03 Thread GitBox


AmplabJenkins removed a comment on pull request #30186:
URL: https://github.com/apache/spark/pull/30186#issuecomment-720874715










[GitHub] [spark] HyukjinKwon commented on a change in pull request #30216: [SPARK-33304][R][SQL] Add from_avro and to_avro functions to SparkR

2020-11-03 Thread GitBox


HyukjinKwon commented on a change in pull request #30216:
URL: https://github.com/apache/spark/pull/30216#discussion_r515852857



##
File path: R/pkg/R/functions_avro.R
##
@@ -0,0 +1,117 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#' Avro processing functions for Column operations
+#'
+#' Avro processing functions defined for \code{Column}.
+#'
+#' @param x Column to compute on.
+#' @param jsonFormatSchema character Avro schema in JSON string format
+#' @param ... additional argument(s) passed as parser options.
+#' @name column_avro_functions
+#' @rdname column_avro_functions
+#' @family avro functions
+#' @note Avro is built-in but external data source module since Spark 2.4.
+#'   Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(iris)
+#' schema <- paste(
+#'   c(
+#' '{"type": "record", "namespace": "example.avro", "name": "Iris", "fields": [',
+#' '{"type": ["double", "null"], "name": "Sepal_Length"},',
+#' '{"type": ["double", "null"], "name": "Sepal_Width"},',
+#' '{"type": ["double", "null"], "name": "Petal_Length"},',
+#' '{"type": ["double", "null"], "name": "Petal_Width"},',
+#' '{"type": ["string", "null"], "name": "Species"}]}'
+#'   ),
+#'   collapse="\\n"
+#' )
+#'
+#' df_serialized <- select(
+#'   df,
+#'   alias(to_avro(alias(struct(column("*")), "fields")), "payload")
+#' )
+#'
+#' df_deserialized <- select(
+#'   df_serialized,
+#'   from_avro(df_serialized$payload, schema)
+#' )
+#'
+#' head(df_deserialized)
+#' }
+NULL
+
+#' @include generics.R column.R
+NULL
+
+#' @details
+#' \code{from_avro} Converts a binary column of Avro format into its corresponding catalyst value.
+#' The specified schema must match the read data, otherwise the behavior is undefined:
+#' it may fail or return arbitrary result.
+#' To deserialize the data with a compatible and evolved schema, the expected Avro schema can be
+#' set via the option avroSchema.

Review comment:
   Oh no. I think that's fine but we'll have to add the R example in the link I pointed out.
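   
   For reference, a minimal Scala sketch of the same round trip (assumes a 
spark-shell launched with the external spark-avro module, e.g. 
`--packages org.apache.spark:spark-avro_2.12:3.0.1`; the record name `Id` is 
made up for illustration):
   
   ```
   import org.apache.spark.sql.avro.functions.{from_avro, to_avro}
   import org.apache.spark.sql.functions.{col, struct}

   // Serialize a struct column to Avro binary, then decode it back.
   val jsonFormatSchema =
     """{"type": "record", "name": "Id", "fields": [{"name": "id", "type": "long"}]}"""
   val serialized = spark.range(3).select(to_avro(struct(col("id"))).as("payload"))
   // As the @details note says, the schema passed to from_avro must match the
   // written data; a mismatched schema may fail or return an arbitrary result.
   val deserialized = serialized.select(from_avro(col("payload"), jsonFormatSchema).as("value"))
   deserialized.show()
   ```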

##
File path: R/pkg/DESCRIPTION
##
@@ -42,6 +42,7 @@ Collate:
 'context.R'
 'deserialize.R'
 'functions.R'
+'functions_avro.R'

Review comment:
   Separating them sounds like an idea, yes. But for now I think we should just 
put it in the same place; otherwise, it would make more sense to place the ML 
ones separately, to follow the structure on the PySpark side.








[GitHub] [spark] SparkQA commented on pull request #30228: [SPARK-33319][SQL][TEST] Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-03 Thread GitBox


SparkQA commented on pull request #30228:
URL: https://github.com/apache/spark/pull/30228#issuecomment-720515948









