date:20191216

[GitHub] [spark] SparkQA commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

SparkQA commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350525
 
 
   **[Test build #115419 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115419/testReport)**
 for PR 26915 at commit 
[`d540e68`](https://github.com/apache/spark/commit/d540e68facff08b9f8eefba0070c232af1d3c8ed).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox

SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size 
field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350550
 
 
   **[Test build #115420 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115420/testReport)**
 for PR 26906 at commit 
[`11b7f71`](https://github.com/apache/spark/commit/11b7f718e53c17e9a7c2946bddcaf8b860562d31).


[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove 
size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350917
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26915: [INFRA] Reverts commit 56dcd79 and 
c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350861
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20222/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove 
size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350924
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20223/
   Test PASSed.


[GitHub] [spark] jiangxb1987 opened a new pull request #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases

2019-12-16 Thread GitBox

jiangxb1987 opened a new pull request #26916: [SPARK-25100][TEST][FOLLOWUP] 
Refactor test cases
URL: https://github.com/apache/spark/pull/26916
 
 
   ### What changes were proposed in this pull request?
   
   Refactor the test cases added by https://github.com/apache/spark/pull/26714 to make the code more compact.
   
   ### How was this patch tested?
   
   Tested locally.
   


[GitHub] [spark] AmplabJenkins commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26915: [INFRA] Reverts commit 56dcd79 and 
c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350858
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] 
Remove size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350917
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26915: [INFRA] Reverts commit 56dcd79 
and c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350858
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] wangyum commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

wangyum commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350961
 
 
   Sorry @srowen @HyukjinKwon @dongjoon-hyun @cloud-fan 
   Commits such as [3.0.0-preview2-rc1](https://github.com/apache/spark/releases/tag/v3.0.0-preview2-rc1) should not be added to master after https://github.com/apache/spark/pull/26879.
   
   But I forgot to check out the latest code. I'm very sorry about that.


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

jiangxb1987 commented on a change in pull request #26916: 
[SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and 
`KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#discussion_r358569006
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/FileSuite.scala
 ##
 @@ -702,32 +702,39 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
 assert(collectRDDAndDeleteFileBeforeCompute(true).isEmpty)
   }
 
-  test("SPARK-25100: Using KryoSerializer and" +
-  "setting registrationRequired true can lead job failed") {
-val inputFile = new File(tempDir, "/input").getAbsolutePath
-val textFileOutputDir = new File(tempDir, "/out1").getAbsolutePath
-val dataSetDir = new File(tempDir, "/out2").getAbsolutePath
-
-Utils.tryWithResource(new PrintWriter(new File(inputFile))) { writer =>
-  for (i <- 1 to 100) {
+  test("SPARK-25100: Support commit tasks when Kyro registration is required") {
+// Prepare the input file
+val inputFilePath = new File(tempDir, "/input").getAbsolutePath
+Utils.tryWithResource(new PrintWriter(new File(inputFilePath))) { writer =>
+  for (i <- 1 to 3) {
 writer.print(i)
 writer.write('\n')
   }
 }
 
-val conf = new SparkConf(false).setMaster("local").
-  set("spark.kryo.registrationRequired", "true").setAppName("test")
-conf.set("spark.serializer", classOf[KryoSerializer].getName)
+// Start a new SparkContext
+val conf = new SparkConf(false)
+  .setMaster("local")
+  .setAppName("test")
+  .set("spark.kryo.registrationRequired", "true")
+  .set("spark.serializer", classOf[KryoSerializer].getName)
+sc = new SparkContext(conf)
+
+// Prepare the input RDD
+val pairRDD = sc.textFile(inputFilePath).map(x => (x, x))
 
+// Test saveAsTextFile()
+val outputFilePath1 = new File(tempDir, "/out1").getAbsolutePath
+pairRDD.saveAsTextFile(outputFilePath1)
+assert(sc.textFile(outputFilePath1).collect() === Array("(1,1)", "(2,2)", "(3,3)"))
 
 Review comment:
   We should ensure the content in the output file is correct.
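   `RDD.saveAsTextFile` writes each element using its `toString`, and a Scala `Tuple2` renders as `(a,b)` with no space after the comma. The plain-Scala sketch below uses only the standard library and does not go through any Spark code path. It shows where the expected strings in the new assertion come from for the three input lines. The object name `TextOutputSketch` is hypothetical.

```scala
object TextOutputSketch {
  // Same input the test writes: the numbers 1 to 3, one per line.
  val inputLines: Seq[String] = (1 to 3).map(_.toString)

  // Mirrors `sc.textFile(inputFilePath).map(x => (x, x))` from the test.
  val pairs: Seq[(String, String)] = inputLines.map(x => (x, x))

  // saveAsTextFile emits each element's toString as one output line;
  // Tuple2.toString yields "(a,b)" with no space after the comma.
  val writtenLines: Seq[String] = pairs.map(_.toString)
}
```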


[GitHub] [spark] AmplabJenkins removed a comment on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26915: [INFRA] Reverts commit 56dcd79 
and c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566350861
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20222/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] 
Remove size field for interval column cache
URL: https://github.com/apache/spark/pull/26906#issuecomment-566350924
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20223/
   Test PASSed.


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

jiangxb1987 commented on a change in pull request #26916: 
[SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and 
`KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#discussion_r358569339
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
 ##
 @@ -363,16 +363,14 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
 val conf = new SparkConf(false)
 conf.set(KRYO_REGISTRATION_REQUIRED, true)
 
-val ser = new KryoSerializer(conf).newInstance()
-// In HadoopMapReduceCommitProtocol#commitTask
-val addedAbsPathFiles: mutable.Map[String, String] = mutable.Map()
-addedAbsPathFiles.put("test1", "test1")
-addedAbsPathFiles.put("test2", "test2")
+// HadoopMapReduceCommitProtocol.commitTask() returns a TaskCommitMessage containing a complex
+// structure.
 
-val partitionPaths: mutable.Set[String] = mutable.Set()
-partitionPaths.add("test3")
+val ser = new KryoSerializer(conf).newInstance()
+val addedAbsPathFiles = Map("test1" -> "test1", "test2" -> "test2")
+val partitionPaths = Set("test3")
 
-val taskCommitMessage1 = new TaskCommitMessage(addedAbsPathFiles.toMap -> partitionPaths.toSet)
+val taskCommitMessage1 = new TaskCommitMessage(addedAbsPathFiles -> partitionPaths)
 
 Review comment:
   We only need to test the object returned by `HadoopMapReduceCommitProtocol.commitTask()`, so there is no need to create a mutable map/set and then convert it.
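   For illustration only, here is a plain-Scala sketch of the review's point, with no Spark dependency. The old mutable-then-convert construction and the new immutable literals produce equal `(Map, Set)` pairs, so the value passed to `TaskCommitMessage` is unchanged. The object name `CommitPayloadSketch` is hypothetical. The sketch only compares the payload values and does not exercise Kryo serialization.

```scala
import scala.collection.mutable

object CommitPayloadSketch {
  // Old style: fill mutable collections, then convert them to immutable ones.
  val oldFiles: mutable.Map[String, String] = mutable.Map()
  oldFiles.put("test1", "test1")
  oldFiles.put("test2", "test2")
  val oldPaths: mutable.Set[String] = mutable.Set()
  oldPaths.add("test3")
  val oldPayload: (Map[String, String], Set[String]) = oldFiles.toMap -> oldPaths.toSet

  // New style: immutable literals; `->` builds the same Tuple2 directly.
  val newPayload: (Map[String, String], Set[String]) =
    Map("test1" -> "test1", "test2" -> "test2") -> Set("test3")
}
```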


[GitHub] [spark] HyukjinKwon commented on a change in pull request #26733: [SPARK-30097][SS][SQL] - Add support for core sinks in writeStream

2019-12-16 Thread GitBox

HyukjinKwon commented on a change in pull request #26733: 
[SPARK-30097][SS][SQL] - Add support for core sinks in writeStream
URL: https://github.com/apache/spark/pull/26733#discussion_r358569445
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
 ##
 @@ -132,6 +132,30 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
 this
   }
 
+  def parquet(path: String): StreamingQuery = {
+format("parquet")
+  .option("path", path)
+  .start()
+  }
+
+  def json(path: String): StreamingQuery = {
+format("json")
+  .option("path", path)
+  .start()
+  }
+
+  def csv(path: String): StreamingQuery = {
+format("csv")
+  .option("path", path)
+  .start()
+  }
+
+  def text(path: String): StreamingQuery = {
+format("text")
+  .option("path", path)
+  .start()
+  }
 
 Review comment:
   Yesh
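   The four proposed methods are thin wrappers over the existing `format(...).option("path", ...).start()` chain. The Spark-free sketch below mirrors that shape using a hypothetical `MiniStreamWriter`. Unlike the real `DataStreamWriter`, which mutates itself and returns `this`, the sketch is an immutable case class. `StartedQuery` is a hypothetical stand-in for `StreamingQuery`.

```scala
object StreamWriterSketch {
  // Hypothetical stand-in for the StreamingQuery handle returned by start().
  final case class StartedQuery(format: String, options: Map[String, String])

  // Hypothetical immutable builder with the same chain shape as DataStreamWriter.
  final case class MiniStreamWriter(
      source: String = "",
      extraOptions: Map[String, String] = Map.empty) {

    def format(name: String): MiniStreamWriter = copy(source = name)

    def option(key: String, value: String): MiniStreamWriter =
      copy(extraOptions = extraOptions + (key -> value))

    def start(): StartedQuery = StartedQuery(source, extraOptions)

    // Shared body of the proposed shortcuts: set the format and path, then start.
    private def startFileSink(fmt: String, path: String): StartedQuery =
      format(fmt).option("path", path).start()

    def parquet(path: String): StartedQuery = startFileSink("parquet", path)
    def json(path: String): StartedQuery = startFileSink("json", path)
    def csv(path: String): StartedQuery = startFileSink("csv", path)
    def text(path: String): StartedQuery = startFileSink("text", path)
  }
}
```

   With the real API, the same result is already available through `format("parquet").option("path", path).start()` or `format("parquet").start(path)`.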


[GitHub] [spark] HyukjinKwon commented on a change in pull request #26733: [SPARK-30097][SS][SQL] - Add support for core sinks in writeStream

2019-12-16 Thread GitBox

HyukjinKwon commented on a change in pull request #26733: 
[SPARK-30097][SS][SQL] - Add support for core sinks in writeStream
URL: https://github.com/apache/spark/pull/26733#discussion_r358569519
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
 ##
 @@ -132,6 +132,30 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
 this
   }
 
+  def parquet(path: String): StreamingQuery = {
+format("parquet")
+  .option("path", path)
+  .start()
+  }
+
+  def json(path: String): StreamingQuery = {
+format("json")
+  .option("path", path)
+  .start()
+  }
+
+  def csv(path: String): StreamingQuery = {
+format("csv")
+  .option("path", path)
+  .start()
+  }
+
+  def text(path: String): StreamingQuery = {
+format("text")
+  .option("path", path)
+  .start()
+  }
 
 Review comment:
   Yeah, thanks for closing.


[GitHub] [spark] jiangxb1987 commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

jiangxb1987 commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor 
test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566351638
 
 
   cc @deshanxiao @HeartSaVioR @dongjoon-hyun Please take a look when you have 
time, thanks!


[GitHub] [spark] SparkQA commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

SparkQA commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test 
cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566352458
 
 
   **[Test build #115421 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115421/testReport)**
 for PR 26916 at commit 
[`559c45c`](https://github.com/apache/spark/commit/559c45c697ff7b207275e6549c6b8f6396a8e32b).


[GitHub] [spark] AmplabJenkins commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor 
test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566352757
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20224/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26916: [SPARK-25100][TEST][FOLLOWUP] 
Refactor test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566352750
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26916: [SPARK-25100][TEST][FOLLOWUP] 
Refactor test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566352757
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20224/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor 
test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566352750
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] xuanyuanking opened a new pull request #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

xuanyuanking opened a new pull request #26917: [SPARK-30278][SQL][DOC] Update 
Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917
 
 
   ### What changes were proposed in this pull request?
   Update the Spark SQL document menu and join strategy hints.
   
   ### Why are the changes needed?
   - Several recent changes to the Spark SQL documentation were not reflected in menu-sql.yaml.
   - Update the demo code for join strategy hints.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Document change only.


[GitHub] [spark] xuanyuanking commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

xuanyuanking commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566355307
 
 
   cc @cloud-fan @gatorsmile 


[GitHub] [spark] SparkQA commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

SparkQA commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL 
document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356040
 
 
   **[Test build #115422 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115422/testReport)**
 for PR 26917 at commit 
[`7222fa3`](https://github.com/apache/spark/commit/7222fa3c75d829003068da1b51fe586b31d3e57b).


[GitHub] [spark] AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356338
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update 
Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356338
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356345
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20225/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update 
Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356345
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20225/
   Test PASSed.


[GitHub] [spark] dongjoon-hyun commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

dongjoon-hyun commented on issue #26915: [INFRA] Reverts commit 56dcd79 and 
c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566356920
 
 
   You can merge this, @wangyum.


[GitHub] [spark] SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox

SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566357719
 
 
   **[Test build #115423 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115423/testReport)**
 for PR 26838 at commit 
[`7a98ffb`](https://github.com/apache/spark/commit/7a98ffbc780770bfb6454ea72a597b1c0fb168d1).


[GitHub] [spark] wangyum closed pull request #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

wangyum closed pull request #26915: [INFRA] Reverts commit 56dcd79 and c216ef1
URL: https://github.com/apache/spark/pull/26915
 
 
   


[GitHub] [spark] wangyum commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1

2019-12-16 Thread GitBox

wangyum commented on issue #26915: [INFRA] Reverts commit 56dcd79 and c216ef1
URL: https://github.com/apache/spark/pull/26915#issuecomment-566357813
 
 
   Thank you all.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] 
Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566358062
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20226/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566358062
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20226/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make 
MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566358052
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] 
Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
URL: https://github.com/apache/spark/pull/26838#issuecomment-566358052
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] dongjoon-hyun commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor test cases in `FileSuite` and `KryoSerializerSuite`

2019-12-16 Thread GitBox

dongjoon-hyun commented on issue #26916: [SPARK-25100][TEST][FOLLOWUP] Refactor 
test cases in `FileSuite` and `KryoSerializerSuite`
URL: https://github.com/apache/spark/pull/26916#issuecomment-566358336
 
 
   Thank you for making a followup and pinging me, @jiangxb1987 !


[GitHub] [spark] SparkQA commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

SparkQA commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL 
document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566358598
 
 
   **[Test build #115422 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115422/testReport)**
 for PR 26917 at commit 
[`7222fa3`](https://github.com/apache/spark/commit/7222fa3c75d829003068da1b51fe586b31d3e57b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566358668
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566358671
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115422/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update 
Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566358668
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

SparkQA removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark 
SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566356040
 
 
   **[Test build #115422 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115422/testReport)**
 for PR 26917 at commit 
[`7222fa3`](https://github.com/apache/spark/commit/7222fa3c75d829003068da1b51fe586b31d3e57b).


[GitHub] [spark] AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26917: [SPARK-30278][SQL][DOC] Update 
Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#issuecomment-566358671
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115422/
   Test PASSed.


[GitHub] [spark] srowen commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

srowen commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358578200
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ##
 @@ -182,7 +182,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
 spark.sql(
   s"""SELECT
  |  key,
- |  percentile_approx(null, 0.5)
+ |  percentile_approx(cast(null as int), 0.5)
 
 Review comment:
   It doesn't make much sense, yeah. I am not sure whether we generally return 
null or an error in a case like this. Following Hive seems like a reasonable 
guide. It could be OK to change behavior if there's a release note attached to 
the JIRA, and if we think it's more of a bug fix than anything else.


[GitHub] [spark] cloud-fan closed pull request #26894: [SPARK-30094][SQL] Apply current namespace for the single-part table name

2019-12-16 Thread GitBox

cloud-fan closed pull request #26894: [SPARK-30094][SQL] Apply current 
namespace for the single-part table name
URL: https://github.com/apache/spark/pull/26894
 
 
   


[GitHub] [spark] cloud-fan commented on issue #26894: [SPARK-30094][SQL] Apply current namespace for the single-part table name

2019-12-16 Thread GitBox

cloud-fan commented on issue #26894: [SPARK-30094][SQL] Apply current namespace 
for the single-part table name
URL: https://github.com/apache/spark/pull/26894#issuecomment-566361014
 
 
   thanks, merging to master!


[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358579502
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ##
 @@ -182,7 +182,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
 spark.sql(
   s"""SELECT
  |  key,
- |  percentile_approx(null, 0.5)
+ |  percentile_approx(cast(null as int), 0.5)
 
 Review comment:
   Agreed. I'll update the PR description to mention this change. I guess the 
original author's purpose for these tests was to verify the null handling 
of non-null types, not `NullType`.


[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] 
Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358580862
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   Why would it overflow? The type coercion should cast `accuracyExpression` to 
int type.


[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358581552
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   If we specify an accuracy value greater than Int.MaxValue.


[GitHub] [spark] HeartSaVioR commented on issue #26723: [SPARK-27523][CORE] - Resolve scheme-less event log directory relative to default filesystem

2019-12-16 Thread GitBox

HeartSaVioR commented on issue #26723: [SPARK-27523][CORE] - Resolve 
scheme-less event log directory relative to default filesystem
URL: https://github.com/apache/spark/pull/26723#issuecomment-566364451
 
 
   What about explicitly documenting this behavior in configuration.md? I'm now 
a bit confused about which configurations resolve scheme-less directories 
against which filesystem, but it might be worth mentioning the cases where we 
don't follow the default filesystem.
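A minimal sketch of the behavior under discussion, assuming the default filesystem is given as a URI such as the hypothetical `hdfs://namenode:8020`: a directory with no scheme is qualified against that filesystem, while one that already carries a scheme is kept as-is. It uses plain `java.net.URI` rather than Spark's or Hadoop's actual resolution code, and the helper name `qualify` is made up for illustration.

```scala
import java.net.URI

object SchemeLessDirSketch {
  // Hypothetical helper (not Spark's code): if `dir` has no scheme, qualify it
  // against the default filesystem URI; otherwise return it unchanged.
  // Assumes scheme-less inputs are absolute paths such as "/spark-logs".
  def qualify(dir: String, defaultFs: URI): URI = {
    val uri = new URI(dir)
    if (uri.getScheme != null) {
      uri
    } else {
      // java.net.URI rejects a relative path combined with a scheme/authority.
      require(dir.startsWith("/"), s"expected an absolute path, got: $dir")
      new URI(defaultFs.getScheme, defaultFs.getAuthority, uri.getPath, null, null)
    }
  }

  def main(args: Array[String]): Unit = {
    val defaultFs = new URI("hdfs://namenode:8020") // hypothetical default FS
    println(qualify("/spark-logs", defaultFs))            // hdfs://namenode:8020/spark-logs
    println(qualify("file:/tmp/spark-events", defaultFs)) // file:/tmp/spark-events
  }
}
```

Documenting which settings go through a step like this, and which ones do not, is what the comment above asks for.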


[GitHub] [spark] SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays

2019-12-16 Thread GitBox

SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert 
MLlib vectors to dense arrays
URL: https://github.com/apache/spark/pull/26910#issuecomment-566364589
 
 
   **[Test build #115424 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115424/testReport)**
 for PR 26910 at commit 
[`e2bb6c0`](https://github.com/apache/spark/commit/e2bb6c098198f611c3b74289cb53be9a1e187de1).


[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to 
convert MLlib vectors to dense arrays
URL: https://github.com/apache/spark/pull/26910#issuecomment-566364920
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20227/
   Test PASSed.


[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid 
overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358582153
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   I don't mean an overflow exception here, just that the value is not right.
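To illustrate the point about a wrong value rather than an exception: narrowing a `Long` to an `Int` in Scala keeps only the low 32 bits, so an accuracy above `Int.MaxValue` silently wraps. Reading the evaluated literal through `java.lang.Number.longValue()`, the pattern used in the diff, keeps it intact for both boxed `Int` and boxed `Long` values. This is a standalone sketch; the object and method names are hypothetical and it does not call into Spark.

```scala
object AccuracySketch {
  // Hypothetical helper mirroring the diff's pattern: accept any boxed numeric
  // value (java.lang.Integer, java.lang.Long, ...) and read it as a Long.
  def readAccuracy(evaluated: Any): Long =
    evaluated.asInstanceOf[Number].longValue()

  def main(args: Array[String]): Unit = {
    val requested: Long = 3000000000L // hypothetical accuracy > Int.MaxValue

    // Narrowing keeps only the low 32 bits: no exception, just a wrong value.
    println(requested.toInt)                                  // -1294967296

    // Reading through Number.longValue() preserves the value.
    println(readAccuracy(java.lang.Long.valueOf(requested)))  // 3000000000
    println(readAccuracy(java.lang.Integer.valueOf(10000)))   // 10000
  }
}
```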


[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to 
convert MLlib vectors to dense arrays
URL: https://github.com/apache/spark/pull/26910#issuecomment-566364915
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF 
to convert MLlib vectors to dense arrays
URL: https://github.com/apache/spark/pull/26910#issuecomment-566364915
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF 
to convert MLlib vectors to dense arrays
URL: https://github.com/apache/spark/pull/26910#issuecomment-566364920
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20227/
   Test PASSed.


[GitHub] [spark] maropu opened a new pull request #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

maropu opened a new pull request #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918
 
 
   
   
   ### What changes were proposed in this pull request?
   
   This PR intends to support 32 or more grouping attributes for GROUPING_ID. 
In the current master, an integer overflow can occur when computing grouping IDs:
   
https://github.com/apache/spark/blob/e75d9afb2f282ce79c9fd8bce031287739326a4f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L613
   
   For example, the query below generates wrong grouping IDs in the current master:
   ```
   
   scala> val numCols = 32 // or, 31
   scala> val cols = (0 until numCols).map { i => s"c$i" }
   scala> sql(s"create table test_$numCols (${cols.map(c => s"$c int").mkString(",")}, v int) using parquet")
   scala> val insertVals = (0 until numCols).map { _ => 1 }.mkString(",")
   scala> sql(s"insert into test_$numCols values ($insertVals,3)")
   scala> sql(s"select grouping_id(), sum(v) from test_$numCols group by grouping sets ((${cols.mkString(",")}), (${cols.init.mkString(",")}))").show(10, false)
   scala> sql(s"drop table test_$numCols")
   
   // numCols = 32
   +-------------+------+
   |grouping_id()|sum(v)|
   +-------------+------+
   |0            |3     |
   |0            |3     | // Wrong Grouping ID
   +-------------+------+
   
   // numCols = 31
   +-------------+------+
   |grouping_id()|sum(v)|
   +-------------+------+
   |0            |3     |
   |1            |3     |
   +-------------+------+
   ```
   To fix this issue, this PR generates string grouping IDs instead of integer 
IDs in that case.
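A simplified sketch of why 32 attributes break an `Int` bitmask, assuming (hypothetically, for illustration only) that attribute `i` of `n` owns bit `n - 1 - i` and that a bit is set when the attribute is left out of the grouping set. On the JVM, `Int` shift distances are taken modulo 32, so `1 << 32 == 1` and the all-ones mask `(1 << n) - 1` collapses to 0, which reproduces the two zero IDs in the `numCols = 32` output. The `Long` variant is shown only to make the overflow visible; it is not the fix proposed in this PR, which generates string IDs.

```scala
object GroupingIdSketch {
  // Hypothetical simplified grouping-ID computation (not Spark's code):
  // attribute i of n owns bit (n - 1 - i); a bit stays set when the
  // attribute is NOT part of the grouping set.
  def idAsInt(n: Int, groupingSet: Seq[Int]): Int = {
    val all = (1 << n) - 1 // for n = 32, (1 << 32) == 1 for Int, so this is 0
    groupingSet.foldLeft(all)((acc, i) => acc & ~(1 << (n - 1 - i)))
  }

  // Same computation with 64-bit arithmetic, only to expose the overflow.
  def idAsLong(n: Int, groupingSet: Seq[Int]): Long = {
    val all = (1L << n) - 1L
    groupingSet.foldLeft(all)((acc, i) => acc & ~(1L << (n - 1 - i)))
  }

  def main(args: Array[String]): Unit = {
    for (n <- Seq(31, 32)) {
      val full = 0 until n           // all columns
      val allButLast = 0 until n - 1 // like cols.init
      println(s"n=$n Int:  ${idAsInt(n, full)}, ${idAsInt(n, allButLast)}")
      println(s"n=$n Long: ${idAsLong(n, full)}, ${idAsLong(n, allButLast)}")
    }
  }
}
```

With `n = 31` both variants give `0, 1`; with `n = 32` the `Int` variant gives `0, 0` while the `Long` variant still gives `0, 1`.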
   
   ### Why are the changes needed?
   
   To support more cases in `GROUPING_ID`.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added unit tests.


[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366273
 
 
   **[Test build #115425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115425/testReport)**
 for PR 26918 at commit 
[`327967e`](https://github.com/apache/spark/commit/327967e22a105386bf43e0fac4e9fa74ec70bd4e).



[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366580
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20228/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366579
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more 
grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366579
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 
or more grouping attributes for GROUPING_ID 
URL: https://github.com/apache/spark/pull/26918#issuecomment-566366580
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20228/
   Test PASSed.



[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-16 Thread GitBox

JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] 
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358583953
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala
 ##
 @@ -145,39 +156,36 @@ case class ReduceNumShufflePartitions(conf: SQLConf) extends Rule[SparkPlan] {
   distinctNumPreShufflePartitions.length == 1,
   "There should be only one distinct value of the number pre-shuffle partitions " +
 "among registered Exchange operator.")
-val numPreShufflePartitions = distinctNumPreShufflePartitions.head
 
 val partitionStartIndices = ArrayBuffer[Int]()
-// The first element of partitionStartIndices is always 0.
-partitionStartIndices += 0
-
-var postShuffleInputSize = 0L
-
-var i = 0
-while (i < numPreShufflePartitions) {
-  // We calculate the total size of ith pre-shuffle partitions from all pre-shuffle stages.
-  // Then, we add the total size to postShuffleInputSize.
-  var nextShuffleInputSize = 0L
-  var j = 0
-  while (j < mapOutputStatistics.length) {
-nextShuffleInputSize += mapOutputStatistics(j).bytesByPartitionId(i)
-j += 1
-  }
-
-  // If including the nextShuffleInputSize would exceed the target partition size, then start a
-  // new partition.
-  if (i > 0 && postShuffleInputSize + nextShuffleInputSize > targetPostShuffleInputSize) {
-partitionStartIndices += i
-// reset postShuffleInputSize.
-postShuffleInputSize = nextShuffleInputSize
-  } else {
-postShuffleInputSize += nextShuffleInputSize
-  }
-
-  i += 1
+val partitionEndIndices = ArrayBuffer[Int]()
+val numPartitions = mapOutputStatistics.map(stats => stats.bytesByPartitionId.length).head
+val includedPartitions = (0 until numPartitions).filter(!excludedPartitions.contains(_))
+val firstStartIndex = includedPartitions(0)
+partitionStartIndices += firstStartIndex
+var postShuffleInputSize = mapOutputStatistics.map(_.bytesByPartitionId(firstStartIndex)).sum
+var i = firstStartIndex
+includedPartitions.filter(_ != firstStartIndex).foreach {
+  nextPartitionIndices =>
+var nextShuffleInputSize =
+  mapOutputStatistics.map(_.bytesByPartitionId(nextPartitionIndices)).sum
+// If nextPartitionIndices is skewed and omitted, or including
+// the nextShuffleInputSize would exceed the target partition size,
+// then start a new partition.
+if (nextPartitionIndices != i + 1 ||
 
 Review comment:
   The `nextPartitionIndices` is excluded.
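The coalescing loop in the new code can be sketched as a standalone function. This is a hypothetical simplification, not the PR's implementation: it assumes per-partition sizes have already been summed across all pre-shuffle stages (as the `mapOutputStatistics.map(...).sum` expressions above do), and because the diff is cut off mid-condition, the split rule follows the code comment (a gap left by an excluded partition, or exceeding the target size) rather than the truncated expression. Returning `(start, endExclusive)` pairs is also an assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: coalesce contiguous shuffle partitions up to a target size,
// never merging across a partition that has been excluded (e.g. because it is skewed).
object CoalesceSketch {
  /** Returns (startIndex, endIndexExclusive) for each coalesced post-shuffle partition. */
  def coalesce(
      bytesByPartition: Array[Long],
      excluded: Set[Int],
      targetSize: Long): Seq[(Int, Int)] = {
    val included = bytesByPartition.indices.filterNot(excluded.contains)
    if (included.isEmpty) {
      Seq.empty
    } else {
      val ranges = ArrayBuffer[(Int, Int)]()
      var start = included.head
      var prev = start
      var currentSize = bytesByPartition(start)
      included.tail.foreach { next =>
        val nextSize = bytesByPartition(next)
        // Start a new partition if an excluded partition lies between `prev` and `next`,
        // or if adding `next` would exceed the target size.
        if (next != prev + 1 || currentSize + nextSize > targetSize) {
          ranges += ((start, prev + 1))
          start = next
          currentSize = nextSize
        } else {
          currentSize += nextSize
        }
        prev = next
      }
      ranges += ((start, prev + 1))
      ranges.toSeq
    }
  }
}
```

For example, with sizes `Array(10L, 10L, 100L, 10L, 10L, 10L)`, partition 2 excluded and a target of 25 bytes, the result is `Seq((0, 2), (3, 5), (5, 6))`: the excluded partition forces a break after partition 1, and the size limit forces one after partition 4.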



[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-16 Thread GitBox

JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] 
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358584117
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedPartitions.scala
 ##
 @@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
+
+case class OptimizeSkewedPartitions(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private val supportedJoinTypes =
+Inner :: Cross :: LeftSemi :: LeftAnti :: LeftOuter :: RightOuter :: Nil
+
+  /**
+   * A partition is considered as a skewed partition if its size is larger than the median
+   * partition size * spark.sql.adaptive.skewedPartitionFactor and also larger than
+   * spark.sql.adaptive.skewedPartitionSizeThreshold.
+   */
+  private def isSkewed(
+  stats: MapOutputStatistics,
+  partitionId: Int,
+  medianSize: Long): Boolean = {
+val size = stats.bytesByPartitionId(partitionId)
+size > medianSize * conf.getConf(SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR) &&
+  size > conf.getConf(SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD)
+  }
+
+  private def medianSize(stats: MapOutputStatistics): Long = {
+val numPartitions = stats.bytesByPartitionId.length
+val bytes = stats.bytesByPartitionId.sorted
+if (bytes(numPartitions / 2) > 0) bytes(numPartitions / 2) else 1
+  }
+
+  /**
+   * Get all the map data size for specific reduce partitionId.
+   */
+  def getMapSizeForSpecificPartition(partitionId: Int, shuffleId: Int): Array[Long] = {
+val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+mapOutputTracker.shuffleStatuses.get(shuffleId).
+  get.mapStatuses.map{_.getSizeForBlock(partitionId)}
+  }
+
+  /**
+   * Split the partition into the number of mappers. Each split read data from each mapper.
+   */
+  private def estimateMapStartIndices(
+  stage: QueryStageExec,
+  partitionId: Int,
+  medianSize: Long): Array[Int] = {
+val dependency = ShuffleQueryStageExec.getShuffleStage(stage).plan.shuffleDependency
+val numMappers = dependency.rdd.partitions.length
+// TODO: split the partition based on the size
+(0 until numMappers).toArray
+  }
+
+  private def getStatistics(queryStage: QueryStageExec): MapOutputStatistics = {
+val shuffleStage = ShuffleQueryStageExec.getShuffleStage(queryStage)
+val metrics = shuffleStage.plan.mapOutputStatisticsFuture
+assert(metrics.isCompleted,
+  "ShuffleQueryStageExec should already be ready when executing OptimizeSkewedPartitions rule")
+ThreadUtils.awaitResult(metrics, Duration.Zero)
+  }
+
+  /**
+   * Base optimization support check: the join type is supported.
+   * Note that for some join types(like left outer), whether a certain partition can be optimized
+   * also depends on the filed isSkewAndSupportsSplit.
+   */
+  private def supportOptimization(
+  joinType: JoinType,
+  leftStage: QueryStageExec,
+  rightStage: QueryStageExec): Boolean = {
+val joinTypeSupported = supportedJoinTypes.contains(joinType)
+val shuffleStageCheck = ShuffleQueryStageExec.isShuffleQueryStageExec(leftStage) &&
+  ShuffleQueryStageExec.isShuffleQueryStageExec(rightStage)
+joinTypeSupported && shuffleStageCheck
+  }
+
+  private def supportSplitOnLeftPartition(joinType: JoinType)
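The skew test defined by `isSkewed` and `medianSize` in the excerpt above can be sketched as a standalone function. This is a hypothetical sketch, not the rule itself: `factor` and `threshold` are plain parameters standing in for `spark.sql.adaptive.skewedPartitionFactor` and `spark.sql.adaptive.skewedPartitionSizeThreshold`, which the real rule reads from `SQLConf`, and typing the factor as `Int` is an assumption.

```scala
// Hypothetical sketch of the skew test described in the isSkewed doc comment.
object SkewSketch {
  // Middle element of the sorted sizes (the upper median for an even count),
  // floored at 1, mirroring the medianSize helper in the diff.
  def medianSize(bytesByPartitionId: Array[Long]): Long = {
    val sorted = bytesByPartitionId.sorted
    val mid = sorted(sorted.length / 2)
    if (mid > 0) mid else 1L
  }

  // A partition is skewed if it is larger than median * factor AND larger than threshold.
  def skewedPartitions(bytesByPartitionId: Array[Long], factor: Int, threshold: Long): Seq[Int] = {
    val median = medianSize(bytesByPartitionId)
    bytesByPartitionId.indices.filter { i =>
      val size = bytesByPartitionId(i)
      size > median * factor && size > threshold
    }
  }
}
```

With sizes `(10, 12, 11, 500, 9)` the median is 11, so with a factor of 5 and a threshold of 100 only partition 3 is reported as skewed. Requiring both conditions keeps a small partition from being flagged merely because most other partitions are nearly empty.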

[GitHub] [spark] SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log 
files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566367616
 
 
   **[Test build #115414 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115414/testReport)**
 for PR 26416 at commit 
[`23ce9d7`](https://github.com/apache/spark/commit/23ce9d7e8d47205d8bc6f6a8a050965aa00229d8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  case class FilterStatistic(`
 * `class EventLogFileCompactor(`
 * `class FilteredEventLogFileRewriter(`
 * `class CompactedEventLogFileWriter(`



[GitHub] [spark] SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old 
event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566318766
 
 
   **[Test build #115414 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115414/testReport)**
 for PR 26416 at commit 
[`23ce9d7`](https://github.com/apache/spark/commit/23ce9d7e8d47205d8bc6f6a8a050965aa00229d8).



[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact 
old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566368112
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566368114
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115414/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566368112
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact 
old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566368114
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115414/
   Test PASSed.



[GitHub] [spark] WangGuangxin commented on a change in pull request #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics

2019-12-16 Thread GitBox

WangGuangxin commented on a change in pull request #26899: [SPARK-28332][SQL] 
Reserve init value -1 only when do min max statistics in SQLMetrics
URL: https://github.com/apache/spark/pull/26899#discussion_r358586175
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
 ##
 @@ -67,7 +67,9 @@ class SQLMetric(val metricType: String, initValue: Long = 0L) extends Accumulato
 
   def +=(v: Long): Unit = _value += v
 
-  override def value: Long = _value
+  def getRawValue(): Long = _value
+
+  override def value: Long = Math.max(_value, 0)
 
 Review comment:
   What I want to do here: when we need to recognize an uninitialized metric by its -1 value, we use `SQLMetric.getRawValue` (this is the same as what SQLMetric currently does without this PR). Otherwise we use `SQLMetric.value`, which always returns at least 0 (the same as initializing the SQLMetric to 0). It's a bit tricky.
   
   `SQLMetric.getRawValue` is called in `SQLAppStatusListener.onTaskEnd` and `SQLMetrics.postDriverMetricUpdates`. The metric values from these two places eventually go to `LiveStageMetrics`, which is used by `SQLAppStatusListener.aggregateMetrics`, and are then aggregated in `SQLMetrics.stringValue`. If a SQLMetric is initialized to -1 and never gets updated in the executors, its value stays -1 and is filtered out by the logic in `SQLMetrics.stringValue`.
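The -1 sentinel scheme described in this comment can be sketched with a hypothetical stand-in class. `SketchMetric` is not Spark's `SQLMetric`, does not extend `AccumulatorV2`, and its `aggregate` helper only imitates, in spirit, the filtering the comment attributes to `SQLMetrics.stringValue`: the raw value keeps -1 for "never updated", the public value clamps it to 0, and aggregation skips -1 entries.

```scala
// Hypothetical stand-in for SQLMetric; it does not extend Spark's AccumulatorV2.
class SketchMetric(initValue: Long = 0L) {
  private var _value: Long = initValue

  def set(v: Long): Unit = _value = v

  // Raw value: still -1 if the metric was created with the sentinel and never set.
  def getRawValue(): Long = _value

  // Public value: clamped so that callers never observe the -1 sentinel.
  def value: Long = math.max(_value, 0L)
}

object SketchMetric {
  // Aggregates raw per-task values, skipping -1 ("never updated") entries.
  // Returns (sum, max) over the valid values, or None if there are none.
  def aggregate(rawValues: Seq[Long]): Option[(Long, Long)] = {
    val valid = rawValues.filter(_ >= 0)
    if (valid.isEmpty) None else Some((valid.sum, valid.max))
  }

  def main(args: Array[String]): Unit = {
    val updated = new SketchMetric(-1L)
    updated.set(5L)
    val untouched = new SketchMetric(-1L)
    println(untouched.getRawValue()) // -1
    println(untouched.value)         // 0
    println(aggregate(Seq(updated.getRawValue(), untouched.getRawValue())))
  }
}
```

Keeping the sentinel visible only through `getRawValue` means ordinary readers of `value` never see a negative number, while the aggregation path can still tell "0 bytes" apart from "never reported".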
   



[GitHub] [spark] SparkQA commented on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

SparkQA commented on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap 
lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566371745
 
 
   **[Test build #115418 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115418/testReport)**
 for PR 26914 at commit 
[`d5a1ec2`](https://github.com/apache/spark/commit/d5a1ec2c78f520792c580a3174d13aace59c7fb2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

SparkQA removed a comment on issue #26914: [SPARK-30274][Core] Avoid 
BytesToBytesMap lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566339574
 
 
   **[Test build #115418 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115418/testReport)**
 for PR 26914 at commit 
[`d5a1ec2`](https://github.com/apache/spark/commit/d5a1ec2c78f520792c580a3174d13aace59c7fb2).



[GitHub] [spark] AmplabJenkins commented on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26914: [SPARK-30274][Core] Avoid 
BytesToBytesMap lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566372162
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26914: [SPARK-30274][Core] Avoid 
BytesToBytesMap lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566372170
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115418/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26914: [SPARK-30274][Core] Avoid 
BytesToBytesMap lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566372162
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26914: [SPARK-30274][Core] Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26914: [SPARK-30274][Core] Avoid 
BytesToBytesMap lookup hang forever when holding keys reaching max capacity
URL: https://github.com/apache/spark/pull/26914#issuecomment-566372170
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115418/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log 
files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566372685
 
 
   **[Test build #115417 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115417/testReport)**
 for PR 26416 at commit 
[`872ffbc`](https://github.com/apache/spark/commit/872ffbc5cd94d48fcafeaf5681eed12af9a87785).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] attilapiros commented on a change in pull request #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

attilapiros commented on a change in pull request #26869: [SPARK-30235][CORE] 
Switching off host local disk reading of shuffle blocks in case of 
useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#discussion_r358589382
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
 ##
 @@ -1097,9 +1097,9 @@ package object config {
 
   private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
 ConfigBuilder("spark.shuffle.readHostLocalDisk.enabled")
-  .doc("If enabled, shuffle blocks requested from those block managers which are running on " +
-"the same host are read from the disk directly instead of being fetched as remote blocks " +
-"over the network.")
+  .doc("If enabled (and `spark.shuffle.useOldFetchProtocol` is disabled), shuffle blocks " +
 
 Review comment:
   I haven't forgotten about converting the first part to string interpolation.
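A hypothetical sketch of the two ideas in this hunk: building the first part of the doc string with string interpolation, and gating host-local disk reads on the old fetch protocol being off. The config keys are copied from the diff, but this does not use Spark's internal `ConfigBuilder` API; the defaults (`true` for host-local reading, `false` for the old protocol) are assumptions; and since the new doc text is cut off after "shuffle blocks", its continuation here is borrowed from the removed lines.

```scala
// Hypothetical sketch; keys come from the diff, defaults and doc continuation are assumptions.
object HostLocalReadSketch {
  val HostLocalReadKey = "spark.shuffle.readHostLocalDisk.enabled"
  val OldFetchProtocolKey = "spark.shuffle.useOldFetchProtocol"

  // Doc text whose first part interpolates the referenced key instead of hard-coding it.
  val doc: String =
    s"If enabled (and `$OldFetchProtocolKey` is disabled), shuffle blocks requested from " +
      "block managers running on the same host are read from the disk directly instead of " +
      "being fetched as remote blocks over the network."

  // Host-local disk reading takes effect only when enabled and the old fetch protocol is off.
  def readHostLocalDisk(conf: Map[String, String]): Boolean = {
    val enabled = conf.getOrElse(HostLocalReadKey, "true").toBoolean
    val useOldProtocol = conf.getOrElse(OldFetchProtocolKey, "false").toBoolean
    enabled && !useOldProtocol
  }

  def main(args: Array[String]): Unit = {
    println(readHostLocalDisk(Map.empty))                           // true
    println(readHostLocalDisk(Map(OldFetchProtocolKey -> "true")))  // false
  }
}
```

Interpolating the key keeps the doc string in sync if the referenced config is ever renamed.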



[GitHub] [spark] SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old 
event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566335358
 
 
   **[Test build #115417 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115417/testReport)**
 for PR 26416 at commit 
[`872ffbc`](https://github.com/apache/spark/commit/872ffbc5cd94d48fcafeaf5681eed12af9a87785).



[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host 
local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566372947
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host 
local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566372950
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20229/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching 
off host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566372947
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching 
off host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566372950
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20229/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566373088
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] cloud-fan closed pull request #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT

2019-12-16 Thread GitBox

cloud-fan closed pull request #26831: [SPARK-30201][SQL] HiveOutputWriter 
standardOI should use ObjectInspectorCopyOption.DEFAULT
URL: https://github.com/apache/spark/pull/26831
 
 
   


[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566373092
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115417/
   Test PASSed.


[GitHub] [spark] cloud-fan commented on issue #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT

2019-12-16 Thread GitBox

cloud-fan commented on issue #26831: [SPARK-30201][SQL] HiveOutputWriter 
standardOI should use ObjectInspectorCopyOption.DEFAULT
URL: https://github.com/apache/spark/pull/26831#issuecomment-566373038
 
 
   thanks, merging to master!


[GitHub] [spark] attilapiros commented on a change in pull request #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

attilapiros commented on a change in pull request #26869: [SPARK-30235][CORE] 
Switching off host local disk reading of shuffle blocks in case of 
useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#discussion_r358589705
 
 

 ##
 File path: docs/sql-migration-guide.md
 ##
 @@ -97,7 +97,7 @@ license: |
 
   - Since Spark 3.0, when Avro files are written with user provided non-nullable schema, even the catalyst schema is nullable, Spark is still able to write the files. However, Spark will throw runtime NPE if any of the records contains null.
 
-  - Since Spark 3.0, we use a new protocol for fetching shuffle blocks, for external shuffle service users, we need to upgrade the server correspondingly. Otherwise, we'll get the error message `UnsupportedOperationException: Unexpected message: FetchShuffleBlocks`. If it is hard to upgrade the shuffle service right now, you can still use the old protocol by setting `spark.shuffle.useOldFetchProtocol` to `true`.
+  - Since Spark 3.0, we use a new protocol for fetching shuffle blocks, for external shuffle service users, we need to upgrade the server correspondingly. Otherwise, we'll get the error message `IllegalArgumentException: Unexpected message type: `. If it is hard to upgrade the shuffle service right now, you can still use the old protocol by setting `spark.shuffle.useOldFetchProtocol` to `true`.
 
 Review comment:
   I have moved it to the core migration guide and removed the "Since Spark 3.0, " prefix, since that guide is organized into subsections per Spark version. 


[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact 
old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566373092
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115417/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact 
old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-566373088
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile

2019-12-16 Thread GitBox

cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] 
Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358590272
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ##
 @@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>
 
 Review comment:
   shall we simply update `inputTypes` to replace `IntegerType` with `LongType`?
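   The `accuracy` change in the diff above replaces `asInstanceOf[Int]` with `asInstanceOf[Number].longValue()`. The standalone Scala sketch below is not Spark code; `AccuracyCastSketch`, `readAccuracyOld` and `readAccuracyNew` are hypothetical names. It only illustrates how the two casts behave when the evaluated literal arrives boxed as a `java.lang.Long`, which is one plausible way the old line could fail:

```scala
object AccuracyCastSketch {
  // Mirrors the removed line: unbox an Any directly to Int.
  // This only accepts a boxed java.lang.Integer; a java.lang.Long
  // triggers a ClassCastException at runtime.
  def readAccuracyOld(evaluated: Any): Int = evaluated.asInstanceOf[Int]

  // Mirrors the added line: go through java.lang.Number, so any boxed
  // numeric (Integer, Long, Short, ...) is accepted and widened to Long,
  // which also keeps values above Int.MaxValue intact.
  def readAccuracyNew(evaluated: Any): Long =
    evaluated.asInstanceOf[Number].longValue()

  def main(args: Array[String]): Unit = {
    val boxedInt: Any = 10000         // boxed as java.lang.Integer
    val boxedLong: Any = 3000000000L  // boxed as java.lang.Long, > Int.MaxValue

    println(readAccuracyNew(boxedInt))
    println(readAccuracyNew(boxedLong))

    // The old-style cast rejects the boxed Long outright.
    val oldResult =
      try readAccuracyOld(boxedLong).toString
      catch { case _: ClassCastException => "ClassCastException" }
    println(oldResult)
  }
}
```

   Running `main` prints `10000`, `3000000000` and `ClassCastException`: reading through `java.lang.Number` accepts any boxed numeric and widens it to `Long`, whereas direct unboxing to `Int` only accepts a `java.lang.Integer`.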


[GitHub] [spark] SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local 
disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566374349
 
 
   **[Test build #115426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115426/testReport)**
 for PR 26869 at commit 
[`69459e7`](https://github.com/apache/spark/commit/69459e73457650ae43e08ef783b67cdf31b0d46f).


[GitHub] [spark] cloud-fan commented on a change in pull request #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes

2019-12-16 Thread GitBox

cloud-fan commented on a change in pull request #26917: [SPARK-30278][SQL][DOC] 
Update Spark SQL document menu for new changes
URL: https://github.com/apache/spark/pull/26917#discussion_r358590782
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -158,7 +155,7 @@ broadcast(spark.table("src")).join(spark.table("records"), "key").show()
 {% highlight r %}
 src <- sql("SELECT * FROM src")
 records <- sql("SELECT * FROM records")
-head(join(broadcast(src), records, src$key == records$key))
 
 Review comment:
   did we remove the `broadcast` method?


[GitHub] [spark] SparkQA removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

SparkQA removed a comment on issue #26869: [SPARK-30235][CORE] Switching off 
host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566374349
 
 
   **[Test build #115426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115426/testReport)**
 for PR 26869 at commit 
[`69459e7`](https://github.com/apache/spark/commit/69459e73457650ae43e08ef783b67cdf31b0d46f).


[GitHub] [spark] SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local 
disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566375345
 
 
   **[Test build #115426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115426/testReport)**
 for PR 26869 at commit 
[`69459e7`](https://github.com/apache/spark/commit/69459e73457650ae43e08ef783b67cdf31b0d46f).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching 
off host local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566375359
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol

2019-12-16 Thread GitBox

AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host 
local disk reading of shuffle blocks in case of useOldFetchProtocol
URL: https://github.com/apache/spark/pull/26869#issuecomment-566375364
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115426/
   Test FAILed.

