[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...

2018-12-06 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/23228
  
cc @JoshRosen  @cloud-fan 


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23249
  
**[Test build #99814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99814/testReport)**
 for PR 23249 at commit 
[`130bc95`](https://github.com/apache/spark/commit/130bc957b6d8cf1b8b6c1cf77c57eb759fcc6aa6).


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99815/testReport)**
 for PR 22683 at commit 
[`9dff9ee`](https://github.com/apache/spark/commit/9dff9eea09cbf3d5298bd6d261e1595cafaaae69).


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5846/
Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23239
  
The change looks fine.
Do we already have tests for cases 2 and 4? We know the test for case 3 is 
[here](https://github.com/apache/spark/pull/23043).


---




[GitHub] spark issue #22707: [SPARK-25717][SQL] Insert overwrite a recreated external...

2018-12-06 Thread fjh100456
Github user fjh100456 commented on the issue:

https://github.com/apache/spark/pull/22707
  
Are there any more suggestions? @wangyum @viirya 


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99813/testReport)**
 for PR 22683 at commit 
[`bf150fb`](https://github.com/apache/spark/commit/bf150fb4bbc68627d19521a31a0d3a294d079862).


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23249
  
**[Test build #99812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99812/testReport)**
 for PR 23249 at commit 
[`04be19e`](https://github.com/apache/spark/commit/04be19e62caa8fd0365b4998e22cdcad846be6b8).


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99801/
Test PASSed.


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5845/
Test PASSed.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23249
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)**
 for PR 23245 at commit 
[`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ArrowCollectSerializer(Serializer):`


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99809/
Test FAILed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23239
  
**[Test build #99801 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)**
 for PR 23239 at commit 
[`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99804/
Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99804 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)**
 for PR 23108 at commit 
[`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread wangjiaochun
Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239711919
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

That's it.


---




[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99802/
Test PASSed.


---




[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23251
  
**[Test build #99802 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)**
 for PR 23251 at commit 
[`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23238
  
Thank you for adding this to the migration doc.
cc @gatorsmile .


---




[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23238#discussion_r239708569
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -141,6 +141,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row. To restore the previous behavior, set 
`spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
 
+  - In version 2.3 and earlier, when reading from a Parquet data source 
table, Spark always returns null for any column whose column names in Hive 
metastore schema and Parquet schema are in different letter cases, no matter 
whether `spark.sql.caseSensitive` is set to true or false. Since 2.4, when 
`spark.sql.caseSensitive` is set to false, Spark does case insensitive column 
name resolution between Hive metastore schema and Parquet schema, so even 
column names are in different letter cases, Spark returns corresponding column 
values. An exception is thrown if there is ambiguity, i.e. more than one 
Parquet column is matched. This change also applies to Parquet Hive tables when 
`spark.sql.hive.convertMetastoreParquet` is set to true.
--- End diff --

Hi, @seancxmao . Maybe, the followings?
```
- `spark.sql.caseSensitive` is set to true or false
+ `spark.sql.caseSensitive` is set to `true` or `false`
```
```
- `spark.sql.caseSensitive` is set to false
+ `spark.sql.caseSensitive` is set to `false`
```
```
- `spark.sql.hive.convertMetastoreParquet` is set to true
+ `spark.sql.hive.convertMetastoreParquet` is set to `true`
```
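For reference, a minimal sketch of the behavior the new doc paragraph describes, assuming a hypothetical table `t` whose Parquet files use the column name `ID` while the Hive metastore schema declares `id`:

```scala
// Spark 2.4+ with case-insensitive resolution (the default):
spark.conf.set("spark.sql.caseSensitive", "false")
spark.sql("SELECT id FROM t").show() // `id` now resolves to Parquet column `ID`

// If the Parquet schema contained both `id` and `ID`, the match would be
// ambiguous and the query would fail with an exception instead of guessing.
```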


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99811/
Test FAILed.


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)**
 for PR 22683 at commit 
[`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20146
  
ping @dbtsai 


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239707130
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

Ya. That's the same as what I observed. There is no need to keep both test 
cases because there is no code path like that.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239706529
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
 ---
@@ -16,10 +16,13 @@
  */
 package org.apache.spark.deploy.k8s.features
 
-import scala.collection.JavaConverters._
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.nio.file.Files
 
 import io.fabric8.kubernetes.api.model._
 import org.scalatest.BeforeAndAfter
+import scala.collection.JavaConverters._
--- End diff --

? Hi, @mccheah . We import `java.*` and `scala.*` before any others.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23218
  
do we need to relnote jvm compatibility?


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239705869
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 intercept[IllegalArgumentException] {
   mgr.getSecretKey()
 }
+  case FILE =>
+val secretFile = createTempSecretFile()
+conf.set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+mgr.initializeAuth()
+assert(encodeFileAsBase64(secretFile) === 
mgr.getSecretKey())
 }
   }
 }
   )
 }
   }
 
+  private def encodeFileAsBase64(secretFile: File) = {
+Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
+  }
+
+  private def createTempSecretFile(contents: String = "test-secret"): File 
= {
+val secretDir = Utils.createTempDir("temp-secrets")
+val secretFile = new File(secretDir, "temp-secret.txt")
+Files.write(secretFile.toPath, 
contents.getBytes(StandardCharsets.UTF_8))
+secretFile
--- End diff --

can this secret be recovered on disk, or do we trust that the tempDir ACL is sufficient?


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)**
 for PR 22683 at commit 
[`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread wangjiaochun
Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239704796
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

I mean that before adding the code "if (sortedRecords.hasNext()) { return }" it 
would fail. Now adding assertEquals(0, spillFilesCreated.size()) to 
writeEmptyIterator seems good.


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99810/testReport)**
 for PR 22683 at commit 
[`6a3c58b`](https://github.com/apache/spark/commit/6a3c58b119ed298e1cab8d9a9b341a667a86c8f0).


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)**
 for PR 23245 at commit 
[`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5844/
Test PASSed.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23245
  
Retest this please.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99803/
Test FAILed.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239702595
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

Do you mean that `writeEmptyIterator` will fail if we add this line 
`assertEquals(0, spillFilesCreated.size());` there?


---




[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239702559
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

I filed https://issues.apache.org/jira/browse/SPARK-26301 to suggest the 
alternative scheme. Unlike SPARK-26139 this would change the functionality that 
was merged here.


---




[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99803 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99803/testReport)**
 for PR 23245 at commit 
[`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ArrowCollectSerializer(Serializer):`


---




[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...

2018-12-06 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/spark/pull/23104
  
@guoxiaolongzte  good job


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23252
  
**[Test build #99808 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99808/testReport)**
 for PR 23252 at commit 
[`957cb15`](https://github.com/apache/spark/commit/957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9).


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5843/
Test PASSed.


---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23252
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239701916
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+#### Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. 
`spark.assignClusters` method runs the PIC algorithm and returns a cluster 
assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
--- End diff --

BTW, when I added that into https://spark.apache.org/contributing.html, we 
also agreed upon following committer's judgement based upon the guide because 
the guide mentions:

> The coding conventions described above should be followed, unless there 
is good reason to do otherwise. Exceptions include legacy code and modifying 
third-party code.

since we do have legacy reason, and there is a good reason - consistency 
and committer's judgement.


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r239701821
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -367,11 +372,26 @@ private[spark] class SecurityManager(
 
   case _ =>
 require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
-  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config.")
+  s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF 
config")
 return
 }
 
-secretKey = Utils.createSecret(sparkConf)
+if (sparkConf.get(AUTH_SECRET_FILE_DRIVER).isDefined
--- End diff --

I also considered adding a validation that file-based auth can only be used 
for Kubernetes. I omitted it for simplicity, but file-based auth seems dubious 
for non-K8s contexts.
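
A minimal sketch of what such a validation might look like (the master-URL check and message below are assumptions for illustration, not code from this PR):

```scala
// Hypothetical guard: reject file-based auth secrets off Kubernetes.
if (sparkConf.get(AUTH_SECRET_FILE).isDefined &&
    !sparkConf.get("spark.master", "").startsWith("k8s://")) {
  throw new IllegalArgumentException(
    "File-based authentication secrets are only supported on Kubernetes.")
}
```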


---




[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-06 Thread mccheah
GitHub user mccheah opened a pull request:

https://github.com/apache/spark/pull/23252

[SPARK-26239] File-based secret key loading for SASL.

## What changes were proposed in this pull request?

This proposes an alternative way to load secret keys into a Spark 
application that is running on Kubernetes. Instead of automatically generating 
the secret, the secret key can reside in a file that is shared between both the 
driver and executor containers.
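
A minimal sketch of the intended setup, assuming the `AUTH_SECRET_FILE` constant in this diff maps to a `spark.authenticate.secret.file` property (the property name and mount path below are illustrative assumptions):

```scala
import org.apache.spark.SparkConf

// Both driver and executor pods mount the same secret volume, so each side
// reads an identical key from a shared path instead of generating one.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret.file", "/mnt/secrets/spark-auth/secret.key")
```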

## How was this patch tested?

Unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/palantir/spark auth-secret-with-file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23252


commit 957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9
Author: mcheah 
Date:   2018-12-07T05:04:56Z

[SPARK-26239] File-based secret key loading for SASL.

This proposes an alternative way to load secret keys into a Spark 
application that is running on Kubernetes. Instead of automatically generating 
the secret, the secret key can reside in a file that is shared between both the 
driver and executor containers.




---




[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-06 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/23252
  
@vanzin @vinooganesh @ifilonenko 


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239701364
  
--- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R ---
@@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", {
   expect_equal(length(local.posterior), sum(unlist(local.posterior)))
 })
 
+test_that("spark.assignClusters", {
+  df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+ list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+ list(4L, 0L, 0.1)), schema = c("src", "dst", 
"weight"))
+  clusters <- spark.assignClusters(df, initMode = "degree", weightCol = 
"weight")
+  expected_result <- createDataFrame(list(list(4L, 1L),
+  list(0L, 0L),
+  list(1L, 0L),
+  list(3L, 1L),
+  list(2L, 0L)),
+  schema = c("id", "cluster"))
--- End diff --

ditto for style


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99807/testReport)**
 for PR 22683 at commit 
[`87a9d5a`](https://github.com/apache/spark/commit/87a9d5ad1ebfbb9b247e95ead3e1a4c34ee08020).


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239701069
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+#### Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. 
`spark.assignClusters` method runs the PIC algorithm and returns a cluster 
assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
--- End diff --

There are two separate styles already mixed in the R code IIRC:

```r
df <- createDataFrame(
  list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
  list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
```

or

```r
df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
   list(4L, 0L, 0.1)),
  schema = c("src", "dst", "weight"))
```

Let's avoid mixed styles, and let's go for the latter one when possible 
because at least it looks more compliant with the code style.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread wangjiaochun
Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239700999
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

The writeEmptyIterator() test creates a spill file even though it writes an 
empty iterator, but the writeEmptyIteratorNotCreateEmptySpillFile test does not 
create a spill file when there are no records in memory.


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239700846
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+#### Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. 
`spark.assignClusters` method runs the PIC algorithm and returns a cluster 
assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
--- End diff --

Yea, we do have an indentation rule: "Code Style Guide" at 
https://spark.apache.org/contributing.html -> 
https://google.github.io/styleguide/Rguide.xml. I know the code style is not 
perfectly documented, but at least there are some examples. I think the correct 
indentation is:

```r
df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
   list(4L, 0L, 0.1)),
  schema = c("src", "dst", "weight"))
``` 


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99806/testReport)**
 for PR 22683 at commit 
[`e9b4b2c`](https://github.com/apache/spark/commit/e9b4b2c5b3f0da4e77d60d8299481c70aec95b6a).


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239700056
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+#### Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. 
`spark.assignClusters` method runs the PIC algorithm and returns a cluster 
assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
--- End diff --

Do we have an indentation rule for this? This PR is using two types of 
indentations for the same statements.
- For docs (sparkr-vignettes.Rmd, mllib_clustering.R), this line is aligned 
with the first `list`.
- For real code (test_mllib_clustering.R, powerIterationClustering.R), this 
line is aligned with the second `list`.

Can we use the same indentation rule?



---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/23207
  
```
the code looks much cleaner now!
```
Sorry for the original rush and code; I should and will pay more attention 
to writing clean code and have more discussion on alternative implementations.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99805/testReport)**
 for PR 23207 at commit 
[`6378a3d`](https://github.com/apache/spark/commit/6378a3d4707b0d7559fca20220229cde71f9a64b).


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5842/
Test PASSed.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239698500
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -78,6 +80,7 @@ object SQLMetrics {
   private val SUM_METRIC = "sum"
   private val SIZE_METRIC = "size"
   private val TIMING_METRIC = "timing"
+  private val NS_TIMING_METRIC = "nanosecond"
--- End diff --

How about naming it `NORMALIZE_TIMING_METRIC`? Maybe it can be reused 
later for other timing metrics which need a normalized unit. If you think it's a 
strange name, I'll change it back.
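
For context, a minimal sketch of how such a metric type might get a factory next to the existing ones in `SQLMetrics` (the method name and constructor shape below are assumptions based on the surrounding diff, not code from this PR):

```scala
// Hypothetical factory: the accumulator stores raw nanoseconds, and the
// metric type string tells the UI to normalize the unit when rendering.
def createNanoTimingMetric(sc: SparkContext, name: String): SQLMetric = {
  val acc = new SQLMetric(NS_TIMING_METRIC, -1)
  acc.register(sc, name = Some(name), countFailedValues = false)
  acc
}
```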


---




[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239698273
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 ---
@@ -333,8 +343,19 @@ object ShuffleExchangeExec {
   new ShuffleDependency[Int, InternalRow, InternalRow](
 rddWithPartitionIds,
 new PartitionIdPassthrough(part.numPartitions),
-serializer)
+serializer,
+shuffleWriterProcessor = createShuffleWriteProcessor(writeMetrics))
 
 dependency
   }
+
+  /**
+   * Create a customized [[ShuffleWriteProcessor]] for SQL which wrap the 
default metrics reporter
+   * with [[SQLShuffleWriteMetricsReporter]] as new reporter for 
[[ShuffleWriteProcessor]].
+   */
+  def createShuffleWriteProcessor(metrics: Map[String, SQLMetric]): 
ShuffleWriteProcessor = {
+(reporter: ShuffleWriteMetricsReporter) => {
--- End diff --

Yes, it can't work with Scala 2.11; it should be written in a more readable 
way. Done in 6378a3d.
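
For readers following along, a minimal sketch of the 2.11/2.12 difference, assuming `ShuffleWriteProcessor` keeps the single abstract method shown above and that `SQLShuffleWriteMetricsReporter` wraps `(reporter, metrics)` as in this PR:

```scala
// SAM conversion: a lambda for a Scala trait compiles on 2.12 but not 2.11.
// val processor: ShuffleWriteProcessor =
//   (reporter: ShuffleWriteMetricsReporter) =>
//     new SQLShuffleWriteMetricsReporter(reporter, metrics)

// Explicit anonymous class: compiles on both 2.11 and 2.12.
val processor: ShuffleWriteProcessor = new ShuffleWriteProcessor {
  override def createMetricsReporter(
      reporter: ShuffleWriteMetricsReporter): ShuffleWriteMetricsReporter = {
    new SQLShuffleWriteMetricsReporter(reporter, metrics)
  }
}
```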


---




[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239698174
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/ShuffleWriterProcessor.scala ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, 
TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.MapStatus
+
+/**
+ * The interface for customizing shuffle write process. The driver create 
a ShuffleWriteProcessor
+ * and put it into [[ShuffleDependency]], and executors use it in each 
ShuffleMapTask.
+ */
+private[spark] trait ShuffleWriteProcessor extends Serializable with 
Logging {
+
+  /**
+   * Create a [[ShuffleWriteMetricsReporter]] from the default reporter, 
always return a proxy
+   * reporter for both local accumulator and original reporter updating. 
As the reporter is a
+   * per-row operator, here need a careful consideration on performance.
+   */
+  def createMetricsReporter(reporter: ShuffleWriteMetricsReporter): 
ShuffleWriteMetricsReporter
--- End diff --

Copy that; the trait can be added when we need more customization for SQL 
shuffle. Done in 6378a3d.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239697971
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+    final UnsafeShuffleWriter writer = createWriter(true);
+    writer.write(Iterators.emptyIterator());
+    final Option<MapStatus> mapStatus = writer.stop(true);
+    assertTrue(mapStatus.isDefined());
+    assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

It seems that we can move this line to `writeEmptyIterator()` instead of 
making a new test case.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239697860
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
   }
 
+  @Test
+  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
--- End diff --

This looks like mostly a duplication of `writeEmptyIterator`. Logically, do 
we need to keep both `writeEmptyIterator` and 
`writeEmptyIteratorNotCreateEmptySpillFile` separately?


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99798/
Test PASSed.


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23221
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23221
  
**[Test build #99798 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99798/testReport)**
 for PR 23221 at commit 
[`c9ab9bc`](https://github.com/apache/spark/commit/c9ab9bcc378168ff3430d8885899ccd74afe7b32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11

2018-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23250


---




[GitHub] spark issue #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23250
  
Thank you, @HyukjinKwon . Merged to master.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99804 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)**
 for PR 23108 at commit 
[`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5841/
Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-06 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/23108
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23174: [SPARK-26194][k8s] Auto generate auth secret for ...

2018-12-06 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/23174#discussion_r239694862
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ---
@@ -87,44 +88,61 @@ private[spark] class 
BasicExecutorFeatureStep(kubernetesConf: KubernetesExecutor
 val executorCpuQuantity = new QuantityBuilder(false)
   .withAmount(executorCoresRequest)
   .build()
-val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
-  new EnvVarBuilder()
-.withName(ENV_CLASSPATH)
-.withValue(cp)
-.build()
-}
-val executorExtraJavaOptionsEnv = kubernetesConf
-  .get(EXECUTOR_JAVA_OPTIONS)
-  .map { opts =>
-val subsOpts = Utils.substituteAppNExecIds(opts, 
kubernetesConf.appId,
-  kubernetesConf.executorId)
-val delimitedOpts = Utils.splitCommandString(subsOpts)
-delimitedOpts.zipWithIndex.map {
-  case (opt, index) =>
-new 
EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
+
+val executorEnv: Seq[EnvVar] = {
+(Seq(
+  (ENV_DRIVER_URL, driverUrl),
+  (ENV_EXECUTOR_CORES, executorCores.toString),
+  (ENV_EXECUTOR_MEMORY, executorMemoryString),
+  (ENV_APPLICATION_ID, kubernetesConf.appId),
+  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
+  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
+  (ENV_EXECUTOR_ID, kubernetesConf.executorId)
+) ++ kubernetesConf.environment).map { case (k, v) =>
+  new EnvVarBuilder()
+.withName(k)
+.withValue(v)
+.build()
 }
-  }.getOrElse(Seq.empty[EnvVar])
-val executorEnv = (Seq(
-  (ENV_DRIVER_URL, driverUrl),
-  (ENV_EXECUTOR_CORES, executorCores.toString),
-  (ENV_EXECUTOR_MEMORY, executorMemoryString),
-  (ENV_APPLICATION_ID, kubernetesConf.appId),
-  // This is to set the SPARK_CONF_DIR to be /opt/spark/conf
-  (ENV_SPARK_CONF_DIR, SPARK_CONF_DIR_INTERNAL),
-  (ENV_EXECUTOR_ID, kubernetesConf.executorId)) ++
-  kubernetesConf.environment)
-  .map(env => new EnvVarBuilder()
-.withName(env._1)
-.withValue(env._2)
-.build()
-  ) ++ Seq(
-  new EnvVarBuilder()
-.withName(ENV_EXECUTOR_POD_IP)
-.withValueFrom(new EnvVarSourceBuilder()
-  .withNewFieldRef("v1", "status.podIP")
+  } ++ {
+Seq(new EnvVarBuilder()
+  .withName(ENV_EXECUTOR_POD_IP)
+  .withValueFrom(new EnvVarSourceBuilder()
+.withNewFieldRef("v1", "status.podIP")
+.build())
   .build())
-.build()
-) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
+  } ++ {
+Option(secMgr.getSecretKey()).map { authSecret =>
+  new EnvVarBuilder()
+.withName(SecurityManager.ENV_AUTH_SECRET)
+.withValue(authSecret)
--- End diff --

Ah, I thought about this a bit more and realized that this is more insecure 
than I originally read it to be.

If the secret is put directly in the environment variable field itself, 
then anyone who has permission to get the pod metadata from the Kubernetes API 
server can now read the secret generated by this application. In practice, 
permissioning on pod specs is often far looser than permissioning on Kubernetes 
secret objects, so with this approach the administrator has to restrict access 
to pod specs to only that user.

I think at the very least we want this to be configured via creating a 
Kubernetes secret object, then [loading the environment 
variable](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables)
 to point to the secret object.

In the meantime I'm going to push the PR that allows secrets to be 
specified as file paths directly. I will also file a Spark ticket about 
avoiding putting the secret value directly in the pod spec object itself.
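
A minimal sketch of that suggestion, under stated assumptions (not the 
PR's actual code): the env var references a Kubernetes Secret object via 
`secretKeyRef` instead of embedding the value. The Secret name 
`spark-auth-secret` and key `auth-token` are hypothetical.

```scala
import io.fabric8.kubernetes.api.model.{EnvVarBuilder, EnvVarSourceBuilder}
import org.apache.spark.SecurityManager

// Point the env var at a Secret object, so the secret value never appears
// in the pod spec itself; only the reference does.
val authSecretEnv = new EnvVarBuilder()
  .withName(SecurityManager.ENV_AUTH_SECRET)
  .withValueFrom(new EnvVarSourceBuilder()
    .withNewSecretKeyRef()
      .withName("spark-auth-secret") // hypothetical Secret object name
      .withKey("auth-token")         // hypothetical key within that Secret
    .endSecretKeyRef()
    .build())
  .build()
```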


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239694008
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -243,10 +248,19 @@ case class HashPartitioning(expressions: 
Seq[Expression], numPartitions: Int)
  * Represents a partitioning where rows are split across partitions based 
on some total ordering of
  * the expressions specified in `ordering`.  When data is partitioned in 
this manner the following
  * two conditions are guaranteed to hold:
- *  - All row where the expressions in `ordering` evaluate to the same 
values will be in the same
--- End diff --

nit: "row" -> "rows", "where... `ordering`" -> "whose `ordering` 
expressions"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239693849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -243,10 +248,19 @@ case class HashPartitioning(expressions: 
Seq[Expression], numPartitions: Int)
  * Represents a partitioning where rows are split across partitions based 
on some total ordering of
  * the expressions specified in `ordering`.  When data is partitioned in 
this manner the following
--- End diff --

nit: add "," after "this manner".


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23239
  
Yes it is. `UnsafeProjection` always normalizes NaN and -0.0, and Spark uses 
`UnsafeProjection` to produce output. So users can't distinguish them.
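
For reference, a minimal sketch of that normalization (the real logic lives 
in `UnsafeWriter`; this is an illustration, not the exact implementation):

```scala
// Collapse every NaN bit pattern to the canonical NaN and fold -0.0 into
// 0.0, so grouping and comparison downstream see a single value.
def normalizeDouble(value: Double): Double = {
  if (value.isNaN) Double.NaN    // any NaN payload becomes the canonical NaN
  else if (value == -0.0d) 0.0d  // -0.0 == 0.0 is true, so -0.0 maps to 0.0
  else value
}

normalizeDouble(-0.0d) // returns 0.0, bit-identical to positive zero
```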


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5840/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/23215
  
Thank you @cloud-fan @viirya @HyukjinKwon.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99803 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99803/testReport)**
 for PR 23245 at commit 
[`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/23239
  
The migration guide was changed by another followup, 
https://github.com/apache/spark/pull/23141:

> In Spark version 2.4 and earlier, float/double -0.0 is semantically equal 
to 0.0, but users can still distinguish them via `Dataset.show`, 
`Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 
internally, and users can't distinguish them any more.

Is the above still correct after this change?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99797/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23245
  
**[Test build #99797 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99797/testReport)**
 for PR 23245 at commit 
[`15000a2`](https://github.com/apache/spark/commit/15000a2d548bf3785ba61aa01b2f6fb60e47ca71).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239690226
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, 
IntegerType}
 
 /**
  * Specifies how tuples that share common expressions will be distributed 
when a query is executed
- * in parallel on many machines.  Distribution can be used to refer to two 
distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution 
describes how tuples are
- *partitioned across physical machines in a cluster.  Knowing this 
property allows some
- *operators (e.g., Aggregate) to perform partition local operations 
instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution 
describes guarantees made
- *about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ *   The distribution describes how tuples are partitioned across physical 
machines in a cluster.
+ *   Knowing this property allows some operators (e.g., Aggregate) to 
perform partition local
+ *   operations instead of global ones.
  */
--- End diff --

For ordering, I think people can look at `OrderedDistribution`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23239#discussion_r239690045
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -198,11 +198,45 @@ protected final void writeLong(long offset, long 
value) {
 Platform.putLong(getBuffer(), offset, value);
   }
 
+  // We need to take care of NaN and -0.0 in several places:
+  //   1. When compare values, different NaNs should be treated as same, 
`-0.0` and `0.0` should be
+  //  treated as same.
+  //   2. In range partitioner, different NaNs should belong to the same 
partition, -0.0 and 0.0
--- End diff --

As this is not a problem, we should update the PR description too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239689874
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, 
IntegerType}
 
 /**
  * Specifies how tuples that share common expressions will be distributed 
when a query is executed
- * in parallel on many machines.  Distribution can be used to refer to two 
distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution 
describes how tuples are
- *partitioned across physical machines in a cluster.  Knowing this 
property allows some
- *operators (e.g., Aggregate) to perform partition local operations 
instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution 
describes guarantees made
- *about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ *   The distribution describes how tuples are partitioned across physical 
machines in a cluster.
+ *   Knowing this property allows some operators (e.g., Aggregate) to 
perform partition local
+ *   operations instead of global ones.
  */
--- End diff --

Yes, I understand that partitioning has nothing to do with intra-partition 
ordering at all, and it was wrong to include intra-partition ordering as part 
of the distribution properties. But I was thinking that mentioning ordering as 
a side note would probably help people better understand how some operators 
work. Or maybe this isn't the best place to put it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23201#discussion_r239687264
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 ---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: 
JSONOptions) extends Serializable {
 DecimalType(bigDecimal.precision, bigDecimal.scale)
 }
 decimalTry.getOrElse(StringType)
-  case VALUE_STRING => StringType
+  case VALUE_STRING =>
+val stringValue = parser.getText
--- End diff --

Or does the order not matter?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23201#discussion_r239687213
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 ---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: 
JSONOptions) extends Serializable {
 DecimalType(bigDecimal.precision, bigDecimal.scale)
 }
 decimalTry.getOrElse(StringType)
-  case VALUE_STRING => StringType
+  case VALUE_STRING =>
+val stringValue = parser.getText
--- End diff --

I checked `PartitioningUtils.inferPartitionColumnValue`: we try timestamp 
first and then date. Shall we follow it?
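
A rough sketch of that order, under stated assumptions (the hypothetical 
`parseTimestamp`/`parseDate` helpers stand in for the options' formatters 
and are assumed to throw on unparseable input; this is not the PR's code):

```scala
import scala.util.Try
import org.apache.spark.sql.types._

// Infer a type for a JSON string value, trying timestamp before date as
// PartitioningUtils.inferPartitionColumnValue does, falling back to
// StringType when neither parse succeeds.
def inferStringType(
    s: String,
    parseTimestamp: String => Any,
    parseDate: String => Any): DataType = {
  if (Try(parseTimestamp(s)).isSuccess) TimestampType
  else if (Try(parseDate(s)).isSuccess) DateType
  else StringType
}
```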


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23251
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5839/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` method may be ...

2018-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23251
  
**[Test build #99802 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)**
 for PR 23251 at commit 
[`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23251: [SPARK-26300][SS] The `checkForStreaming` method ...

2018-12-06 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/23251

[SPARK-26300][SS] The `checkForStreaming` method may be called twice in 
`createQuery`

## What changes were proposed in this pull request?
If `checkForContinuous` is called (`checkForStreaming` is called inside 
`checkForContinuous`), the `checkForStreaming` method will be called twice 
in `createQuery`. This is unnecessary, and since the `checkForStreaming` 
method has a lot of statements, it's better to remove one of the calls.
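
An illustrative sketch of the redundancy (hypothetical method bodies, not 
Spark's actual code):

```scala
// In createQuery, the explicit checkForStreaming call is redundant whenever
// checkForContinuous runs, because checkForContinuous already invokes it.
def checkForStreaming(): Unit = { /* many unsupported-operation checks */ }

def checkForContinuous(): Unit = {
  checkForStreaming() // streaming checks already run here
  /* continuous-specific checks */
}

def createQuery(): Unit = {
  checkForStreaming()  // redundant: the next call performs it again
  checkForContinuous()
}
```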

## How was this patch tested?

Existing unit tests in `StreamingQueryManagerSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark 
isUnsupportedOperationCheckEnabled

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23251


commit b1e71ee7a723d63f1cf3c0754f2372eb185439d3
Author: liuxian 
Date:   2018-12-07T03:08:26Z

fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


