[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
LGTM

Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89967/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89967 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89967/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89967 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89967/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89964/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89964/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89964/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89879/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)**
 for PR 20959 at commit 
[`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89796/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89796 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89796/testReport)**
 for PR 20959 at commit 
[`b2c552c`](https://github.com/apache/spark/commit/b2c552c3aef2c6361e669af107330f721398a0bc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CSVSuite extends QueryTest with SharedSQLContext with 
SQLTestUtils  with TestCsvData `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89796 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89796/testReport)**
 for PR 20959 at commit 
[`b2c552c`](https://github.com/apache/spark/commit/b2c552c3aef2c6361e669af107330f721398a0bc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89677/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)**
 for PR 20959 at commit 
[`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)**
 for PR 20959 at commit 
[`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89664/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89664 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)**
 for PR 20959 at commit 
[`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)**
 for PR 20959 at commit 
[`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-18 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20959
  
@gatorsmile @HyukjinKwon @sujithjay May I ask you to look at the PR again


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89239/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89239 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89239/testReport)**
 for PR 20959 at commit 
[`d12c2e2`](https://github.com/apache/spark/commit/d12c2e221b446596e2322a4d86ee0fd55a09b6ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89241/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89241 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89241/testReport)**
 for PR 20959 at commit 
[`a37bf3b`](https://github.com/apache/spark/commit/a37bf3bc4034f587768903df304e5642a68f87c8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89241 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89241/testReport)**
 for PR 20959 at commit 
[`a37bf3b`](https://github.com/apache/spark/commit/a37bf3bc4034f587768903df304e5642a68f87c8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89239 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89239/testReport)**
 for PR 20959 at commit 
[`d12c2e2`](https://github.com/apache/spark/commit/d12c2e221b446596e2322a4d86ee0fd55a09b6ba).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89220/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89220/testReport)**
 for PR 20959 at commit 
[`1427f73`](https://github.com/apache/spark/commit/1427f73e13b5809ac9622bf5fbd80cb178544864).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89220/testReport)**
 for PR 20959 at commit 
[`1427f73`](https://github.com/apache/spark/commit/1427f73e13b5809ac9622bf5fbd80cb178544864).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89217/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89217 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89217/testReport)**
 for PR 20959 at commit 
[`9f26bb7`](https://github.com/apache/spark/commit/9f26bb78a5be28eb38c3fd178f4f88b66b8917ff).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89217/testReport)**
 for PR 20959 at commit 
[`9f26bb7`](https://github.com/apache/spark/commit/9f26bb78a5be28eb38c3fd178f4f88b66b8917ff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89208/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89208 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89208/testReport)**
 for PR 20959 at commit 
[`3bceb3a`](https://github.com/apache/spark/commit/3bceb3a66064778be2bf568d3ad6382152fe9bd7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89208/testReport)**
 for PR 20959 at commit 
[`3bceb3a`](https://github.com/apache/spark/commit/3bceb3a66064778be2bf568d3ad6382152fe9bd7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89154/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89154/testReport)**
 for PR 20959 at commit 
[`d4e815e`](https://github.com/apache/spark/commit/d4e815e8564e4c81933540896a0d85f4c8225ea2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89155/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89155 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89155/testReport)**
 for PR 20959 at commit 
[`d584cfe`](https://github.com/apache/spark/commit/d584cfe19135c1e742b1ae1f2873b827a32c0d7a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89155 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89155/testReport)**
 for PR 20959 at commit 
[`d584cfe`](https://github.com/apache/spark/commit/d584cfe19135c1e742b1ae1f2873b827a32c0d7a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #89154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89154/testReport)**
 for PR 20959 at commit 
[`d4e815e`](https://github.com/apache/spark/commit/d4e815e8564e4c81933540896a0d85f4c8225ea2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
Let's address @gatorsmile's and mine at 
https://github.com/apache/spark/pull/20963 too as well. Seems fine otherwise.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20959
  
@MaxGekk Thanks for working on this!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
sure, will review this and go ahead.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/20959
  
I'm good with having this option given the data @MaxGekk posted. (I haven't 
reviewed the code - somebody else should do that before merging).

`val sampledSchema = spark.read.option("inferSchema", 
true).csv(ds.sample(false, 0.7)).schema` is a bit clunky compared with an 
option that applies to all the sources.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
For usability, the workaround I suggested above has more flexibility. For 
example, we can make different operation (e.g, filter) on schema inference 
path. They are only few lines.

Schema inference is discouraged in production line. I believe, for example, 
just taking 100 records and use the schema makes more sense.

I am not against this option but I am saying I don't feel strong on this 
for above reasons. 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-03 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20959
  
@rxin I made an experiment on json files but numbers for csv are almost the 
same. For example, inferring schema for 50GB json:
```
scala> spark.read.option("samplingRatio", 0.1).json("test.json")
```
took 1.7 minute
```
scala> spark.read.option("samplingRatio", 1.0).json("test.json")
```
took 21.9 minutes.

I have looked in a profiler where Spark spends time during schema inferring 
for 50GB json. At least on my laptop - 75% in json parsing and 18% on disk IO. 
Of course, the numbers will be different in a cluster if the files would be 
read from s3 via network. In any way, the samplingRatio option gives us 
opportunity to find a balance of CPUs load and network/disk IO. 

@HyukjinKwon The question is not about workaround, it is about usability:

1. For interactive queries, an user doesn't have to write the boilerplate 
code if there is the option.

2. If the code is used inside of a library, developers don't have to check 
special cases like if it is json use the samplingRatio option otherwise do 
sampling manually.

Additionally the behavior behind of the option could be improved in the 
future. For example, it will require less file reads during sampling. It would 
be easer to do that with the option.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20959
  
 _if I remember this correctly_, there was a discussion about it a lone ago 
and, @rxin was not sure how much it improves the perf and if it's worth, which 
I ended up with agreeing with. @rxin, did I recall this correctly?

There's a workaround for it BTW:

```scala
val ds = Seq("a", "b", "c", "d").toDS
val sampledSchema = spark.read.option("inferSchema", 
true).csv(ds.sample(false, 0.7)).schema
spark.read.schema(sampledSchema).csv(ds)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88827/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88827 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88827/testReport)**
 for PR 20959 at commit 
[`b6a7cc8`](https://github.com/apache/spark/commit/b6a7cc852bb7f27a94eb268f4f3e5e35f97538bd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88827 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88827/testReport)**
 for PR 20959 at commit 
[`b6a7cc8`](https://github.com/apache/spark/commit/b6a7cc852bb7f27a94eb268f4f3e5e35f97538bd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88818/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88818 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88818/testReport)**
 for PR 20959 at commit 
[`91c57cf`](https://github.com/apache/spark/commit/91c57cf898d7f62ee3c9acff63607fcb102bd317).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88818 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88818/testReport)**
 for PR 20959 at commit 
[`91c57cf`](https://github.com/apache/spark/commit/91c57cf898d7f62ee3c9acff63607fcb102bd317).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88817 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88817/testReport)**
 for PR 20959 at commit 
[`ba12fca`](https://github.com/apache/spark/commit/ba12fca8514a2cb620224e859e8a8b5cc208bf31).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88817/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20959
  
**[Test build #88817 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88817/testReport)**
 for PR 20959 at commit 
[`ba12fca`](https://github.com/apache/spark/commit/ba12fca8514a2cb620224e859e8a8b5cc208bf31).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20959
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org