[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-29 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/23052 It is pretty common for us to write an empty DataFrame to Parquet and later read it back in. The same goes for writing to CSV with a header and reading it back in (with type inference disabled, we assume
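The following is a minimal Python sketch of the CSV round-trip concern. It uses only the standard `csv` module, not Spark. A CSV file that holds just a header line still lets a reader recover the column names. A zero-byte file, or no file at all, leaves nothing to recover. The helpers `write_csv` and `read_columns` are hypothetical stand-ins for a writer and for a reader that trusts the header.

```python
import csv
import io


def write_csv(rows, columns, write_header=True):
    """Serialize rows to CSV text; with no rows, only the header (if any) is emitted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if write_header:
        writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def read_columns(text):
    """Recover column names from the first line, or None if the text is empty."""
    reader = csv.reader(io.StringIO(text))
    return next(reader, None)


# An empty DataFrame written with a header: the column names survive.
header_only = write_csv([], ["id", "name"])
print(repr(header_only))          # 'id,name\n'
print(read_columns(header_only))  # ['id', 'name']

# A zero-byte file (or no file at all): nothing to read back.
print(read_columns(""))           # None
```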

2018-11-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23052 Merged to master

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99407/

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99407/testReport)** for PR 23052 at commit

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99407/testReport)** for PR 23052 at commit

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99380/

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99380/testReport)** for PR 23052 at commit

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99380/testReport)** for PR 23052 at commit

2018-11-28 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23052 jenkins, retest this, please

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test FAILed.

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99372/

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99372/testReport)** for PR 23052 at commit

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99372/testReport)** for PR 23052 at commit

2018-11-28 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23052 Actually, it needs changes similar to those in https://github.com/apache/spark/pull/23130

2018-11-28 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23052 > seems like a real failure I am looking at it. It seems the test is not deterministic.

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23052 seems like a real failure

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99361/

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test FAILed.

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99361/testReport)** for PR 23052 at commit

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99361/testReport)** for PR 23052 at commit

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23052 retest this please

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test FAILed.

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99354/

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99354/testReport)** for PR 23052 at commit

2018-11-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2018-11-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99354/testReport)** for PR 23052 at commit

2018-11-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23052 retest this please

2018-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99221/

2018-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test FAILed.

2018-11-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99221/testReport)** for PR 23052 at commit

2018-11-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99221/testReport)** for PR 23052 at commit

2018-11-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 There are two more things to deal with: the comment at https://github.com/apache/spark/pull/23052#issuecomment-440687200 will still be valid, or at least it should be double-checked, because

2018-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23052 First of all, sometimes we do need to write "empty" files so that we can infer the schema of a Parquet directory. An empty Parquet file is not really empty, as it has a header/footer.
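Here is a toy Python model of why an "empty" self-describing file still matters. It assumes a simplified file that stores its schema in a footer next to zero or more rows, loosely mirroring how a Parquet file keeps its schema in footer metadata. The `DataFile` class and `infer_schema` function are hypothetical and are not Spark or Parquet APIs. In this model, schema inference over a directory succeeds as long as at least one file exists, even with zero rows. It fails when the directory is empty.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical self-describing file: a schema footer plus zero or more rows."""
    schema: tuple
    rows: list = field(default_factory=list)


def infer_schema(directory):
    """Take the schema from the first file's footer; the row count is irrelevant."""
    for data_file in directory:
        return data_file.schema
    # No files at all: there is no footer to read a schema from.
    raise ValueError("Unable to infer schema: directory contains no files")


# A single zero-row file still carries the schema in its footer.
print(infer_schema([DataFile(("id", "name"))]))  # ('id', 'name')

# An empty directory, e.g. when no files were written for empty partitions.
try:
    infer_schema([])
except ValueError as err:
    print(err)
```

A real reader may merge or validate schemas across files. This sketch only shows that a zero-row file can supply a schema and that no file cannot.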

2018-11-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 cc @cloud-fan as well

2018-11-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 Also, Parquet does not always write empty files. It does not write them when DataFrames are created from emptyRDD (the case pointed out in the PR link I gave). We should match
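The inconsistency can be pictured with a hypothetical eager writer that opens one output file per partition. A DataFrame whose partitions are all empty still yields schema-only files. A DataFrame with zero partitions, as with one built from an empty RDD, yields no files at all. `eager_write`, the list-of-lists partition model, and the `part-NNNNN` names are assumptions for illustration, not Spark's actual writer.

```python
def eager_write(partitions, schema):
    """Hypothetical eager writer: one file per partition, created before any row is seen."""
    files = []
    for index, rows in enumerate(partitions):
        # The file (with its schema header/footer) is created up front,
        # so an empty partition still produces a file.
        files.append({"name": f"part-{index:05d}", "schema": schema, "rows": list(rows)})
    return files


schema = ("id", "name")

# Two empty partitions: two schema-only files are written.
print(len(eager_write([[], []], schema)))  # 2

# Zero partitions (like a DataFrame built from an empty RDD): nothing is written.
print(len(eager_write([], schema)))        # 0
```

The number of files depends on the partition count rather than on whether any data exists. That is why the two kinds of "empty" DataFrame behave differently.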

2018-11-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 @MaxGekk, I didn't mean to block this PR. Since we're moving ahead toward 3.0, it would be good to match and fix the behaviours across data sources. For instance, CSV should still be able to read

2018-11-21 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23052 I have read the tickets you pointed out but haven't found anything that could potentially block the changes. One of the corner cases is saving an empty DataFrame. In this case, no files would be written, but
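This is a minimal sketch of the lazy alternative suggested by the PR title. It assumes that a partition's output file is opened only when that partition's first row arrives. The `LazyPartitionWriter` class and `lazy_write` function are hypothetical and only illustrate the idea; they are not the code from this PR. With this approach, empty partitions produce no files. In the corner case above, an entirely empty DataFrame produces none either.

```python
class LazyPartitionWriter:
    """Hypothetical writer that defers creating its output file until the first row."""

    def __init__(self, name, created_files):
        self.name = name
        self.created_files = created_files  # shared registry of files "on disk"
        self.rows = None                    # None means the file was never opened

    def write(self, row):
        if self.rows is None:
            # First row: only now is the file created and registered.
            self.rows = []
            self.created_files[self.name] = self.rows
        self.rows.append(row)


def lazy_write(partitions):
    """Write each partition lazily; return only the files that were actually created."""
    created = {}
    for index, rows in enumerate(partitions):
        writer = LazyPartitionWriter(f"part-{index:05d}", created)
        for row in rows:
            writer.write(row)
    return created


# Only the non-empty middle partition produces a file.
print(lazy_write([[], [(1, "a")], []]))  # {'part-00001': [(1, 'a')]}

# Corner case: an empty DataFrame leaves no files behind.
print(lazy_write([[], []]))              # {}
```

The trade-off discussed in the thread follows directly from this design. When no file is written, a self-describing format has no header or footer from which to read the schema back.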

2018-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99107/

2018-11-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99107/testReport)** for PR 23052 at commit

2018-11-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #99107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99107/testReport)** for PR 23052 at commit

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 I think now is a good time to match the behaviours.

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 Another related attempt: https://github.com/apache/spark/pull/13252

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 FYI, one attempt to add some tests for reading/writing empty DataFrames was here: https://github.com/apache/spark/pull/13253

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 Which should be ... this https://github.com/apache/spark/pull/12855

2018-11-16 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23052 > Similar changes were proposed in Parquet a few years ago (by me) and reverted. What was the main reason for reverting it? If possible, could you give me a link to your PR?

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 @MaxGekk, actually this is a fairly important behaviour change. It basically means we're unable to read the empty files back. Similar changes were proposed in Parquet a few years ago (by me) and

2018-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Merged build finished. Test PASSed.

2018-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98887/

2018-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #98887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98887/testReport)** for PR 23052 at commit

2018-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23052 **[Test build #98887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98887/testReport)** for PR 23052 at commit

2018-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Can one of the admins verify this patch?

2018-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23052 Can one of the admins verify this patch?