[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 @MaxGekk, thanks. mind opening a PR to upgrade? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-07-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 > do we still hit the bug when parsing csv data? I have checked uniVocity 2.7.2; the problem does not occur in this version.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-07-02 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 oh, super quick fix ;) Thanks, @MaxGekk In the master, do we still hit the bug when parsing csv data?

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-30 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 The bug has already been fixed in uniVocity `2.6.5-SNAPSHOT`.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-29 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 > please take a following action. Will help and check if it's needed. I have opened an issue for the uniVocity parser: https://github.com/uniVocity/univocity-parsers/issues/250

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 Merged to master.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 LGTM. @MaxGekk please take a following action. Will help and check if it's needed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 @HyukjinKwon BTW, can you check this? @MaxGekk I think it would be better to file a new JIRA for the point you're looking into.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-26 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 > v2.5.9 also have the same behaviour? yes, it is the same. > Anyway, it'd be better to ask the author ;) I asked before and I got quick response. ok. I will create an

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-26 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 v2.5.9 also have the same behaviour? Anyway, it'd be better to ask the author ;) I asked before and I got quick response.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-26 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 Here is the test for the uniVocity parser: https://github.com/MaxGekk/univocity_tests . For the first line, `parseLine` returns an empty array, but it returns `null` for subsequent calls. What do you think should I

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92317/ Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92317/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92317/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/469/

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 > But first we should be sure if it's a bug or not for this anyway. I will try to reproduce it with a small example without Spark. I am not sure what the expected behavior should be if set of

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 I mean `null is returned for valid input string "8"`. I thought this was a bug. If there's a valid case that returns `null`, yea, we should handle `null` of course, but the case you mentioned sounds

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 > So you mean it's a bug in Univocity? No, I mean we don't handle `null` from Univocity's `parseLine` at all (in other situations), and we just propagate `NullPointerException` to a user

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 So you mean it's a bug in Univocity? Then that's another fix, for a bug that exists in Univocity. We could work around it if it's clearly a bug. I would suggest opening a bug there if we

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 It seems that returning `null` from Univocity's `parseLine` is the normal way to indicate an error. Should we handle `null`s and throw `BadRecordException` instead of propagating an NPE to the user's app?
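
The idea raised in this comment, turning a `null` parse result into a dedicated exception rather than letting a `NullPointerException` escape, could be sketched roughly as below. This is only an illustration under stated assumptions: `MalformedRecord`, `SafeParse`, and the `parseLine` function parameter are hypothetical names introduced here, not Spark's `BadRecordException` or uniVocity's API.

```scala
// Hypothetical exception type for illustration; NOT Spark's BadRecordException.
final case class MalformedRecord(input: String, cause: Throwable) extends Exception(cause)

object SafeParse {
  // `parseLine` is a stand-in (assumption) for a parser that signals
  // failure by returning null instead of throwing.
  def parseOrThrow(input: String, parseLine: String => Array[String]): Array[String] = {
    val tokens = parseLine(input)
    if (tokens == null) {
      // Surface a descriptive error here instead of letting a later
      // `tokens.length` dereference throw a NullPointerException.
      throw MalformedRecord(input, new RuntimeException(s"Parser returned null for: $input"))
    }
    tokens
  }
}
```

With this shape, the caller can catch one well-defined exception type and decide how to treat the record, rather than seeing a bare NPE.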

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 @MaxGekk yea, I noticed that behaviour. It seems that when we set an empty array in `CommonSettings.selectIndexes`, `UnivocityParser` returns `null` for valid input? I'm not sure setting an

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 I have found the places inside UnivocityParser where the `null` comes from. Interestingly, `null` is returned for the valid input string `"8"`. See the screenshot:

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 Wouldn't it be better to check the schema instead of the value for each record?

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21631 Looking at the `NullPointerException`, it comes from the line:
```scala
if (tokens.length != schema.length) {
```
where `tokens` is the `null` returned by `parseLine` of `UnivocityParser`.
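
A minimal sketch of why that comparison fails and how a guard could avoid it, assuming only that `tokens` may be `null`. `TokenCheck` and `lengthMatches` are hypothetical names introduced for illustration; this is not the code that was merged in the PR.

```scala
object TokenCheck {
  // `tokens` may be null, mirroring what the comment reports for `parseLine`.
  // Wrapping it in Option avoids dereferencing null via `tokens.length`.
  def lengthMatches(tokens: Array[String], schemaLength: Int): Option[Boolean] =
    Option(tokens).map(_.length == schemaLength)
}
```

Here `TokenCheck.lengthMatches(null, 1)` yields `None` rather than throwing, so the caller has to decide explicitly what a missing parse result means.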

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92293/ Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92293/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92293/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/459/

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 retest this please

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 yea, I checked the two queries with/without column pruning in the master; ``` ./bin/spark-shell --conf spark.sql.csv.parser.columnPruning.enabled=true (default) scala> val dir =

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92282/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92284/ Test FAILed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92282/ Test FAILed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test FAILed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test FAILed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92284/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21631 Both?

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 yea, I think these are regressions because I checked that the query above passed before[ this

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21631 @maropu Could you confirm whether these two bugs are regressions in the master branch?

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 As I described in https://github.com/apache/spark/pull/21625#discussion_r197679077, I found another bug? (the case where `spark.sql.csv.parser.columnPruning.enabled=false`) when working on this pr;

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92284/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/451/

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 @HyukjinKwon sure, I'll do

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21631 **[Test build #92282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92282/testReport)** for PR 21631 at commit

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/449/

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21631 Merged build finished. Test PASSed.

[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-24 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 cc: @HyukjinKwon @MaxGekk