[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 Today, I reconsidered this PR from the ground up. To sum up, I realized that I have been wasting the committers' review time. In particular, sorry for your effort,

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81365/ Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81365/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 Hi, @gatorsmile . I believe I understand your advice correctly this time. Could you take a look at this `data source` verification PR?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81365/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81318/ Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81318/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81318/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 Hi, @gatorsmile . I changed `test("orc - predicate push down"` to work at a higher level. What do you think about this approach?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81298/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81298/ Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 The previous Parquet link is broken. The official one is https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81298/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 For Parquet, I can find [TestInputOutputFormat.java](https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/test/java/parquet/hadoop/example/TestInputOutputFormat.java). Parquet

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 I mean, how about Parquet and the others? Do they have the e2e test cases in their projects?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 If you agree, I will try to write more code here as a POC.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 Parquet is the same. We can use `unhandledFilters` for predicate push-down (PPD). I think that the other text-based data sources (TEXT/CSV/JSON) don't support PPD.
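
To illustrate the `unhandledFilters` idea mentioned above, here is a minimal sketch in plain Scala. It does not use Spark itself: the `Filter` hierarchy, `PushDownSketch`, and `canPushDown` are hypothetical stand-ins, and the choice of which filters the toy source can evaluate is an assumption made only for illustration. In Spark, the corresponding hook is `BaseRelation.unhandledFilters(filters: Array[Filter]): Array[Filter]`; a test could compare its result against the filters it expects to be pushed down.

```scala
// Simplified stand-ins for Spark's filter classes (hypothetical, not Spark's own types).
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class StringContains(attribute: String, value: String) extends Filter

object PushDownSketch {
  // Assumption: this toy source can evaluate only equality and greater-than.
  def canPushDown(filter: Filter): Boolean = filter match {
    case _: EqualTo | _: GreaterThan => true
    case _                           => false
  }

  // Returns the filters the source cannot evaluate itself;
  // the query engine would have to re-apply these after the scan.
  def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(canPushDown)
}
```

In this model, a source without push-down support (as the comment suggests for TEXT/CSV/JSON) would simply return every filter it receives.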

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 How about Parquet and the others?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 In the ORC GitHub repository, to my knowledge, this is the highest level. ``` val reader = new OrcInputFormat[OrcStruct]().createRecordReader(split, attemptContext) ... reader.nextKeyValue()

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 The ORC test is very specific to the impl. No end-to-end test cases?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 BTW, among the text-based formats, I initially thought we could include JSON. But, unfortunately, I found that Spark currently uses BIGINT for all numeric, DOUBLE for float/double, and

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 For predicate push-down, I have already ported the ORC code in this PR.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 For ORC, I was able to find some randomly generated float/double tests, but there is no value-limit test. For Parquet, I'm not sure.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 Does ORC/Parquet have the related test cases? Could we just port them?

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81157/ Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81157/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81157/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81155/ Test FAILed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test FAILed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81155/testReport)** for PR 19060 at commit

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81155/testReport)** for PR 19060 at commit