Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
Today, I reconsidered this PR from the ground up.
To sum up, I realized that I've been wasting the committers' review time.
Especially, sorry for your effort,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81365/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81365 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81365/testReport)**
for PR 19060 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
Hi, @gatorsmile .
I believe I understand your advice correctly this time.
Could you take a look at this `data source` verification PR?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81365 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81365/testReport)**
for PR 19060 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81318/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81318 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81318/testReport)**
for PR 19060 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81318 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81318/testReport)**
for PR 19060 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
Hi, @gatorsmile .
I changed the code in `test("orc - predicate push down")` to work at a higher level.
What do you think about this approach?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81298 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81298/testReport)**
for PR 19060 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81298/
Test PASSed.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
The previous Parquet link is broken. The official one is
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81298 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81298/testReport)**
for PR 19060 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
For Parquet, I can find
[TestInputOutputFormat.java](https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/test/java/parquet/hadoop/example/TestInputOutputFormat.java).
Parquet
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19060
I mean, how about Parquet and the others? Do they have the e2e test cases
in their projects?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
If you agree, I will try to write more code here as a POC.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
Parquet is the same. We can use `unhandledFilters` for predicate push-down (PPD).
I think that the other text-based data sources (TEXT/CSV/JSON) don't
support PPD.
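
To make the `unhandledFilters` idea concrete, here is a minimal, self-contained sketch of the contract as discussed above: the source reports the filters it cannot evaluate, and the engine re-applies only those after the scan. Every name in it (`ToySource`, `GreaterThan`, `StringContains`, `scan`, `read`) is hypothetical and chosen for illustration; it only mirrors the shape of the idea and is not the code in this PR or Spark's actual implementation.

```scala
// Toy model of the `unhandledFilters` contract. All types are hypothetical
// stand-ins and do not depend on Spark.
sealed trait Filter
final case class GreaterThan(column: String, value: Int) extends Filter
final case class StringContains(column: String, value: String) extends Filter

object ToySource {
  type Row = Map[String, Any]

  // Filters this toy source cannot evaluate itself; the engine must re-apply them.
  def unhandledFilters(filters: Seq[Filter]): Seq[Filter] =
    filters.filter {
      case _: GreaterThan    => false // assumed to be pushed down and fully handled
      case _: StringContains => true  // assumed to be unsupported by the source
    }

  // Evaluate a single filter against a row; a missing or mistyped column fails the filter.
  def eval(f: Filter, row: Row): Boolean = f match {
    case GreaterThan(c, v) =>
      row.get(c).exists { case i: Int => i > v; case _ => false }
    case StringContains(c, v) =>
      row.get(c).exists { case s: String => s.contains(v); case _ => false }
  }

  // Source-side scan: applies only the filters the source handles.
  def scan(data: Seq[Row], filters: Seq[Filter]): Seq[Row] = {
    val handled = filters.diff(unhandledFilters(filters))
    data.filter(row => handled.forall(eval(_, row)))
  }

  // Engine-side read: scan, then re-apply whatever the source reported as unhandled.
  def read(data: Seq[Row], filters: Seq[Filter]): Seq[Row] = {
    val residual = unhandledFilters(filters)
    scan(data, filters).filter(row => residual.forall(eval(_, row)))
  }
}
```

A test written against this contract can check both halves separately: that `unhandledFilters` returns exactly the filters the source does not support, and that the final result is the same whether or not a filter was pushed down.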
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19060
How about Parquet and the others?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
In the ORC GitHub repository, to my knowledge, this is the highest level.
```
val reader = new OrcInputFormat[OrcStruct]().createRecordReader(split, attemptContext)
... reader.nextKeyValue()
```
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19060
The ORC test is very specific to the impl. No end-to-end test cases?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
BTW, among the text-based formats, I thought we could include JSON at the
beginning. But, unfortunately, I found that Spark currently uses BIGINT for all
numeric, DOUBLE for float/double, and
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
For predicate push-down, I have already ported the ORC code in this PR.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19060
For ORC, I was able to find some randomly generated float/double tests, but
there is no value-limit test. For Parquet, I'm not sure.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19060
Does ORC/Parquet have the related test cases? Could we just port them?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81157/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81157 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81157/testReport)**
for PR 19060 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81157 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81157/testReport)**
for PR 19060 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81155/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19060
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81155 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81155/testReport)**
for PR 19060 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19060
**[Test build #81155 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81155/testReport)**
for PR 19060 at commit