[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Thanks for reviewing and merging @cloud-fan, @gatorsmile, @felixcheung! --- - To unsubscribe, e-mail:

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21118 Thanks! Merged to master.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93500/ Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #93500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93500/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #93500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93500/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 LGTM, let's merge it when the tests pass (the last pass was 4 days ago)

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 retest this please

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, any update on merging this?

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-22 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21118 so where are we on this? looks like we have 2 LGTM?

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93365/ Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #93365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93365/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #93365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93365/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Rebased on master to fix conflicts.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, I'd like to get this PR in by 2.4.0. Now that the change to push predicates and projections happens when converting to the physical plan, this can go in. I've rebased this on master and

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92868/ Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #92868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92868/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/854/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #92868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92868/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90507/ Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90507/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3137/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90507/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21118 Retest this please

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90494/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90494/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3127/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90494/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21118 Retest this please.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21118 LGTM, although as you mentioned I think it'd definitely be valuable to follow up and understand why some operators insist on UnsafeRow even though this isn't what SparkPlan declares as the row

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90481/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90481/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3116/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3115/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90481/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, @jose-torres, I think this is ready for final review. I've rebased on top of the rename to `InputPartition`. I've also added a projection when this produces a physical plan to

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90480/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90480/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90480/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3113/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90478/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90478/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #90478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90478/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 > If we want to go this way, I think we should fully bring back #10511 to make this contract explicitly, i.e. which operator produce unsafe row and which operator only accepts unsafe row as input.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 > If we did that, then only ProjectExec and FilterExec would need to support InternalRow, which they already do. This partially brings #10511 back, and we need to plan project and filter

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan: This PR is also related to #21262 because that PR updates the conversion from logical to physical plan and handles projections and filtering. We could modify that strategy to always

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 > We expect data source to produce `ColumnarBatch` for better performance, and the row interface performance is not that important. I disagree. The vectorized path isn't used for all Parquet

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 > Actually the `SupportsScanUnsafeRow` is only there to avoid perf regression for migrating file sources. If you think that's not a good public API, we can move it to internal package and only use

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 @rdblue, this is a good point. Since not all the operators need unsafe row, we can save the copy at data source side if we don't need to produce unsafe row. Actually we had such a mechanism

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 I just did a performance test based on our 2.1.1 and a real table. I tested a full scan of an hour of data with a single data filter. The scan had 13,083 tasks and read 1084.8 GB. I used

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, let me clarify what I'm getting at here. It appears that Spark makes at least one copy of data to unsafe when reading any Parquet row. If the projection includes partition

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 all the places that use `GenerateUnsafeRowJoiner` assume the input row is unsafe row. `ShuffleExchangeExec` assumes its input is unsafe row, because its serializer is

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, actually I tried a lot of different queries yesterday, including joins and aggregations. The only thing that didn't work was `collect` for a `select * from t` because `SparkPlan` assumes

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118 parquet scan doesn't need unsafe row because it outputs `ColumnarBatch`. Note that, `UnsafeRow` is the data format Spark uses to exchange data between operators, but whole-stage-codegen can merge

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan and @jose-torres: I looked at `explain codegen` for reading from a Parquet table (with vectorized reads disabled) and it doesn't look like there is a dependency on `UnsafeRow`:

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89863/ Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89863/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2682/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89863/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21118 @rdblue . Could you fix the remaining `KafkaMicroBatchSourceSuite.scala`, too? ```scala [error]

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89665/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89665/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2552/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89665/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Yeah, we should probably add a projection. It's probably only working because the InternalRows that are produced are all UnsafeRow.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21118 Generally looks good. IIRC, there's some arcane reason why plan nodes need to produce UnsafeRow even though SparkPlan.execute() declares InternalRow. So we may need to add a projection

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89663/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89663/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89662/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89662/ Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2551/

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89663/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @jose-torres, @cloud-fan, can you take a look at this? It updates the v2 API to use InternalRow by default.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89662/testReport)** for PR 21118 at commit

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Build finished. Test PASSed.

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2550/
