[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-10-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #97865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97865/testReport)**
 for PR 2 at commit 
[`fdc1efc`](https://github.com/apache/spark/commit/fdc1efcdefe4b9bf002ce43ed1dfd7ab258218ca).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-10-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #97828 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97828/testReport)**
 for PR 2 at commit 
[`fdc1efc`](https://github.com/apache/spark/commit/fdc1efcdefe4b9bf002ce43ed1dfd7ab258218ca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-10-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #97845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97845/testReport)**
 for PR 2 at commit 
[`fdc1efc`](https://github.com/apache/spark/commit/fdc1efcdefe4b9bf002ce43ed1dfd7ab258218ca).


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-30 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/2
  
@cloud-fan @rdblue 
I want to leave some comments and thoughts after looking into this again; 
I hope they help us decide on the next step.
Currently every plan assumes its input is `RDD[InternalRow]`, and the whole 
framework treats columnar reads as a special case. Also, `inputRDDs` is called 
not only in `WholeStageCodegenExec` but also by all parent physical nodes, so 
it is very easy to get into a mess with nested plans while debugging this fix. 
So we may have these 3 choices. The first two can remove the cast completely 
but may require many changes to `CodegenSupport`; the last one limits the 
changes but still has the cast problem:
1. Erase the type of `inputRDDs`, because we should allow both 
`RDD[InternalRow]` and `RDD[ColumnarBatch]` to be passed, mainly for the case 
where the parent physical plan calls the child. This is implemented as the 
last commit in this PR: 
https://github.com/apache/spark/pull/2/files
2. Refactor the framework so that all plans deal with columnar batches.
3. Limit the changes to `ColumnarBatchScan` and don't change 
`CodegenSupport`, which still leaves the cast problem. This is implemented as 
the first two commits in this PR: 
https://github.com/apache/spark/pull/2/files/7e88599dfc2caf177d12e890d588be68bdd3bc8e

If none of these makes sense, I'll just close this. Thanks.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/2
  
Got it. I'll revert the changes to the file source in this commit. Thanks for 
your reply.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/2
  
Can we do it for data source v2 first? It seems hard to fix the file 
source, as its reader function may lie about the return type.

Let's see what the simplest fix is to remove the hack for data source v2.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95422/
Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95422 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95422/testReport)**
 for PR 2 at commit 
[`fdc1efc`](https://github.com/apache/spark/commit/fdc1efcdefe4b9bf002ce43ed1dfd7ab258218ca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95422 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95422/testReport)**
 for PR 2 at commit 
[`fdc1efc`](https://github.com/apache/spark/commit/fdc1efcdefe4b9bf002ce43ed1dfd7ab258218ca).


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2674/
Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-29 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/2
  
@cloud-fan Thanks for your reply, Wenchen. I'm trying to achieve this in 
this commit; please take a look. Thanks.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/2
  
+1 on @rdblue's idea. One point: we should use 
`ColumnarBatchScan.supportsBatch` to indicate whether the scan is columnar, 
instead of asking the RDD to report it.
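
A minimal sketch of that idea, assuming a simplified plan node. `SimpleScan`, `ParquetLikeScan`, and `Planner` are hypothetical names used only for illustration and are not Spark's actual classes; only the `supportsBatch` member name comes from the comment above. The sketch shows the choice of code path being made by asking the scan node, without looking at the RDD at all.

```scala
// Hypothetical, simplified scan node; not Spark's actual ColumnarBatchScan trait.
trait SimpleScan {
  // The plan node declares its read mode up front.
  def supportsBatch: Boolean
}

// Hypothetical scan whose mode is fixed when the node is created.
final case class ParquetLikeScan(vectorized: Boolean) extends SimpleScan {
  override def supportsBatch: Boolean = vectorized
}

object Planner {
  // Picks the code path by asking the node, never by inspecting the RDD.
  def choosePath(scan: SimpleScan): String =
    if (scan.supportsBatch) "columnar" else "row"
}
```

With this shape the RDD needs no `isColumnar`-style flag, since the node that created it already knows which kind of records it holds.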


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-27 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/2
  
@xuanyuanking, while this does remove the hack, it doesn't address the 
underlying problem. The problem is that there is a single RDD, which may 
contain InternalRow or may contain ColumnarBatch. Generated code knows how to 
differentiate between the two and use the RDD contents correctly.

While this is an improvement because it uses the actual type of records in 
the RDD, the work that needs to be done is to update the columnar case so that 
it does return an `RDD[InternalRow]` for anyone that accesses data using that 
RDD, and then update the generated code to detect a data source RDD and access 
the underlying `RDD[ColumnarBatch]`.

Here's some pseudo-code to demonstrate what I mean. The current code does 
something like this with a cast. Your change wouldn't fix the need to cast to 
`RDD[ColumnarBatch]`:
```scala
// with your change, the parameter type would be DataSourceRDD[_]
def doExecute(rdd: DataSourceRDD[InternalRow]): Unit = {
  if (rdd.isColumnar) {
    doExecuteColumnarBatch(rdd.asInstanceOf[RDD[ColumnarBatch]])
  } else {
    doExecuteRows(rdd)
  }
}
```

I think that should be changed to something like this which is type safe:
```scala
def doExecute(rdd: DataSourceRDD[InternalRow]): Unit = {
  if (rdd.isColumnar) {
    doExecuteColumnarBatch(rdd.getColumnBatchRDD)
  } else {
    doExecuteRows(rdd)
  }
}
```
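
For anyone who wants to try the type-safe shape outside Spark, here is a self-contained sketch. Everything in it is a stand-in: `Row`, `Batch`, and `ScanData` are hypothetical simplifications of `InternalRow`, `ColumnarBatch`, and `DataSourceRDD`, and a plain `Seq` replaces `RDD`. It only illustrates that a typed batch accessor lets the caller branch without a cast; it is not the actual implementation.

```scala
// Hypothetical stand-ins for Spark's InternalRow and ColumnarBatch.
final case class Row(values: Vector[Int])
final case class Batch(rows: Vector[Row])

// Stand-in for a data source RDD: always exposes rows, and batches when columnar.
// A plain Seq is used instead of RDD to keep the sketch self-contained.
final class ScanData(batches: Seq[Batch], val isColumnar: Boolean) {
  // Row view, valid for every consumer regardless of the scan mode.
  def rows: Seq[Row] = batches.flatMap(_.rows)

  // Typed access to the batches; replaces asInstanceOf at the call site.
  def getColumnBatchRDD: Seq[Batch] = {
    require(isColumnar, "scan is not columnar")
    batches
  }
}

object TypeSafeScan {
  // Sums every value, batch by batch.
  def doExecuteColumnarBatch(data: Seq[Batch]): Int =
    data.map(_.rows.map(_.values.sum).sum).sum

  // Sums every value, row by row.
  def doExecuteRows(data: Seq[Row]): Int =
    data.map(_.values.sum).sum

  // Mirrors the pseudo-code: branch on isColumnar, no cast needed.
  def doExecute(data: ScanData): Int =
    if (data.isColumnar) doExecuteColumnarBatch(data.getColumnBatchRDD)
    else doExecuteRows(data.rows)
}
```

Both branches compute the same total for the same data, so a caller that only understands rows keeps working while a batch-aware caller gets a correctly typed collection.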


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95261/
Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95261/testReport)**
 for PR 2 at commit 
[`7e88599`](https://github.com/apache/spark/commit/7e88599dfc2caf177d12e890d588be68bdd3bc8e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95261/testReport)**
 for PR 2 at commit 
[`7e88599`](https://github.com/apache/spark/commit/7e88599dfc2caf177d12e890d588be68bdd3bc8e).


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2556/
Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/2
  
cc @cloud-fan and @rdblue, please have a look when you have time. If this PR 
doesn't match your expectations, I'll close it soon. Thanks!


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95217/
Test PASSed.


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95217 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95217/testReport)**
 for PR 2 at commit 
[`992a08b`](https://github.com/apache/spark/commit/992a08b1d77d59daeac95c67d07e5b8efe20ce20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class DataSourceRDD[T: ClassTag](`
  * `class DataSourceRowRDD(`
  * `class DataSourceColumnarBatchRDD(`


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/2
  
**[Test build #95217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95217/testReport)**
 for PR 2 at commit 
[`992a08b`](https://github.com/apache/spark/commit/992a08b1d77d59daeac95c67d07e5b8efe20ce20).


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/2
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2534/
Test PASSed.


---
