[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)**
 for PR 20387 at commit 
[`9bb0141`](https://github.com/apache/spark/commit/9bb01416d68e9e2b7ed34745ba0a4b92721d98dd).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87000/
Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/537/
Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87000/testReport)**
 for PR 20387 at commit 
[`9bb0141`](https://github.com/apache/spark/commit/9bb01416d68e9e2b7ed34745ba0a4b92721d98dd).


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@cloud-fan, I'll update this PR and we can talk about passing configuration 
on the dev list.

And as a reminder, please close #20445.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> I tried and couldn't figure out how to do it with PhysicalOperation; that's 
why I built something new for data source v2 pushdown.

The problem is that we should get DSv2 working independently of a redesign 
of the push-down rules. Throwing an untested push-down rule into changes for 
DSv2 makes the new API less reliable, and hurts people that want to try it out 
and start using it. There is no benefit to doing this for 2.3.0.

I also think a push-down redesign should be properly thought out and 
tested. I'm all for fixing this if you can make the case that we need 
to, but we shouldn't needlessly mix together major changes.

@cloud-fan, There's more discussion about this on #20476 that I encourage 
you to read.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20387
  
+1 for @cloud-fan 's suggestion.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
Hi @rdblue, I think we all agree that the plan should be immutable, but 
other parts are still under discussion. Can you send a new PR that focuses on 
making the plan immutable, so that we can merge that one first and continue to 
discuss the other parts in this PR?


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
> We can add things like limit pushdown later, by adding it properly to the 
existing code.

I tried and couldn't figure out how to do it with `PhysicalOperation`; that's 
why I built something new for data source v2 pushdown. I'm OK with reusing it if 
you can convince me that `PhysicalOperation` is extensible, e.g. that it can 
support limit push-down.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
Currently `DataSourceOptions` is the major way for Spark and users to pass 
information to the data source. It's very flexible and only defines one rule: 
the option key lookup should be case-insensitive.

I agree with your point that more consistency is better. It's annoying if 
every data source needs to define its own option keys for table and database 
and tell users about them. It would be good if Spark defined some rules about 
which option keys should be used for common information.

My proposal:
```
class DataSourceOptions {
  ...
  
  def getPath(): String = get("path")

  def getTimeZone(): String = get("timeZone")

  def getTableName(): String = get("table")
}
```
We can keep adding these options since this won't break binary 
compatibility.

And then we just need to document it and tell both users and data source 
developers about how to specify and retrieve these common options.

Then I think we don't need to add `table` and `database` parameters to 
`DataSourceV2Relation`, because we can easily do `relation.options.getTableName`.

BTW this doesn't change the API so I think it's fine to do it after 2.3.
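
A minimal sketch of the proposal above, assuming a simplified stand-in class rather than the real `DataSourceOptions`: keys are normalized once so that lookups ignore case, and the common keys get named accessors. The names `OptionsSketch` and `SketchOptions` are hypothetical, and the accessors return `Option[String]` here only to keep the sketch safe when a key is absent.

```scala
// Hypothetical sketch; this is not the actual Spark DataSourceOptions class.
object OptionsSketch {
  final class SketchOptions(raw: Map[String, String]) {
    // Normalize keys once so that every lookup is case-insensitive.
    private val normalized: Map[String, String] =
      raw.map { case (k, v) => k.toLowerCase(java.util.Locale.ROOT) -> v }

    def get(key: String): Option[String] =
      normalized.get(key.toLowerCase(java.util.Locale.ROOT))

    // Accessors for the common keys named in the proposal.
    def getPath: Option[String] = get("path")
    def getTimeZone: Option[String] = get("timeZone")
    def getTableName: Option[String] = get("table")
  }

  def main(args: Array[String]): Unit = {
    val opts = new SketchOptions(Map("PATH" -> "/data/events", "Table" -> "db.events"))
    println(opts.getPath)
    println(opts.getTableName)
    println(opts.getTimeZone)
  }
}
```

With accessors like these, a data source would not need to document its own key spelling for the table name; it would read whatever the shared accessor returns.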


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@dongjoon-hyun, @gatorsmile, could you weigh in on some of this 
discussion? I'd like to get additional perspectives on the changes I'm 
proposing.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> Let's keep it general and let the data source interpret it.

I think this is the wrong approach. The reason why we are using a special 
`DataSourceOptions` object is to ensure that data sources consistently ignore 
case when reading **their own options**. Consistency across data sources 
matters and we should be pushing for more consistency, not less.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@cloud-fan, to your point about push-down order, I'm not saying that order 
doesn't matter at all, I'm saying that the push-down can run more than once and 
it should push the closest operators. That way, if you have a situation where 
operators can't be reordered but they can all be pushed, they all get pushed 
through multiple runs of the rule, each one further refining the relation.

If we do it this way, then we don't need to traverse the logical plan to 
find out what to push down. We continue pushing projections until the plan 
stops changing. This is how the rest of the optimizer works, so I think it is a 
better approach from a design standpoint.

My implementation also reuses more existing code that we have higher 
confidence in, which is a good thing. We can add things like limit pushdown 
later, by adding it properly to the existing code. I don't see a compelling 
reason to toss out the existing implementation, especially without the same 
level of testing.
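
The "push the closest operator, then run again" idea can be sketched on a toy plan tree. This is only an illustration under stated assumptions: `Relation`, `Filter`, `Project`, `pushOnce`, and `toFixedPoint` are hypothetical stand-ins, not Catalyst classes or the rule in this PR, and the sketch assumes every operator can be pushed in full.

```scala
// Toy plan nodes; hypothetical stand-ins, not Catalyst classes.
object PushDownSketch {
  sealed trait Plan
  final case class Relation(filters: List[String] = Nil,
                            columns: Option[List[String]] = None) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Project(columns: List[String], child: Plan) extends Plan

  // One pass: push only an operator that sits directly on top of the relation.
  // Every case returns a new node; nothing is mutated.
  def pushOnce(plan: Plan): Plan = plan match {
    case Filter(cond, r: Relation)  => r.copy(filters = r.filters :+ cond)
    case Project(cols, r: Relation) => r.copy(columns = Some(cols))
    case Filter(cond, child)        => Filter(cond, pushOnce(child))
    case Project(cols, child)       => Project(cols, pushOnce(child))
    case r: Relation                => r
  }

  // Re-run the rule until the plan stops changing.
  @annotation.tailrec
  def toFixedPoint(plan: Plan): Plan = {
    val next = pushOnce(plan)
    if (next == plan) plan else toFixedPoint(next)
  }

  def main(args: Array[String]): Unit = {
    val plan = Filter("a > 1", Project(List("a", "b"), Filter("b = 2", Relation())))
    println(toFixedPoint(plan))
  }
}
```

Each run refines the relation by one operator, so a stack of operators that cannot be reordered is still pushed in its original order without a separate traversal that collects them.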


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> `spark.read.format("iceberg").table("db.table").load()`

I'm fine with this if you think it is confusing to parse the path as a 
table name in load. I think it is reasonable.

I'd still like to keep the `Option[TableIdentifier]` parameter on the 
relation, so that we can support `table` or `insertInto` on the write path.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@felixcheung, yes, we do already have a `table` option. That creates an 
`UnresolvedRelation` with the parsed table name as a `TableIdentifier`, which 
is not currently compatible with `DataSourceV2` because there is no standard 
way to pass the identifier's db and table name.

Part of the intent here is to add support in `DataSourceV2Relation` for 
cases where we have a `TableIdentifier`, so that we can add a resolver rule 
that replaces `UnresolvedRelation` with `DataSourceV2Relation`. This is what we 
do in our Spark branch.

@cloud-fan, what is your objection to support like this?


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20387
  
don't we already have table in DataFrameReader? 
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sql.DataFrameReader.table

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@table(tableName:String):org.apache.spark.sql.DataFrame


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
> I thought it was a good thing to push a single node down at a time and 
not depend on order.

The order must be taken care of. For example, we can't push down a limit 
through a Filter unless the entire filter is pushed into the data source. 
Generally, if we push down multiple operators into a data source, we should 
clearly define the order in which the data source applies these operators.
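
The limit example can be checked with plain collections. This is a sketch only: the data and the `isEven` predicate are made up, and `take` stands in for a limit applied by the data source.

```scala
// Plain-collections illustration; no Spark APIs are involved.
object LimitOrderSketch {
  val rows: Seq[Int] = 1 to 10

  def isEven(n: Int): Boolean = n % 2 == 0

  // Query semantics: filter first, then keep the first n matching rows.
  def filterThenLimit(data: Seq[Int], n: Int): Seq[Int] =
    data.filter(isEven).take(n)

  // What happens if the limit is pushed below a filter that is not pushed:
  // the source returns n rows, and the filter then discards some of them.
  def limitThenFilter(data: Seq[Int], n: Int): Seq[Int] =
    data.take(n).filter(isEven)

  def main(args: Array[String]): Unit = {
    println(filterThenLimit(rows, 3))
    println(limitThenFilter(rows, 3))
  }
}
```

The two functions return different rows for the same input, which is why the limit may only move into the source once the whole filter has moved there too.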


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
> This is a new API...

Are you saying you want to add a new method in `DataFrameReader` that is 
different from `load`? In Scala, the parameter name is part of the method 
signature, so for `def load(path: String)` we can't change its semantics: the 
parameter is a path. It's fine if a data source implementation teaches its users 
that the path will be interpreted as a database/table name, but this should not 
be a contract in Spark.

I do agree that Spark should set a standard for specifying database and 
table, as it's very common. We can even argue that path is not a general 
concept for data sources, but we still provide special APIs for path.

My proposal: How about we add a new method `table` in `DataFrameReader`? 
The usage would look like 
`spark.read.format("iceberg").table("db.table").load()`. What do you think? We 
should not specify `database`, since we may have catalog federation and the 
table name may have 3 parts: `catalog.db.table`. Let's keep it general and let 
the data source interpret it.
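
To illustrate what "let the data source interpret it" could mean for a one-, two-, or three-part name, here is a rough sketch. `TableRef` and `parse` are hypothetical names, not Spark APIs, and the sketch ignores quoting rules (such as backticks) that a real parser would need.

```scala
// Hypothetical helper a data source might use to interpret a table name.
object TableNameSketch {
  final case class TableRef(catalog: Option[String],
                            database: Option[String],
                            table: String)

  def parse(name: String): Either[String, TableRef] = {
    // A limit of -1 keeps trailing empty parts, so "db." is rejected below.
    val parts = name.split("\\.", -1).toList
    if (parts.exists(_.isEmpty)) Left(s"Empty name part in: '$name'")
    else parts match {
      case t :: Nil           => Right(TableRef(None, None, t))
      case d :: t :: Nil      => Right(TableRef(None, Some(d), t))
      case c :: d :: t :: Nil => Right(TableRef(Some(c), Some(d), t))
      case _                  => Left(s"Too many name parts in: '$name'")
    }
  }

  def main(args: Array[String]): Unit = {
    println(parse("db.table"))
    println(parse("catalog.db.table"))
  }
}
```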


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> It's hard to improve PhysicalOperation to support more operators and 
specific push down orders, so I created the new one

I'm concerned about the new one. The projection support seems really 
brittle because it calls out specific logical nodes and scans the entire plan. 
If we are doing push-down wrong on the current v1 and Hive code paths, then I'd 
like to see a proposal for fixing that without these drawbacks.

I like that this PR pushes projections and filters just like the other 
paths. We should start there and add additional push-down as necessary.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> [The push-down rule may be run more than once if filters are not pushed 
through projections] looks weird, do you have a query to reproduce this issue?

One of the DataSourceV2 tests hit this. I thought it was a good thing to 
push a single node down at a time and not depend on order.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> I'd suggest that we just propagate the paths parameter to options, and 
data source implementations are free to interpret the path option as whatever 
they want, e.g. table and database names.

What about code paths that expect table names? In our branch, we've added 
support for converting Hive relations (which have a `TableIdentifier`, not a 
path) and using `insertInto`. Table names and paths are the two main ways to 
identify tables and I think both should be supported.

This is a new API, so it doesn't matter that `load` and `save` currently 
use paths. We can easily update that support for tables. If we don't, then 
there will be no common way to refer to tables: some implementations will use 
`table`, some will pass `db` separately, and some might use `database`. 
Standardizing this and adding support in Spark will produce more consistent 
behavior across data sources.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
> I'm OK with making it immutable if there is a significant benefit.

Mutable nodes violate a basic assumption of catalyst, that trees are 
immutable. Here's a good quote from the SIGMOD paper (by @rxin, @yhuai, and 
@marmbrus et al.):

> In our experience, functional transformations on immutable trees make the 
whole optimizer very easy to reason about and debug. They also enable 
parallelization in the optimizer, although we do not yet exploit this.

Mixing mutable nodes into supposedly immutable trees is a bad idea. Other 
nodes in the tree assume that children do not change.
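
As a small illustration of that assumption, the sketch below uses hypothetical case classes (not the real `DataSourceV2Relation`): a transformation returns a new node through `copy`, so a parent that already holds the old child keeps seeing exactly what it was built with.

```scala
// Hypothetical toy nodes; not Catalyst classes.
object ImmutableNodeSketch {
  final case class Relation(name: String, pushedFilters: List[String])
  final case class Limit(n: Int, child: Relation)

  // Returns a new relation instead of changing the existing one.
  def pushFilter(rel: Relation, cond: String): Relation =
    rel.copy(pushedFilters = rel.pushedFilters :+ cond)

  def main(args: Array[String]): Unit = {
    val original = Relation("t", Nil)
    val parent   = Limit(10, original)
    val refined  = pushFilter(original, "a > 1")

    println(parent.child)                        // still the original relation
    println(parent.copy(child = refined).child)  // a new parent is built explicitly
  }
}
```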


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
I dug into the commit history and recalled why I made these decisions:

* having a mutable `DataSourceV2Relation`. This is mostly to avoid having to 
keep adding more constructor parameters to `DataSourceV2Relation`, which keeps 
the code easy to maintain. I'm OK with making it immutable if there is a 
significant benefit.
* not using `PhysicalOperation`. This is because we will add more push-down 
optimizations (e.g. limit, aggregate, join), and we have a specific push-down 
order for them. It's hard to improve `PhysicalOperation` to support more 
operators and specific push-down orders, so I created the new one. Eventually 
all data sources will be implemented as data source v2, so `PhysicalOperation` 
will go away.


> The output of DataSourceV2Relation should be what is returned by the 
reader, in case the reader can only partially satisfy the requested schema 
projection

Good catch! Since `DataSourceV2Reader` is mutable, the output can't be 
fixed, as it may change when we apply data source optimizations. Using `lazy 
val output ...` can fix this.


> The requested projection passed to the DataSourceV2Reader should include 
filter columns

I did this intentionally. If a column is only referred to by pushed filters, 
Spark doesn't need this column. Even if we require this column from the data 
source, we just read it out and wait for it to be pruned by the next operator.


> The push-down rule may be run more than once if filters are not pushed 
through projections

This looks weird, do you have a query to reproduce this issue?


> This updates DataFrameReader to parse locations that do not look like 
paths as table names and pass the result as "database" and "table" keys in v2 
options.

Personally I'd suggest using `spark.read.format("iceberg").option("table", 
"db.table").load()`, as `load` is defined as `def load(paths: String*)`, but I 
think your usage looks better. The communication protocol between Spark and the 
data source is options, so I'd suggest that we just propagate the `paths` 
parameter to options, and data source implementations are free to interpret the 
path option as whatever they want, e.g. table and database names.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20387
  
Overall, I think it's a good idea to make the plan immutable.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86602/
Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86602 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)**
 for PR 20387 at commit 
[`83203a6`](https://github.com/apache/spark/commit/83203a6e117f180b1839c815e4c3b3ef539f6b2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86602 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86602/testReport)**
 for PR 20387 at commit 
[`83203a6`](https://github.com/apache/spark/commit/83203a6e117f180b1839c815e4c3b3ef539f6b2b).


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/201/
Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86601 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)**
 for PR 20387 at commit 
[`ac58844`](https://github.com/apache/spark/commit/ac58844118d543030fadfeda0a64b52ad659cf31).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86601/
Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86601 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86601/testReport)**
 for PR 20387 at commit 
[`ac58844`](https://github.com/apache/spark/commit/ac58844118d543030fadfeda0a64b52ad659cf31).


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/200/
Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86600 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)**
 for PR 20387 at commit 
[`9c4dcb5`](https://github.com/apache/spark/commit/9c4dcb5b693e729e89ddd7daa54b19c8f8eb3571).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StreamingDataSourceV2Relation(`


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86600/
Test FAILed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #86600 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86600/testReport)**
 for PR 20387 at commit 
[`9c4dcb5`](https://github.com/apache/spark/commit/9c4dcb5b693e729e89ddd7daa54b19c8f8eb3571).


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/199/
Test PASSed.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@cloud-fan, please have a look at these changes. This will require 
follow-up for the Streaming side. I have yet to review the streaming interfaces 
for `DataSourceV2`, so I haven't made any changes there.

In our Spark build, I've also moved the write path to use 
`DataSourceV2Relation`, which I intend to do in a follow-up to this issue.

@rxin FYI.


---




[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---
