[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #68477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68477/consoleFull)**
 for PR 14124 at commit 
[`d240c0d`](https://github.com/apache/spark/commit/d240c0d9fbca446a5b302f739a896f96818d2907).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #68474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68474/consoleFull)**
 for PR 14124 at commit 
[`7306937`](https://github.com/apache/spark/commit/73069373c834291d507a25d7ae8da90a6dec95c0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Actually, nvm. I think handling this in `DataFrameReader.schema` will deal 
with most of general cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Oh wait, @cloud-fan, it seems, at least, Parquet files could possibly be 
written with not nullable fields. So, reading it without user-specified schema 
might also cause the inconsistency between the schema read from structured 
streaming and the one read from filed sources.

If you are not sure of this, I am fine with turning the schema into 
nullable in `DataFrameReader.schema` for now. Let me rebase this one first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Thanks @cloud-fan, sure, that sounds great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14124
  
Sorry for the delay. After thinking it again, I think it doesn't make sense 
to allow users to specify the nullability when reading a data source. How about 
we turn schema to nullable in `DataFrameReader.schema`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67029/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #67029 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)**
 for PR 14124 at commit 
[`3f153a3`](https://github.com/apache/spark/commit/3f153a3b0969c08708675872ef9bb472f804ad57).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-10-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #67029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)**
 for PR 14124 at commit 
[`3f153a3`](https://github.com/apache/spark/commit/3f153a3b0969c08708675872ef9bb472f804ad57).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65747/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #65747 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)**
 for PR 14124 at commit 
[`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #65747 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)**
 for PR 14124 at commit 
[`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65360/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #65360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)**
 for PR 14124 at commit 
[`f6be52b`](https://github.com/apache/spark/commit/f6be52b7450ad2797e49fc1116844a0f4dd809e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-09-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #65360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)**
 for PR 14124 at commit 
[`f6be52b`](https://github.com/apache/spark/commit/f6be52b7450ad2797e49fc1116844a0f4dd809e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64631/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #64631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)**
 for PR 14124 at commit 
[`ffacb55`](https://github.com/apache/spark/commit/ffacb55a9a13fc3144683d9dad8f2da21705a613).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #64631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)**
 for PR 14124 at commit 
[`ffacb55`](https://github.com/apache/spark/commit/ffacb55a9a13fc3144683d9dad8f2da21705a613).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64066/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #64066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)**
 for PR 14124 at commit 
[`079aae2`](https://github.com/apache/spark/commit/079aae2e1f5a94ed4cb06ab797fa27951205a328).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #64066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)**
 for PR 14124 at commit 
[`079aae2`](https://github.com/apache/spark/commit/079aae2e1f5a94ed4cb06ab797fa27951205a328).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
@cloud-fan If nullability should be not ignored, then I can fix this PR to 
make them consistent to not ignoring it (and of course I will try to identify 
the related problems). In this case, I will work on what @gatorsmile pointed 
out in https://github.com/apache/spark/pull/14124#issuecomment-231594416 about 
JSON (and will check the other data sources as well).

I will follow your decision.


To cut the all comments above short, (for other reviewers), 
 - The purpose of this PR is whether it should force all schema to nullable 
schema or not.
 - This is already happening with normal reading and writing for data 
sources based on `FileFormat`.
 - This is for both inferred/read schema and user-given schema. 
 - For `json(rdd: RDD[String])` API, this is not hapenning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
BTW, actually, this is not only about user-given schema.

Currently, it always reads data into dataframe by datasources based on 
`FileFormat` ignoring nullability in schema (for both user-given schema and 
inferred/read schema).

However, this does not happen when reading for streaming by the datasources 
(and another JSON api).

So, this PR tries to make them consistent to ignore the nullability in 
schema.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Thanks for feedback @cloud-fan !

If the user-given schema is wrong, it is handled differently for each 
datasource specific.

- For JSON and CSV
  it is kind of permissive generally (for example, compatibility among 
numeric types).

- For ORC and Parquet
  Generally it is strict to types. So they don't allow the compatibility 
(except for very few cases, e.g. for parquet, 
https://github.com/apache/spark/pull/14272 and 
https://github.com/apache/spark/pull/14278)

I think so. Should we disallow specifying schemas for these? 
- For JDBC
  it does not take user-given schema since it does not implement 
`SchemaRelationProvider`.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14124
  
What will happen if the given schema is wrong? It seems weird that we allow 
users to provide schema while reading the data, but without validating it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-26 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/14124
  
@cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
gentle ping @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Could you take a look please @marmbrus ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
@viirya Thanks for your comment! Actually, that's I want to have some 
feedback for from @marmbrus .

It seems forcing to a nullable schema all is already happening when you 
read/write data via `read`/`write` API (but not for structured streaming and 
another API for json).

So, actually, the reason of this PR is, to make all consistent. The reason 
to make them consistent in a way that the schema is forced as nullable is what 
he said in the mailing list.

>Sure, but a traditional RDBMS has the opportunity to do validation before
>loading data in.  Thats not really an option when you are reading random
>files from S3.  This is why Hive and many other systems in this space treat
>all columns as nullable.

Actually, Parquet also reads and writes the schema with nullability 
correctly if we get rid of `asNullable` (I tested this before) but it seems 
that's prevented due to (I assume) the reason above.

@marmbrus Do you mind if I ask to clarify here please?

I think we may have to deal with this as datasource-specific problem.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14124
  
@HyukjinKwon Your patch solves this inconsistency by forcing schema as 
nullable at all. However,  looks like the parquet case is for compatibility, is 
this the same for json?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
I am a bit confused if we are allowed to read JSON (via `json(jsonRDD: 
RDD[String])` API) with schema having fields set `false` in `nullable`.
If it is meant to be not allowed, this issue will prevents the case above. 

But, yea, I think I agree that it is a potential problem anyway (even if 
the case above is not allowed.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14124
  
@HyukjinKwon No matter whether this PR is merged or not, I still think we 
should fix the above issue. Silent conversion does not look good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Oh, I see, before this patch

```
+---+
|  a|
+---+
|  1|
|  0|
+---+
```


after this patch

```
++
|   a|
++
|   1|
|null|
++
```

FYI, currently (before this patch) the code below

```scala
val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
val schema = StructType(StructField("a", StringType, nullable = false) :: 
Nil)
val df = spark.read.schema(schema).json(rdd)
df.show()
```

is being failed with the exception below:

```
Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, 
StructField(a,StringType,false))
+- input[0, string, false].toString
   +- input[0, string, false]

java.lang.RuntimeException: Error while decoding: 
java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, 
StructField(a,StringType,false))
+- input[0, string, false].toString
   +- input[0, string, false]

at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:292)
at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218)
at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
...
```

It seems unexpected behaviour anyway. I will submit a patch if this one is 
decided not worth being added. Thanks @gatorsmile again!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Ah, yes it seems a bug to me.. I thought it throws an exception in that 
case. Does this PR introduce the problem? (Just curious and to be sure).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14124
  
```
val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
val schema = StructType(StructField("a", IntegerType, nullable = false) :: 
Nil)
val df = spark.read.schema(schema).json(rdd)
df.printSchema()
```

When user-specified schemas are not nullable and the data contains null, 
the null value in the result becomes `0`. This looks like a bug, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62054/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
 for PR 14124 at commit 
[`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
 for PR 14124 at commit 
[`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62053/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
 for PR 14124 at commit 
[`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
 for PR 14124 at commit 
[`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62049/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)**
 for PR 14124 at commit 
[`a917678`](https://github.com/apache/spark/commit/a917678886779f236b1feffa23a11529ce67e97c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)**
 for PR 14124 at commit 
[`a917678`](https://github.com/apache/spark/commit/a917678886779f236b1feffa23a11529ce67e97c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
Hi @gatorsmile and @marmbrus, I saw the discussion and found you are 
related with this one. Could you please review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org