[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #68477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68477/consoleFull)** for PR 14124 at commit [`d240c0d`](https://github.com/apache/spark/commit/d240c0d9fbca446a5b302f739a896f96818d2907). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #68474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68474/consoleFull)** for PR 14124 at commit [`7306937`](https://github.com/apache/spark/commit/73069373c834291d507a25d7ae8da90a6dec95c0).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Actually, nvm. I think handling this in `DataFrameReader.schema` will deal with most of the general cases.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Oh wait, @cloud-fan, it seems that Parquet files, at least, can be written with non-nullable fields. So reading them without a user-specified schema might also cause an inconsistency between the schema read by structured streaming and the one read from file sources. If you are not sure about this, I am fine with turning the schema into a nullable one in `DataFrameReader.schema` for now. Let me rebase this one first.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Thanks @cloud-fan, sure, that sounds great.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14124 Sorry for the delay. After thinking about it again, I don't think it makes sense to allow users to specify the nullability when reading a data source. How about we turn the schema to nullable in `DataFrameReader.schema`?
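A minimal sketch of what "turning the schema to nullable" could look like, written only against the public `org.apache.spark.sql.types` classes. The object and helper names (`NullableSchema`, `toNullable`) are hypothetical and this is not the code from the PR; Spark has its own `asNullable` for this purpose, which as far as I know is internal rather than public API.

```scala
import org.apache.spark.sql.types._

object NullableSchema {
  // Hypothetical helper (not Spark API): recursively mark every struct field,
  // array element and map value as nullable, leaving the types themselves unchanged.
  def toNullable(dt: DataType): DataType = dt match {
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = toNullable(f.dataType), nullable = true)))
    case ArrayType(elementType, _) =>
      ArrayType(toNullable(elementType), containsNull = true)
    case MapType(keyType, valueType, _) =>
      MapType(toNullable(keyType), toNullable(valueType), valueContainsNull = true)
    case other =>
      other // atomic types carry no nullability flag of their own
  }
}
```

A user-given schema could then be relaxed before it is handed to the reader, for example with `NullableSchema.toNullable(schema).asInstanceOf[StructType]`.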
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67029/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #67029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)** for PR 14124 at commit [`3f153a3`](https://github.com/apache/spark/commit/3f153a3b0969c08708675872ef9bb472f804ad57). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #67029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67029/consoleFull)** for PR 14124 at commit [`3f153a3`](https://github.com/apache/spark/commit/3f153a3b0969c08708675872ef9bb472f804ad57).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65747/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)** for PR 14124 at commit [`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)** for PR 14124 at commit [`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65360/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)** for PR 14124 at commit [`f6be52b`](https://github.com/apache/spark/commit/f6be52b7450ad2797e49fc1116844a0f4dd809e0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65360/consoleFull)** for PR 14124 at commit [`f6be52b`](https://github.com/apache/spark/commit/f6be52b7450ad2797e49fc1116844a0f4dd809e0).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64631/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #64631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)** for PR 14124 at commit [`ffacb55`](https://github.com/apache/spark/commit/ffacb55a9a13fc3144683d9dad8f2da21705a613). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #64631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64631/consoleFull)** for PR 14124 at commit [`ffacb55`](https://github.com/apache/spark/commit/ffacb55a9a13fc3144683d9dad8f2da21705a613).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64066/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #64066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)** for PR 14124 at commit [`079aae2`](https://github.com/apache/spark/commit/079aae2e1f5a94ed4cb06ab797fa27951205a328). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #64066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64066/consoleFull)** for PR 14124 at commit [`079aae2`](https://github.com/apache/spark/commit/079aae2e1f5a94ed4cb06ab797fa27951205a328).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124

@cloud-fan If nullability should not be ignored, then I can change this PR to make the behaviour consistent by respecting it instead (and of course I will try to identify the related problems). In that case, I will work on what @gatorsmile pointed out in https://github.com/apache/spark/pull/14124#issuecomment-231594416 about JSON (and will check the other data sources as well). I will follow your decision.

To cut all the comments above short (for other reviewers):

- The question in this PR is whether every schema should be forced to be nullable or not.
- This is already happening with normal reading and writing for data sources based on `FileFormat`.
- This applies to both inferred/read schemas and user-given schemas.
- For the `json(rdd: RDD[String])` API, this is not happening.
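When checking whether two read paths disagree only in nullability (the inconsistency summarized above), a small comparison helper can be handy. The following is a sketch under the assumption that only the public `org.apache.spark.sql.types` classes are available; the names `SchemaCompare` and `sameIgnoringNullability` are hypothetical and not part of Spark's API.

```scala
import org.apache.spark.sql.types._

object SchemaCompare {
  // Hypothetical helper (not Spark API): true when two types are identical
  // except, possibly, for their nullable / containsNull / valueContainsNull flags.
  def sameIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    case (StructType(fa), StructType(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && sameIgnoringNullability(x.dataType, y.dataType)
      }
    case (ArrayType(ea, _), ArrayType(eb, _)) =>
      sameIgnoringNullability(ea, eb)
    case (MapType(ka, va, _), MapType(kb, vb, _)) =>
      sameIgnoringNullability(ka, kb) && sameIgnoringNullability(va, vb)
    case _ =>
      a == b // atomic types: plain equality
  }
}
```

If two schemas satisfy `sameIgnoringNullability` but are not equal under `==`, they differ only in their nullability flags.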
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 BTW, actually, this is not only about user-given schemas. Currently, data sources based on `FileFormat` always read data into a DataFrame while ignoring the nullability in the schema (for both user-given and inferred/read schemas). However, this does not happen when the same data sources are read for streaming (or with the other JSON API). So, this PR tries to make them consistent by ignoring the nullability in the schema.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124

Thanks for the feedback @cloud-fan! If the user-given schema is wrong, it is handled differently by each data source:

- JSON and CSV are generally fairly permissive (for example, they allow compatibility among numeric types).
- ORC and Parquet are generally strict about types, so I think they don't allow such compatibility (except for very few cases, e.g. for Parquet, https://github.com/apache/spark/pull/14272 and https://github.com/apache/spark/pull/14278). Should we disallow specifying schemas for these?
- JDBC does not take a user-given schema since it does not implement `SchemaRelationProvider`.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14124 What will happen if the given schema is wrong? It seems weird that we allow users to provide a schema while reading the data, but do not validate it.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14124 @cloud-fan
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 gentle ping @marmbrus
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Could you take a look please @marmbrus ?
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124

@viirya Thanks for your comment! Actually, that's what I want some feedback on from @marmbrus. It seems that forcing every schema to be nullable already happens when you read/write data via the `read`/`write` API (but not for structured streaming or the other JSON API). So, the reason for this PR is to make them all consistent. The reason for making them consistent by forcing the schema to be nullable is what he said on the mailing list:

> Sure, but a traditional RDBMS has the opportunity to do validation before
> loading data in. Thats not really an option when you are reading random
> files from S3. This is why Hive and many other systems in this space treat
> all columns as nullable.

Actually, Parquet also reads and writes the schema with nullability correctly if we get rid of `asNullable` (I tested this before), but it seems that is prevented for (I assume) the reason above.

@marmbrus Do you mind if I ask you to clarify here, please? I think we may have to deal with this as a data-source-specific problem.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14124 @HyukjinKwon Your patch solves this inconsistency by forcing every schema to be nullable. However, it looks like the Parquet case is for compatibility; is this the same for JSON?
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 I am a bit confused about whether we are allowed to read JSON (via the `json(jsonRDD: RDD[String])` API) with a schema that has fields with `nullable` set to `false`. If that is not meant to be allowed, this issue will prevent the case above. But, yeah, I agree that it is a potential problem anyway (even if the case above is not allowed).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14124 @HyukjinKwon No matter whether this PR is merged or not, I still think we should fix the above issue. Silent conversion does not look good to me.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124

Oh, I see. Before this patch:

```
+---+
|  a|
+---+
|  1|
|  0|
+---+
```

After this patch:

```
+----+
|   a|
+----+
|   1|
|null|
+----+
```

FYI, currently (before this patch) the code below

```scala
import org.apache.spark.sql.types._

val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
// Same data as above, but the non-nullable field is declared as a string.
val schema = StructType(StructField("a", StringType, nullable = false) :: Nil)
val df = spark.read.schema(schema).json(rdd)
df.show()
```

fails with the exception below:

```
Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, StructField(a,StringType,false))
+- input[0, string, false].toString
   +- input[0, string, false]

java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, StructField(a,StringType,false))
+- input[0, string, false].toString
   +- input[0, string, false]

	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:292)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$15.apply(Dataset.scala:2218)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	...
```

This seems to be unexpected behaviour either way. I will submit a separate patch if this one is judged not worth adding. Thanks again, @gatorsmile!
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Ah, yes, it seems like a bug to me. I thought it would throw an exception in that case. Does this PR introduce the problem? (Just curious, and to be sure.)
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14124

```
import org.apache.spark.sql.types._

val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
// The user-specified schema declares "a" as a non-nullable integer.
val schema = StructType(StructField("a", IntegerType, nullable = false) :: Nil)
val df = spark.read.schema(schema).json(rdd)
df.printSchema()
```

When a user-specified schema is not nullable and the data contains null, the null value in the result becomes `0`. This looks like a bug, right?
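One way to avoid relying on what the reader does with a non-nullable schema is to read with a relaxed (nullable) copy of the schema and then check the declared constraint explicitly. The following is a sketch under stated assumptions, not something proposed in this thread: `NonNullableCheck` and `violatedFields` are hypothetical names, only top-level fields are checked, and field names are assumed not to contain dots or other characters that `col` would interpret specially. It uses the same `json(jsonRDD: RDD[String])` entry point as the reproduction above, which later Spark releases mark as deprecated.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object NonNullableCheck {
  // Hypothetical helper: returns the names of top-level fields that `declared`
  // marks as non-nullable but that contain at least one null in `df`.
  def violatedFields(df: DataFrame, declared: StructType): Seq[String] =
    declared.fields.toSeq
      .filter(f => !f.nullable)
      .map(_.name)
      .filter(name => df.filter(col(name).isNull).limit(1).count() > 0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullability-check")
      .getOrCreate()

    val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
    val declared = StructType(StructField("a", IntegerType, nullable = false) :: Nil)

    // Read with every top-level field relaxed to nullable so that nulls in
    // the data are kept as nulls, then compare against the declared schema.
    val relaxed = StructType(declared.fields.map(_.copy(nullable = true)))
    val df = spark.read.schema(relaxed).json(rdd)

    val violations = violatedFields(df, declared)
    if (violations.nonEmpty) {
      println(s"Non-nullable fields containing nulls: ${violations.mkString(", ")}")
    }
    spark.stop()
  }
}
```

The check runs one small job per non-nullable field, so it is meant as a validation step for tests or small inputs rather than as a general-purpose replacement for reader-side enforcement.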
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62054/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)** for PR 14124 at commit [`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)** for PR 14124 at commit [`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62053/ Test FAILed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test FAILed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)** for PR 14124 at commit [`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)** for PR 14124 at commit [`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62049/ Test FAILed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test FAILed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)** for PR 14124 at commit [`a917678`](https://github.com/apache/spark/commit/a917678886779f236b1feffa23a11529ce67e97c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #62049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)** for PR 14124 at commit [`a917678`](https://github.com/apache/spark/commit/a917678886779f236b1feffa23a11529ce67e97c).
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 Hi @gatorsmile and @marmbrus, I saw the discussion and found that you are involved in this one. Could you please review this?