[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16812 I still don't think we need this, since the workaround is easy. If other committers find it worthwhile, I won't object. If there is no interest in this PR afterwards, I will just close it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user dhunziker commented on the issue: https://github.com/apache/spark/pull/16812
> This can be easily worked around, no?

It can be worked around, though the boolean type is lost by converting to string in the process. As with any workaround, it would be better to get this fixed or enhanced eventually, especially as it was already raised against the original CSV datasource years ago. I commented on this before and still think it's a valid improvement, given that equivalent options already exist for null, NaN, inf and date values, and for the other reasons mentioned above.
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16812 This can be easily worked around, no?
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16812 Can one of the admins verify this patch?
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user dhunziker commented on the issue: https://github.com/apache/spark/pull/16812 Oracle doesn't have a boolean type, so it's usually modelled as char(1) with Y/N. Sybase doesn't have boolean either, but it has bit, which is 1/0. PostgreSQL supports a range of values (https://www.postgresql.org/docs/10/static/datatype-boolean.html). With regard to CSV parsers, I found the same feature in SuperCSV and uniVocity (linked above). I think it's a very useful and common feature, similar to custom date formats or null strings, but as mentioned before, not something that can't be worked around.
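To make the conventions listed above concrete (Y/N for an Oracle-style char(1) flag, 1/0 for a Sybase bit), here is a minimal plain-Python sketch of what a configurable boolean converter might look like. It is an illustration only: the class name `BooleanParser` and its parameters are hypothetical and are neither the options proposed in this PR nor the uniVocity or SuperCSV API.

```python
from typing import Iterable, Optional


class BooleanParser:
    """Hypothetical converter that maps custom tokens to booleans."""

    def __init__(self, true_values: Iterable[str], false_values: Iterable[str],
                 null_value: str = ""):
        # Tokens are compared case-insensitively (an assumption of this sketch).
        self.true_values = {v.lower() for v in true_values}
        self.false_values = {v.lower() for v in false_values}
        overlap = self.true_values & self.false_values
        if overlap:
            raise ValueError(f"tokens configured as both true and false: {sorted(overlap)}")
        self.null_value = null_value

    def parse(self, token: str) -> Optional[bool]:
        # The configured null token maps to None rather than to a boolean.
        if token == self.null_value:
            return None
        key = token.strip().lower()
        if key in self.true_values:
            return True
        if key in self.false_values:
            return False
        raise ValueError(f"not a recognised boolean: {token!r}")


# Two of the conventions mentioned in the comment above.
oracle_style = BooleanParser(true_values=["Y"], false_values=["N"])
bit_style = BooleanParser(true_values=["1"], false_values=["0"])
```

With this sketch, `oracle_style.parse("Y")` yields `True` and `bit_style.parse("0")` yields `False`, while an unrecognised token raises `ValueError` instead of being silently coerced.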
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16812 Could you give some systems that have such a feature? Thanks!
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user dhunziker commented on the issue: https://github.com/apache/spark/pull/16812 That would remain a workaround though. The uniVocity parser for boolean supports this as well: https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/conversions/BooleanConversion.java. It has also been raised as a ticket against the original spark-csv, so there seems to be at least some interest. Personally, I think this can help a lot when integrating Spark with other systems (e.g. some DB exports where modelling booleans as flags is common), similar to having ways of specifying the format of a date or null value. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16812 Hm, do we really need this? I think we could simply work around this with `case` & `when`, `if` or other expressions in a projection after loading, or by preprocessing the source dataset and then using the `csv(ds: Dataset[String])` API.
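The preprocessing idea in the comment above (normalise the flag tokens in the raw lines, then hand the lines to a CSV reader) can be sketched in plain Python. This is not Spark code and does not use the `csv(ds: Dataset[String])` API; the helper name `normalize_flags`, the token sets, and the sample data are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical token sets; a real export would define its own.
TRUE_TOKENS = {"Y", "1"}
FALSE_TOKENS = {"N", "0"}


def normalize_flags(lines, flag_columns):
    """Rewrite custom boolean tokens in the given column indexes to 'true'/'false'.

    The first row is treated as a header and passed through unchanged.
    Tokens outside both sets (for example empty strings) are left as-is.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_number, row in enumerate(csv.reader(lines)):
        if row_number > 0:
            for i in flag_columns:
                if i >= len(row):
                    continue  # short or empty row: nothing to rewrite
                if row[i] in TRUE_TOKENS:
                    row[i] = "true"
                elif row[i] in FALSE_TOKENS:
                    row[i] = "false"
        writer.writerow(row)
    return out.getvalue().splitlines()


raw = ["id,active", "1,Y", "2,N", "3,"]
cleaned = normalize_flags(raw, flag_columns=[1])
```

Only the columns named in `flag_columns` are rewritten, so an `id` of `1` is not mistaken for a flag. The cleaned lines carry the standard `true`/`false` literals, which is what lets a downstream reader infer a boolean column rather than a string one.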
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16812 Can one of the admins verify this patch?