[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325686#comment-15325686 ]

Apache Spark commented on SPARK-15585:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/13616

> Don't use null in data source options to indicate default value
> ---
>
> Key: SPARK-15585
> URL: https://issues.apache.org/jira/browse/SPARK-15585
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Reynold Xin
> Priority: Critical
>
> See email:
> http://apache-spark-developers-list.1001551.n3.nabble.com/changed-behavior-for-csv-datasource-and-quoting-in-spark-2-0-0-SNAPSHOT-td17704.html
> We'd need to change DataFrameReader/DataFrameWriter in Python's
> csv/json/parquet/... functions to put the actual default option values as
> function parameters, rather than setting them to None. We can then in
> CSVOptions.getChar (and JSONOptions, etc) to actually return null if the
> value is null, rather than setting it to default value.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323947#comment-15323947 ] Takeshi Yamamuro commented on SPARK-15585: okay, I'll push later.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323946#comment-15323946 ] Reynold Xin commented on SPARK-15585: Great, let's update the documentation that way.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323944#comment-15323944 ] Takeshi Yamamuro commented on SPARK-15585: yea, I manually checked that it works well. If we put an empty string, Spark passes `\u0000` into the csv parser.
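The lookup behaviour discussed in this thread can be sketched roughly as follows. This is a minimal Python model, not Spark's code: the function name `get_char`, the `NUL` constant, and the choice of the NUL character as the "no quote" value for an empty string are assumptions made for illustration.

```python
# Hypothetical model of a single-character option lookup; not Spark's code.
NUL = "\u0000"  # assumed stand-in for "no quote character"


def get_char(options, name, default):
    """Return the character configured under `name`, or `default`."""
    value = options.get(name)
    if value is None:
        # Option absent, or explicitly None: fall back to the default.
        return default
    if len(value) == 0:
        # Empty string: modelled here as "quoting disabled" (NUL).
        return NUL
    if len(value) == 1:
        return value
    raise ValueError(f"{name} cannot be more than one character")


if __name__ == "__main__":
    print(repr(get_char({}, "quote", '"')))
    print(repr(get_char({"quote": ""}, "quote", '"')))
    print(repr(get_char({"quote": "x"}, "quote", '"')))
```

Under these assumptions, an empty string is the only way to ask for "no quote character", while a missing option keeps the default.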
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323940#comment-15323940 ] Reynold Xin commented on SPARK-15585: Looks good. Does empty string actually work?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323936#comment-15323936 ] Takeshi Yamamuro commented on SPARK-15585: Understood. Anyway, I think it's okay to update the docs for this issue because it works in both Python and Scala. Is it okay to push the PR below? https://github.com/apache/spark/compare/master...maropu:SPARK-15585-2
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322992#comment-15322992 ] Reynold Xin commented on SPARK-15585: They suffer from the same problem. Before your patch, df.read.option("quote", "x").csv(...) would use x as the quote. After your patch, it would use the default quote.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322670#comment-15322670 ] Takeshi Yamamuro commented on SPARK-15585: I'm afraid the `sep` option for `csv` overrides the `delimiter` option. On the other hand, the original description describes the `quote` option and it seems the `quote` one is not related to the `sep` one.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320133#comment-15320133 ] Reynold Xin commented on SPARK-15585: It would, wouldn't it? Because the sep argument for the "csv" function would always override the option that was previously specified.
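The override concern raised here can be illustrated with a toy reader. This is only a sketch: the `Reader` class and both `csv_*` method names are hypothetical and do not mirror the pyspark API. It contrasts a keyword argument with a concrete default, which is always written and so clobbers an earlier `option()` call, with a `None` default that is written only when the caller supplies a value.

```python
class Reader:
    """Toy reader; names are hypothetical and do not mirror pyspark."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        # Record an option and return self so calls can be chained.
        self.options[key] = value
        return self

    def csv_with_real_default(self, path, sep=","):
        # A concrete default is always written, so it overwrites
        # whatever an earlier option("sep", ...) call stored.
        self.options["sep"] = sep
        return dict(self.options)

    def csv_with_none_default(self, path, sep=None):
        # None means "not given": write only when a value was supplied.
        if sep is not None:
            self.options["sep"] = sep
        return dict(self.options)


if __name__ == "__main__":
    print(Reader().option("sep", "|").csv_with_real_default("data.csv"))
    print(Reader().option("sep", "|").csv_with_none_default("data.csv"))
```

In this model the first call loses the user's `|` separator and the second keeps it, which is the trade-off the thread is weighing.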
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318132#comment-15318132 ] Takeshi Yamamuro commented on SPARK-15585: btw, does the behavior of `df.option("sep", "|").csv("...")` change before and after my PR #13372 is applied? `CSVOptions#getChar` does not seem to affect the behavior, as shown here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L67 I checked manually in spark-shell and got the same result in both cases. Did I miss anything there?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318113#comment-15318113 ] Takeshi Yamamuro commented on SPARK-15585: yea, it's okay with me to just add docs about the different behaviours between `sqlContext.read.csv` and `com.databricks.spark.csv`, like this: https://github.com/apache/spark/compare/master...maropu:SPARK-15585-2
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316288#comment-15316288 ] Reynold Xin commented on SPARK-15585: [~maropu] I think the best way is to advise users to pass \u0000 in. Can you check if that is possible in both Python/Scala?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305185#comment-15305185 ] Apache Spark commented on SPARK-15585: User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/13372
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305178#comment-15305178 ] Takeshi Yamamuro commented on SPARK-15585: okay, I'll push soon.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305166#comment-15305166 ] Reynold Xin commented on SPARK-15585: Feel free to create a pr with python changes and then we can iterate on the R part too.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305152#comment-15305152 ] Takeshi Yamamuro commented on SPARK-15585: okay
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304267#comment-15304267 ] Shivaram Venkataraman commented on SPARK-15585: [~maropu] Can you also add test cases in Python and R in the PR?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303787#comment-15303787 ] Takeshi Yamamuro commented on SPARK-15585: okay, I got your point. I'll make a pr based on https://github.com/maropu/spark/compare/master...SPARK-15585.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303722#comment-15303722 ] Reynold Xin commented on SPARK-15585: I was suggesting setting the value to None directly rather than not setting it.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303721#comment-15303721 ] Takeshi Yamamuro commented on SPARK-15585: If quote is "NONE" in readwriter.py, I think no value is passed into CSVOptions#getChar. As a result, "case None => default" is matched rather than "case Some(null) => default". I checked this manually.
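The distinction drawn in this comment, between an option that was never set and an option explicitly set to a null value, can be modelled in a few lines of Python. This is a hedged sketch: `build_options` and `classify` are hypothetical helpers, and the assumption that the Python side skips `None` arguments is taken from the comment above rather than verified against readwriter.py.

```python
def build_options(**kwargs):
    """Collect only keyword arguments whose value is not None.

    Models the assumed behaviour that None arguments are never
    forwarded as options.
    """
    return {key: value for key, value in kwargs.items() if value is not None}


def classify(options, name):
    """Report which lookup branch a key would hit."""
    if name not in options:
        return "absent"  # analogous to `case None => default`
    if options[name] is None:
        return "present-null"  # analogous to `case Some(null) => default`
    return "present-value"


if __name__ == "__main__":
    print(classify(build_options(quote=None), "quote"))
    print(classify({"quote": None}, "quote"))
    print(classify(build_options(quote="x"), "quote"))
```

Under these assumptions a `None` keyword argument never reaches the lookup, so only a caller that stores the null value directly would hit the "present-null" branch.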
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303692#comment-15303692 ] Reynold Xin commented on SPARK-15585: "None" becomes null, doesn't it?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303690#comment-15303690 ] Takeshi Yamamuro commented on SPARK-15585: We cannot pass `null` as `quote` to the univocity parsers because the argument type is `char`. So, I think `CSVOptions#getChar` cannot return `null`. On the other hand, spark-csv uses Commons CSV, which can set `quote` to null (see: https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvRelation.scala#L82). It seems we can get the same behaviour as spark-csv if we set '\u0000' as the quote when `null` is passed: https://github.com/maropu/spark/compare/master...SPARK-15585 Also, do we need to fix readwriter.py for this issue (any default quote is obviously set there)? AFAIK there is no way for pyspark to pass `null` into CSVOptions#getChar. https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L375
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303523#comment-15303523 ] Shivaram Venkataraman commented on SPARK-15585: I am not sure I completely understand the question. The way the options get passed in R [1] is that we create a hash map and fill it in with anything passed in by the user. `NULL` is a restricted keyword in R (note that it's in all caps), and it gets deserialized / passed as `null` to Scala. [1] https://github.com/apache/spark/blob/c82883239eadc4615a3aba907cd4633cb7aed26e/R/pkg/R/SQLContext.R#L658
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303160#comment-15303160 ] Takeshi Yamamuro commented on SPARK-15585: yea, if there's no problem, I'll take this.
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303142#comment-15303142 ] Reynold Xin commented on SPARK-15585: cc [~maropu] interested in doing this?
[jira] [Commented] (SPARK-15585) Don't use null in data source options to indicate default value
[ https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303144#comment-15303144 ] Reynold Xin commented on SPARK-15585: cc [~shivaram] / [~sunrui] / [~felixcheung] would this impact R?