[jira] [Updated] (SPARK-37575) null values should be saved as nothing rather than quoted empty Strings "" with default settings

2022-01-06 Thread Wei Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Guo updated SPARK-37575:

Description: 
As mentioned in the SQL migration guide
([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]):
{noformat}
Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 
2.3 and earlier, empty strings are equal to null values and do not reflect to 
any characters in saved CSV files. For example, the row of "a", null, "", 1 was 
written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore 
the previous behavior, set the CSV option emptyValue to empty (not quoted) 
string.{noformat}
In practice, however, both empty strings and null values are saved as quoted empty
strings "", rather than "" for empty strings and nothing for null values.

code:
{code:scala}
// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
import spark.implicits._

val data = List("spark", null, "").toDF("name")
data.coalesce(1).write.csv("spark_csv_test")
{code}
 actual result:
{noformat}
line1: spark
line2: ""
line3: ""{noformat}
expected result:
{noformat}
line1: spark
line2: 
line3: ""
{noformat}
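
For reference, the emptyValue option named in the migration guide can be set explicitly on the writer. The following is only a minimal sketch and not a fix for this issue: the object name, the local master, the overwrite mode and the output path spark_csv_test_unquoted are assumptions made for illustration. The resulting file contents are deliberately not shown, since they depend on the Spark version in use.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; only the emptyValue option comes from the migration guide.
object CsvEmptyValueSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")           // assumption: local run for experimentation
      .appName("csv-empty-value-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same three rows as in the report: a value, a null and an empty string.
    val data = List("spark", null, "").toDF("name")

    data.coalesce(1)
      .write
      .mode("overwrite")            // assumption: allow re-running the sketch
      .option("emptyValue", "")     // unquoted empty string, as the guide suggests
      .csv("spark_csv_test_unquoted")

    spark.stop()
  }
}
{code}
Comparing the file written here with the one from the default-settings snippet above is one way to see how the two options interact on a given Spark version.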


> null values should be saved as nothing rather than quoted empty Strings "" 
> with default settings
> 
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Wei Guo
> Assignee: Wei Guo
> Priority: Major
> Fix For: 3.3.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37575) null values should be saved as nothing rather than quoted empty Strings "" with default settings

2021-12-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37575:
-
Fix Version/s: (was: 3.2.1)

> null values should be saved as nothing rather than quoted empty Strings "" 
> with default settings
> 
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Wei Guo
> Assignee: Wei Guo
> Priority: Major
> Fix For: 3.3.0






[jira] [Updated] (SPARK-37575) null values should be saved as nothing rather than quoted empty Strings "" with default settings

2021-12-13 Thread Guo Wei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Wei updated SPARK-37575:

Summary: null values should be saved as nothing rather than quoted empty 
Strings "" with default settings  (was: null values should be saved as nothing 
rather than quoted empty Strings "" by default settings)

> null values should be saved as nothing rather than quoted empty Strings "" 
> with default settings
> 
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Guo Wei
> Priority: Major






[jira] [Updated] (SPARK-37575) null values should be saved as nothing rather than quoted empty Strings "" by default settings

2021-12-12 Thread Guo Wei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Wei updated SPARK-37575:

Summary: null values should be saved as nothing rather than quoted empty 
Strings "" by default settings  (was: Null values are saved as quoted empty 
Strings "" rather than nothing)

> null values should be saved as nothing rather than quoted empty Strings "" by 
> default settings
> --
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Guo Wei
> Priority: Major


