[jira] [Comment Edited] (SPARK-37604) Change emptyValueInRead's effect to that any fields matching this string will be set as "" when reading csv files
[ https://issues.apache.org/jira/browse/SPARK-37604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460144#comment-17460144 ] Wei Guo edited comment on SPARK-37604 at 12/15/21, 6:05 PM:

For this code:
{code:scala}
val data = Seq(("Tesla", "")).toDF("make", "comment")
data.write.option("emptyValue", "EMPTY").csv("/Users/guowei19/work/test_empty")
{code}
the written csv file's content is:
{noformat}
Tesla,EMPTY
{noformat}
(cat part-0-f0ed9c50-b1bf-4db9-9964-38fbf411e29c-c000.csv)

When I read it back into a dataframe:
{code:scala}
spark.read.option("emptyValue", "EMPTY").schema(data.schema).csv("/Users/guowei19/work/test_empty").show()
{code}
I want the column *comment* to be "" rather than the string "EMPTY". !empty_test.png|width=701,height=286!

For null values, we can write and read back with the same nullValue option, but for empty strings, even with the same emptyValue option, the round trip is irreversible. FYI [~maxgekk]
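The asymmetry described in this comment can be sketched in plain Scala. This is a hypothetical illustration, not Spark's actual writer/parser code; the object and method names are invented for the sketch:

```scala
// Sketch of the emptyValue round trip described in the comment above.
// writeField models the CSV writer's emptyValue substitution; the two
// readField variants model the current and the requested reader behavior.
object EmptyValueRoundTrip {
  val emptyValue: String = "EMPTY"

  // Writing: Spark substitutes emptyValue for empty-string fields.
  def writeField(v: String): String =
    if (v == "") emptyValue else v

  // Reading (current behavior): the marker string comes back as-is,
  // so "" -> "EMPTY" -> "EMPTY" and the round trip is lossy.
  def readFieldCurrent(s: String): String = s

  // Reading (behavior requested in this issue): fields matching
  // emptyValue are mapped back to "", making the round trip reversible.
  def readFieldProposed(s: String): String =
    if (s == emptyValue) "" else s
}
```

With readFieldProposed, writing and reading with the same emptyValue option would be symmetric, matching how nullValue already behaves.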
> Change emptyValueInRead's effect to that any fields matching this string will
> be set as "" when reading csv files
> ----------------------------------
>
>                 Key: SPARK-37604
>                 URL: https://issues.apache.org/jira/browse/SPARK-37604
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.2.0
>            Reporter: Wei Guo
>            Priority: Major
>         Attachments: empty_test.png
>
> The csv data format was imported from databricks [spark-csv|https://github.com/databricks/spark-csv] by issue SPARK-12833 with PR [10766|https://github.com/apache/spark/pull/10766].
> {*}For the nullValue option{*}, according to the features described in the spark-csv readme file, it is designed as:
> {noformat}
> When reading files:
> nullValue: specifies a string that indicates a null value, any fields matching this string will be set as nulls in the DataFrame
> When writing files:
> nullValue: specifies a string that indicates a null value, nulls in the DataFrame will be written as this string.
> {noformat}
> For example, when writing:
> {code:scala}
> Seq(("Tesla", null:String)).toDF("make", "comment").write.option("nullValue", "NULL").csv(path)
> {code}
> The saved csv file is:
> {noformat}
> Tesla,NULL
> {noformat}
> When reading:
> {code:scala}
> spark.read.option("nullValue", "NULL").csv(path).show()
> {code}
> The parsed dataframe is:
> ||make||comment||
> |Tesla|null|
> We can see that null columns in the dataframe are saved as "NULL" strings in csv files and {color:#00875a}*"NULL" strings in csv files are parsed back as null columns*{color} in the dataframe.
> That is:
> {noformat}
> When writing, convert null (in dataframe) to nullValue (in csv)
> When reading, convert nullValue or nothing (in csv) to null (in dataframe)
> {noformat}
> But actually, the nullValue option in the dependency univocity's {*}_CommonSettings_{*} is designed as:
> {noformat}
> when reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string.
> when writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string.
> {noformat}
> {*}There is a difference when reading{*}. In univocity, empty content is converted to nullValue strings. But in Spark, we finally convert empty content or nullValue strings to null in the *_UnivocityParser_* method *_nullSafeDatum_*:
> {code:scala}
> private def nullSafeDatum(
>     datum: String,
>     name: String,
>     nullable: Boolean,
>     options: CSVOptions)(converter: ValueConverter): Any = {
>   if (datum == options.nullValue || datum == null) {
>     if (!nullable) {
>       throw QueryExecutionErrors.foundNullValueForNotNullableFieldError(name)
>     }
>     null
>   } else {
>     converter.apply(datum)
>   }
> }
> {code}
>
> From now on, let's talk about emptyValue.
> {*}For the emptyValue option{*}, we added an emptyValueInRead option for reading and an emptyValueInWrite option for writing. I found that Spark keeps the same behaviors for emptyValue as univocity, that is:
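The null-handling logic of nullSafeDatum above, and the analogous emptyValue handling this issue asks for, can be sketched together in plain Scala. This is a hypothetical illustration with invented names (it drops Spark's CSVOptions, nullability check, and converter for brevity), not a patch to UnivocityParser:

```scala
// Sketch combining Spark's existing nullValue mapping with the proposed
// emptyValue mapping. None models a null column; Some("") models an
// empty-string column recovered from the emptyValue marker.
object EmptySafeDatum {
  def emptySafeDatum(
      datum: String,
      nullValue: String,
      emptyValue: String): Option[String] = {
    if (datum == null || datum == nullValue) {
      None // existing behavior: nullValue (or missing content) becomes null
    } else if (datum == emptyValue) {
      Some("") // proposed behavior: the emptyValue marker becomes ""
    } else {
      Some(datum)
    }
  }
}
```

The first branch mirrors what nullSafeDatum already does for nullValue; the second branch is the symmetric treatment of emptyValueInRead that this issue proposes.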