[ 
https://issues.apache.org/jira/browse/SPARK-53052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011540#comment-18011540
 ] 

Vindhya G edited comment on SPARK-53052 at 8/2/25 5:46 AM:
-----------------------------------------------------------

[Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
expected the {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}).
I think this is behaving as expected, since the output uses the emptyValue you 
define. Here you defined 'emptyStr' as your emptyValue.

 

As for the actual emptyValue of '', I do see the difference in behaviour 
between Scala and Python!
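
One possible way to read the three results reported in this issue is the small pure-Python model below. It is only a sketch and not Spark's parser: {{read_field}} and {{read_column}} are hypothetical helpers, and it assumes (1) an unquoted empty field is read as null, (2) a quoted empty string is replaced by emptyValue, (3) any value equal to nullValue then becomes null, with nullValue falling back to "" when unset, and (4) PySpark treats {{nullValue=None}} as "option not set" while Scala's {{option("nullValue", null)}} disables the comparison. None of these assumptions has been confirmed against the Spark source here.

```python
from typing import List, Optional


def read_field(raw: str, empty_value: str = "", null_value: Optional[str] = "") -> Optional[str]:
    """Interpret one raw CSV field (hypothetical helper, not a Spark API)."""
    if raw == "":
        # Assumption 1: nothing between the delimiters is read as null.
        return None
    if raw == '""':
        # Assumption 2: a quoted empty string is replaced by emptyValue.
        value = empty_value
    else:
        # Any other field is kept, minus one pair of surrounding quotes.
        value = raw[1:-1] if len(raw) >= 2 and raw[0] == raw[-1] == '"' else raw
    # Assumption 3: a value equal to nullValue becomes null,
    # unless nullValue itself is null (comparison disabled).
    if null_value is not None and value == null_value:
        return None
    return value


def read_column(lines: List[str], **opts) -> List[Optional[str]]:
    """Return the 'val' column of simple two-column lines (no embedded commas)."""
    return [read_field(line.split(",", 1)[1], **opts) for line in lines]


first_file = ["1,", "2,emptyStr", "3,str1"]    # data rows of the first CSV in the report
second_file = ["1,", '2,""', "3,str1"]         # data rows of the second CSV in the report

# emptyValue='emptyStr': the literal token 'emptyStr' in the file is left alone.
case_custom_token = read_column(first_file, empty_value="emptyStr")
# Scala-like call: nullValue is null, so the quoted empty string survives as ''.
case_null_disabled = read_column(second_file, empty_value="", null_value=None)
# PySpark-like call under assumption 4: nullValue falls back to "", so '' turns into null.
case_null_default = read_column(second_file, empty_value="")

print(case_custom_token)
print(case_null_disabled)
print(case_null_default)
```

Under these assumptions the model gives the same three lists as the ones quoted in the report, which would make the Scala/Python difference a question of how {{nullValue}} is passed rather than of {{emptyValue}} itself.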


was (Author: JIRAUSER299405):
[Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
expected the {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}). 
I think this is behaving as expected as you are providing what emptyValue you 
define. Here you defined 'emptyStr' as your emptyValue.

 

As for the actual emptyValue with '' I am seeing same behaviour in both scala 
and python where it is printed as null!

> emptyValue option does not seem to work from pyspark
> ----------------------------------------------------
>
>                 Key: SPARK-53052
>                 URL: https://issues.apache.org/jira/browse/SPARK-53052
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.0
>         Environment: Spark 3.5.0
>            Reporter: Kash Bhatt
>            Priority: Minor
>
> According to 
> [docs|https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option]
>  of csv options:
> {quote}
> ||Property Name||Default||Meaning||
> |emptyValue|(for reading), "" (for writing)|Sets the string representation of 
> an empty value.|
> {quote}
> But it doesn't seem to work:
> {code:java}
> with open("/dbfs/tmp/c.csv", "w") as f:
>     f.write('''id,val
> 1,
> 2,emptyStr
> 3,str1
> ''')
> spark.read.csv('dbfs:/tmp/c.csv', header=True, 
> emptyValue='emptyStr').collect() {code}
> prints:
> [Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
> expected the {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}).
> ----
> Not sure if it's related, but although the docs don't mention any relation to 
> the {{nullValue}} option, it seems to affect {{emptyValue}}.
>  
> With this content in csv file:
> {code:java}
> id,val
> 1,
> 2,""
> 3,str1 {code}
> The following Scala code works as expected, i.e. it prints 
> {{Array[org.apache.spark.sql.Row] = Array([1,null], [2,], [3,str1])}}
> {code:java}
> spark.read.option("header", "true")
> .option("emptyValue", "")
> .option("nullValue", null)
> .csv("dbfs:/tmp/c.csv").collect() {code}
> But the PySpark code doesn't work; it prints: {{[Row(id='1', val=None), Row(id='2', 
> val=None), Row(id='3', val='str1')]}}
> {code:java}
> spark.read.csv('dbfs:/tmp/c.csv', header=True, emptyValue='', 
> nullValue=None).collect() {code}


