[ 
https://issues.apache.org/jira/browse/SPARK-53052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011657#comment-18011657
 ] 

Kash Bhatt edited comment on SPARK-53052 at 8/3/25 4:09 AM:
------------------------------------------------------------

[~vindhyag], you're right. I've learned a few new things about these two 
options since I posted this bug. The bug as reported is invalid, but there are 
still issues with the Python vs. Scala API behavior, and definitely with the docs.

 

PS: The implementation prevents setting any option to `None` in Python.
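
To illustrate the PS: the sketch below is a plain-Python stand-in, not PySpark's source, and the function name `collect_options` is hypothetical. It only shows the kind of filtering that would make a keyword such as `nullValue=None` indistinguishable from not passing the option at all.

```python
from typing import Any, Dict


def collect_options(**options: Any) -> Dict[str, Any]:
    """Hypothetical reader helper that ignores keyword options set to None."""
    applied: Dict[str, Any] = {}
    for key, value in options.items():
        if value is not None:  # None is treated as "option not specified"
            applied[key] = value
    return applied


# Mirrors the keyword arguments used in the ticket's PySpark call.
applied = collect_options(header=True, emptyValue="", nullValue=None)
print(applied)  # {'header': True, 'emptyValue': ''}
```

Under this assumption the reader never sees `nullValue`, so whatever default it has stays in effect.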

 

I'll update the ticket when I get a chance.

 

See:
 * [https://stackoverflow.com/questions/79721756/how-to-use-emptyvalue-option-in-pyspark-while-reading-a-csv-file]
 * [https://stackoverflow.com/questions/79721713/how-to-read-empty-string-as-well-as-null-values-from-a-csv-file-in-pyspark]

 



> emptyValue option does not seem to work from pyspark
> ----------------------------------------------------
>
>                 Key: SPARK-53052
>                 URL: https://issues.apache.org/jira/browse/SPARK-53052
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.0
>         Environment: Spark 3.5.0
>            Reporter: Kash Bhatt
>            Priority: Minor
>
> According to the 
> [docs|https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option]
> for CSV options:
> {quote}
> ||Property Name||Default||Meaning||
> |emptyValue|(for reading), "" (for writing)|Sets the string representation of 
> an empty value.|
> {quote}
> But it doesn't seem to work:
> {code:python}
> with open("/dbfs/tmp/c.csv", "w") as f:
>     f.write('''id,val
> 1,
> 2,emptyStr
> 3,str1
> ''')
> spark.read.csv('dbfs:/tmp/c.csv', header=True, emptyValue='emptyStr').collect()
> {code}
> This prints:
> {{[Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]}}
> I expected {{Row(id='2', val='')}} instead of {{Row(id='2', val='emptyStr')}}.
> ----
> Not sure if it's related, but although the docs don't mention any relation to 
> the {{nullValue}} option, it seems to affect {{emptyValue}}.
>  
> With this content in the CSV file:
> {code}
> id,val
> 1,
> 2,""
> 3,str1
> {code}
> The following Scala code works as expected, i.e. it prints 
> {{Array[org.apache.spark.sql.Row] = Array([1,null], [2,], [3,str1])}}
> {code:scala}
> spark.read.option("header", "true")
>   .option("emptyValue", "")
>   .option("nullValue", null)
>   .csv("dbfs:/tmp/c.csv").collect()
> {code}
> But the equivalent PySpark code doesn't work. It prints 
> {{[Row(id='1', val=None), Row(id='2', val=None), Row(id='3', val='str1')]}}
> {code:python}
> spark.read.csv('dbfs:/tmp/c.csv', header=True, emptyValue='', nullValue=None).collect()
> {code}
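
The three results quoted in the description are consistent with a simple two-step reading rule, sketched below in plain Python. This is a toy model inferred only from the outputs in this ticket, not Spark's parser, and `read_cell` and `read_column` are hypothetical names. It assumes that a quoted empty field is first replaced by `emptyValue`, and that any value equal to `nullValue` is then reported as null, with `nullValue` defaulting to the empty string.

```python
from typing import List, Optional


def read_cell(raw: str, empty_value: str = "",
              null_value: Optional[str] = "") -> Optional[str]:
    """Toy model of converting one raw CSV field (an assumption, not Spark's code)."""
    if raw == "":            # nothing between the delimiters, e.g. `1,`
        return None
    # A quoted empty string, e.g. `2,""`, is replaced by emptyValue.
    value = empty_value if raw == '""' else raw
    # A value equal to nullValue becomes null; None models "nullValue set to null".
    if null_value is not None and value == null_value:
        return None
    return value


def read_column(cells: List[str], **opts: Optional[str]) -> List[Optional[str]]:
    return [read_cell(cell, **opts) for cell in cells]


# First example: emptyValue='emptyStr' is a replacement, not a pattern to match.
first = read_column(["", "emptyStr", "str1"], empty_value="emptyStr")
# PySpark call: nullValue=None is dropped, so nullValue keeps its "" default.
pyspark_like = read_column(["", '""', "str1"], empty_value="")
# Scala call: nullValue really is null, so "" survives as an empty string.
scala_like = read_column(["", '""', "str1"], empty_value="", null_value=None)

print(first)         # [None, 'emptyStr', 'str1']
print(pyspark_like)  # [None, None, 'str1']
print(scala_like)    # [None, '', 'str1']
```

If this reading is right, the PySpark and Scala calls differ only because the Python keyword `nullValue=None` never reaches the reader.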



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
