[https://issues.apache.org/jira/browse/SPARK-53052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011540#comment-18011540]
Vindhya G edited comment on SPARK-53052 at 8/2/25 5:46 AM:
-----------------------------------------------------------
[Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
expected {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}).
I think this is behaving as expected, as the output reflects the emptyValue you
defined: here you defined 'emptyStr' as your emptyValue.
As for an actual emptyValue of '', I do see the difference in behaviour between
Scala and Python!
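To illustrate why I think so, here is a small pure-Python model of how a single string field seems to be resolved on read. It is not Spark code, and the rules are my assumption, based on the outputs in this issue and my reading of the univocity-based CSV parser: an unquoted empty field (`1,`) becomes null, a quoted empty field (`2,""`) is replaced by emptyValue, ordinary text such as `emptyStr` is passed through untouched, and finally any value equal to nullValue becomes null. The names `resolve_field`, `UNQUOTED_EMPTY` and `QUOTED_EMPTY` are hypothetical.

```python
from typing import Optional, Union

# Hypothetical sentinels for the two kinds of "empty" CSV field (assumed model).
UNQUOTED_EMPTY = object()  # e.g. the `val` field in the line `1,`
QUOTED_EMPTY = object()    # e.g. the `val` field in the line `2,""`

Raw = Union[str, object]


def resolve_field(raw: Raw, null_value: Optional[str] = "", empty_value: str = "") -> Optional[str]:
    """Simplified, assumed model of reading one string field (not Spark's code)."""
    # Step 1: substitute the empty-field markers.
    if raw is UNQUOTED_EMPTY:
        datum = null_value   # unquoted empty field -> nullValue (may be None)
    elif raw is QUOTED_EMPTY:
        datum = empty_value  # quoted "" -> emptyValue
    else:
        datum = raw          # ordinary text is kept as-is
    # Step 2: anything missing or equal to nullValue is returned as null (None).
    if datum is None or datum == null_value:
        return None
    return datum


# First example: emptyValue='emptyStr' does not touch the literal text "emptyStr".
print([resolve_field(v, empty_value="emptyStr") for v in (UNQUOTED_EMPTY, "emptyStr", "str1")])
# → [None, 'emptyStr', 'str1']

# emptyValue only applies to a quoted "" field.
print(resolve_field(QUOTED_EMPTY, empty_value="emptyStr"))  # → emptyStr

# Second example: emptyValue='' with the default nullValue '' collapses "" to None,
# while an explicit null nullValue keeps the empty string.
print(resolve_field(QUOTED_EMPTY, empty_value=""))                         # → None
print(repr(resolve_field(QUOTED_EMPTY, null_value=None, empty_value="")))  # → ''
```

Under this model, emptyValue never rewrites the literal text `emptyStr`; it only controls what a quoted `""` turns into, and with the default nullValue of `''` an emptyValue of `''` collapses to null as well.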
was (Author: JIRAUSER299405):
[Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
expected {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}).
I think this is behaving as expected, as the output reflects the emptyValue you
defined: here you defined 'emptyStr' as your emptyValue.
As for an actual emptyValue of '', I am seeing the same behaviour in both Scala
and Python, where it is printed as null!
> emptyValue option does not seem to work from pyspark
> ----------------------------------------------------
>
> Key: SPARK-53052
> URL: https://issues.apache.org/jira/browse/SPARK-53052
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.5.0
> Environment: Spark 3.5.0
> Reporter: Kash Bhatt
> Priority: Minor
>
> According to the
> [docs|https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option]
> for CSV options:
> {quote}
> ||Property Name||Default||Meaning||
> |emptyValue|(for reading), "" (for writing)|Sets the string representation of
> an empty value.|
> {quote}
> But it doesn't seem to work:
> {code:python}
> # `spark` is an existing SparkSession; /dbfs/tmp is the local mount of dbfs:/tmp
> with open("/dbfs/tmp/c.csv", "w") as f:
>     f.write('''id,val
> 1,
> 2,emptyStr
> 3,str1
> ''')
> spark.read.csv('dbfs:/tmp/c.csv', header=True,
>                emptyValue='emptyStr').collect() {code}
> prints:
> [Row(id='1', val=None), Row(id='2', val='emptyStr'), Row(id='3', val='str1')]
> Expected {{Row(id='2', val='')}} (instead of {{val='emptyStr'}}).
> ----
> Not sure if it's related, but although the docs don't mention any relation to the
> {{nullValue}} option, it seems to affect {{emptyValue}}.
>
> With this content in csv file:
> {noformat}
> id,val
> 1,
> 2,""
> 3,str1
> {noformat}
> The following Scala code works as expected, i.e. it prints
> {{Array[org.apache.spark.sql.Row] = Array([1,null], [2,], [3,str1])}}
> {code:scala}
> spark.read.option("header", "true")
> .option("emptyValue", "")
> .option("nullValue", null)
> .csv("dbfs:/tmp/c.csv").collect() {code}
> But the PySpark code doesn't work; it prints:
> {{[Row(id='1', val=None), Row(id='2', val=None), Row(id='3', val='str1')]}}
> {code:python}
> spark.read.csv('dbfs:/tmp/c.csv', header=True, emptyValue='',
>                nullValue=None).collect() {code}
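On the Scala/PySpark difference, one possible explanation, which is an assumption I have not verified against a running cluster: PySpark's `csv()` keyword arguments appear to be filtered so that options whose value is `None` are never passed to the reader (this is what `OptionUtils._set_opts` in the PySpark sources seems to do), whereas Scala's `.option("nullValue", null)` stores an explicit null. This pure-Python sketch models only that option-handling step; `kwargs_to_options`, `explicit_options` and `effective_null_value` are hypothetical names, and the `""` fallback for a missing nullValue is the documented default.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical model of how reader options are collected (not PySpark's code).
# Assumption: keyword options passed to spark.read.csv(...) are dropped when None.


def kwargs_to_options(**options: Any) -> Dict[str, Any]:
    # Assumed PySpark keyword path: None-valued options are silently skipped.
    return {k: v for k, v in options.items() if v is not None}


def explicit_options(*pairs: Tuple[str, Any]) -> Dict[str, Any]:
    # Chained .option(key, value) calls: every pair is kept, including a null value.
    return dict(pairs)


def effective_null_value(options: Dict[str, Any]) -> Optional[str]:
    # Documented default: nullValue is "" when the option is not set at all.
    return options.get("nullValue", "")


py_opts = kwargs_to_options(header=True, emptyValue="", nullValue=None)
scala_opts = explicit_options(("header", "true"), ("emptyValue", ""), ("nullValue", None))

print(py_opts)                                 # → {'header': True, 'emptyValue': ''}
print(repr(effective_null_value(py_opts)))     # → ''
print(repr(effective_null_value(scala_opts)))  # → None
```

If this is right, the PySpark call compares the quoted `""` against a nullValue of `''` and returns None for row 2, while the Scala call compares it against null and keeps the empty string. Calling `.option("nullValue", None)` on the reader, rather than passing `nullValue=None` to `csv()`, might then behave like the Scala snippet; this is untested.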
--
This message was sent by Atlassian Jira
(v8.20.10#820010)