[ https://issues.apache.org/jira/browse/SPARK-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884876#comment-16884876 ]
Hyukjin Kwon commented on SPARK-28338:
--------------------------------------

It's Spark's behaviour. To ensure data consistency, quote empty strings so that they are read as strings.

> spark.read.format("csv") treat empty string as null if csv file don't have quotes in data
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-28338
>                 URL: https://issues.apache.org/jira/browse/SPARK-28338
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Jayadevan M
>            Priority: Major
>
> The csv input file
> +cat sample.csv+
> {code}
> Name,Lastname,Age
> abc,,32
> pqr,xxx,30
> {code}
> +spark-shell+
> {code}
> spark.read.format("csv").option("header", "true").load("/media/ub_share/projects/*.csv").head(3)
> spark.read.format("csv").option("header", "true").option("nullValue", "?").load("/media/ub_share/projects/*.csv").head(3)
> {code}
> {code}
> res15: Array[org.apache.spark.sql.Row] = Array([abc,null,32], [pqr,xxx,30])
> {code}
>
> The empty string gets converted to null. It works fine if the CSV file has quotes in columns.