Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r84373258
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
    @@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val 
parameters: Map[String, Str
       val permissive = ParseModes.isPermissiveMode(parseMode)
     
       val nullValue = parameters.getOrElse("nullValue", "")
    +  val emptyValue = parameters.getOrElse("emptyValue", "")
    --- End diff --
    
    Yes, null and empty cannot be differentiated when they are set to the same 
value. Currently the null-value check has higher precedence than the empty-value check. 
    
    input.csv:
    1,
    2,""
    
    Output will be:
    1, null
    2, null
    
    
    I think this behavior is ok. By default, the Univocity CSV parser used in 
Spark also returns null for empty strings.
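    The precedence can be sketched as a small standalone function (a hypothetical helper for illustration only, not the actual Spark/Univocity parser code; all names here are made up):
    
    ```scala
    object CsvFieldSketch {
      // Sketch of the precedence described above: the nullValue check runs
      // before the emptyValue check, so when both options share the same
      // string (the default ""), an unquoted empty field and a quoted ""
      // both come back as null.
      def convertField(raw: String,
                       nullValue: String = "",
                       emptyValue: String = ""): Option[String] = {
        if (raw == nullValue) None            // null check wins first
        else if (raw == emptyValue) Some("")  // only reachable when the two options differ
        else Some(raw)
      }
    }
    ```
    
    With the defaults, `convertField("")` yields `None` for both rows in the example above (the parser unwraps the quoted `""` to an empty string before conversion); only setting `nullValue` to something else, e.g. `\N`, lets an empty field survive as `Some("")`.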
    
    I agree we should document this behavior. 

