[jira] [Assigned] (SPARK-13309) Incorrect type inference for CSV data.

Apache Spark (JIRA) Sat, 13 Feb 2016 11:01:13 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-13309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apache Spark reassigned SPARK-13309:
------------------------------------

    Assignee: Apache Spark

> Incorrect type inference for CSV data.
> --------------------------------------
>
>                 Key: SPARK-13309
>                 URL: https://issues.apache.org/jira/browse/SPARK-13309
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Rahul Tanwani
>            Assignee: Apache Spark
>             Fix For: 1.6.0
>
>
> Type inference for CSV data does not work as expected when the data is
> sparse.
> For instance, consider the following dataset and its inferred schema:
> A,B,C,D
> 1,,,
> ,1,,
> ,,1,
> ,,,1
> root
> |-- A: integer (nullable = true)
> |-- B: integer (nullable = true)
> |-- C: string (nullable = true)
> |-- D: string (nullable = true)
> Here all four fields should have been inferred as integer types, but
> C and D are inferred as string.
> Another dataset:
> A,B,C,D
> 1,,1,
> and the inferred schema:
> root
> |-- A: string (nullable = true)
> |-- B: string (nullable = true)
> |-- C: string (nullable = true)
> |-- D: string (nullable = true)
> Here, fields A and C should be inferred as integer types.
> The same issue has been discussed for the spark-csv package; see
> https://github.com/databricks/spark-csv/issues/216 for reference.
> It was fixed there with
> https://github.com/databricks/spark-csv/commit/8704b26030da88ac6e18b955a81d5c22ca3b480d.
> I will try to submit a PR with the patch soon.
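
For illustration only, the expected behaviour described above can be sketched in
plain Python (not Spark's Scala, and not the actual patch). The names
infer_field, merge and infer_schema are hypothetical, and the fallback of an
all-empty column to string is an assumption of this sketch. The idea is that an
empty field is treated as null and never widens the type inferred so far for
its column.

```python
import csv
import io

# Type lattice, narrowest first. "null" means no non-empty value seen yet.
ORDER = ["null", "integer", "double", "string"]


def infer_field(value):
    """Infer the type of one raw CSV field; an empty field counts as null."""
    if value is None or value.strip() == "":
        return "null"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def merge(a, b):
    """Return the wider of two types; null never widens the other type."""
    return a if ORDER.index(a) >= ORDER.index(b) else b


def infer_schema(text):
    """Infer one type per column from CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = ["null"] * len(header)
    for row in data:
        for i in range(len(header)):
            # Missing trailing fields are treated like empty fields.
            value = row[i] if i < len(row) else ""
            types[i] = merge(types[i], infer_field(value))
    # Assumption: a column with no values at all falls back to string.
    return {
        name: ("string" if t == "null" else t)
        for name, t in zip(header, types)
    }
```

With this sketch, infer_schema("A,B,C,D\n1,,,\n,1,,\n,,1,\n,,,1\n") yields
integer for all four columns, and infer_schema("A,B,C,D\n1,,1,\n") yields
integer for A and C, which matches the results the report asks for.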



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

