[jira] [Assigned] (SPARK-15458) Disable schema inference for streaming datasets on file streams

Apache Spark (JIRA) Fri, 20 May 2016 18:33:30 -0700

     [ https://issues.apache.org/jira/browse/SPARK-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Apache Spark reassigned SPARK-15458:
------------------------------------

    Assignee: Tathagata Das  (was: Apache Spark)

> Disable schema inference for streaming datasets on file streams
> ---------------------------------------------------------------
>
>                 Key: SPARK-15458
>                 URL: https://issues.apache.org/jira/browse/SPARK-15458
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>
> If the user relies on schema inference for file streams, the query can break
> easily, for several reasons:
> - accidentally running on a directory that has no data
> - the schema changing underneath
> - on restart, the query will infer the schema again and may unexpectedly
> infer an incorrect schema, because the files in the directory may be
> different at the time of the restart.
> To avoid these complicated scenarios, for Spark 2.0 we are going to disable
> schema inference by default (controlled by a config), so that the user is
> forced to consider explicitly what schema they want, rather than having the
> system try to infer it and run into weird corner cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

