[ 
https://issues.apache.org/jira/browse/SPARK-39012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated SPARK-39012:
-----------------------------
    Description: 
When Spark needs to infer a schema, it must parse a string into a data type. Not 
all data types are supported in this path so far; for example, binary is known 
to be unsupported. If a user has a binary column and does not use a metastore, 
Spark SQL can fall back to schema inference and then fail during the table 
scan. This should be considered a bug, since schema inference is supported but 
some types are missing.

A string can be converted to most types, except complex types such as ARRAY, 
MAP, and STRUCT. Also, when converting from a string, a smaller type will not 
be identified if a larger type in the same family applies; for example, a value 
that fits in SHORT is inferred as LONG.
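The widening behavior can be illustrated with a minimal, Spark-independent sketch. The function name and the precedence order here are hypothetical, not Spark's actual inference code:

```python
# Illustrative sketch of string-to-type inference with a widening
# precedence list. Hypothetical helper, not Spark's implementation.

def infer_type(value: str) -> str:
    """Return a type name for a string value, widest-safe first."""
    # Boolean-like and binary-like values need explicit support;
    # that missing support is the gap this issue describes.
    if value.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(value)
        # Because inference must accept any later value in the column,
        # it picks the larger integral type up front: a value that
        # fits in SHORT is still reported as LONG.
        return "LONG"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        return "STRING"

print(infer_type("123"))   # LONG, even though 123 fits in SHORT
print(infer_type("true"))  # BOOLEAN
```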

Based on Spark SQL data types: 
https://spark.apache.org/docs/latest/sql-ref-datatypes.html, we can support the 
following types:

BINARY
BOOLEAN

There are also two types that I am not sure Spark SQL supports:
YearMonthIntervalType
DayTimeIntervalType


  was:
When Spark needs to infer schema, it needs to parse string to a type. Not all 
data types are supported so far in this path. For example, binary is known to 
not be supported. 

string might be converted to all types except ARRAY, MAP, STRUCT, etc. Also 
because when converting from a string, small scale type won't be identified if 
there is a larger scale type. For example, short and long 

Based on Spark SQL data types: 
https://spark.apache.org/docs/latest/sql-ref-datatypes.html, we can support the 
following types:

BINARY
BOOLEAN

And there are two types that I am not sure if SparkSQL is supporting:
YearMonthIntervalType
DayTimeIntervalType



> SparkSQL infer schema does not support all data types
> -----------------------------------------------------
>
>                 Key: SPARK-39012
>                 URL: https://issues.apache.org/jira/browse/SPARK-39012
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Rui Wang
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
