[ https://issues.apache.org/jira/browse/SPARK-20493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung resolved SPARK-20493.
----------------------------------
          Resolution: Fixed
            Assignee: Hyukjin Kwon
       Fix Version/s: 2.3.0
    Target Version/s: 2.3.0

> De-duplicate parse logic for DDL-like type strings in R
> -------------------------------------------------------
>
>                 Key: SPARK-20493
>                 URL: https://issues.apache.org/jira/browse/SPARK-20493
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>             Fix For: 2.3.0
>
>
> It seems we are using SQLUtils.getSQLDataType[1] to parse the type string in
> structField.
> It looks like we can replace this with CatalystSqlParser.parseDataType[2].
> Both appear to handle similar DDL-like type definitions, as below:
> {code}
> scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
> +---+
> | _1|
> +---+
> |[a]|
> +---+
> {code}
> {code}
> scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
> +---+
> | _1|
> +---+
> |[a]|
> +---+
> {code}
> Such type strings look identical to the ones used in R, as below:
> {code}
> > write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
> > collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
>   struct
> 1      a
> {code}
> R’s check seems to be stricter because we validate the types via regular
> expressions[3] on the R side.
> The actual logic there looks a bit different, but since we check the type
> string beforehand on the R side, replacing it should not introduce any
> behaviour changes.
> To make sure of this, dedicated tests were added in SPARK-20105.
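> As a rough illustration (a sketch, not the actual patch), the type string
> from the R example above can be passed to CatalystSqlParser.parseDataType[2]
> directly, e.g. in spark-shell. The val names and the hand-built expected type
> are only for illustration, and this assumes the catalyst classes are on the
> classpath:
> {code}
> import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
> import org.apache.spark.sql.types._
>
> // Parse the same DDL-like type string that structField accepts in R.
> val parsed: DataType = CatalystSqlParser.parseDataType("struct<_1:string>")
>
> // The equivalent type built by hand (illustrative; fields parsed from a
> // type string are assumed to be nullable by default).
> val expected: DataType =
>   StructType(Seq(StructField("_1", StringType, nullable = true)))
>
> // Fails with an AssertionError if the parser yields a different type.
> assert(parsed == expected)
> {code}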
> [1] https://github.com/apache/spark/blob/d1f6c64c4b763c05d6d79ae5497f298dc3835f3e/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L93-L131
> [2] https://github.com/apache/spark/blob/1472cac4bb31c1886f82830778d34c4dd9030d7a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L36-L40
> [3] https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187


