[ https://issues.apache.org/jira/browse/SPARK-20493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung resolved SPARK-20493.
----------------------------------
Resolution: Fixed
Assignee: Hyukjin Kwon
Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> De-deuplicate parse logics for DDL-like type string in R
> --------------------------------------------------------
>
> Key: SPARK-20493
> URL: https://issues.apache.org/jira/browse/SPARK-20493
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Affects Versions: 2.2.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Fix For: 2.3.0
>
>
> It seems we are using SQLUtils.getSQLDataType[1] for the type string in structField.
> It looks like we can replace this with CatalystSqlParser.parseDataType[2].
> Both appear to use similar DDL-like type definitions, as below:
> {code}
> scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
> +---+
> | _1|
> +---+
> |[a]|
> +---+
> {code}
> {code}
> scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
> +---+
> | _1|
> +---+
> |[a]|
> +---+
> {code}
> Such type strings look identical to R's, as below:
> {code}
> > write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
> > collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
>   struct
> 1      a
> {code}
> It seems R's check is stricter because we validate the types via regular expressions[3] on the R side.
> The actual logic differs a bit, but since we check the types ahead of time on the R side, replacing it should not introduce any behaviour changes.
> To make sure of this, dedicated tests were added in SPARK-20105.
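For illustration, here is a minimal Scala sketch of the kind of regex-based pre-check described above: a type string is matched against array<...>, map<...> and struct<...> shapes and validated recursively, so only strings that pass would ever be handed to the JVM-side parser. It is hypothetical and is not the actual code in schema.R[3] or in Spark's parser; the object name TypeStringCheck, the assumed subset of primitive type names, and the bracket-depth-aware comma split are all assumptions made for this example.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

// Hypothetical sketch of a regex-based check for DDL-like type strings.
// Not SparkR's or Spark's actual implementation.
object TypeStringCheck {
  // Assumed subset of primitive type names, for illustration only.
  private val primitiveTypes: Set[String] = Set(
    "byte", "integer", "float", "double", "numeric", "string", "character",
    "binary", "raw", "boolean", "logical", "timestamp", "date"
  )

  // Outer-shape patterns; Regex extractors only succeed on a full match.
  private val ArrayPattern: Regex  = "^array<(.+)>$".r
  private val MapPattern: Regex    = "^map<(.+)>$".r
  private val StructPattern: Regex = "^struct<(.+)>$".r
  private val FieldPattern: Regex  = "^([^:]+):(.+)$".r

  // Splits on commas that are not nested inside <...>, so that
  // "string,map<string,integer>" yields two parts rather than three.
  private def splitTopLevel(s: String): List[String] = {
    val parts = ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0
    s.foreach { c =>
      if (c == ',' && depth == 0) {
        parts += current.toString
        current.clear()
      } else {
        if (c == '<') depth += 1
        if (c == '>') depth -= 1
        current += c
      }
    }
    parts += current.toString
    parts.toList
  }

  // True if the string is a known primitive, or a well-formed array/map/struct
  // whose element, key/value and field types are themselves valid.
  def isValid(tpe: String): Boolean = tpe match {
    case t if primitiveTypes.contains(t) => true
    case ArrayPattern(elem) => isValid(elem)
    case MapPattern(keyValue) =>
      splitTopLevel(keyValue) match {
        case List(k, v) => isValid(k) && isValid(v)
        case _          => false
      }
    case StructPattern(fields) =>
      splitTopLevel(fields).forall {
        case FieldPattern(_, fieldType) => isValid(fieldType)
        case _                          => false
      }
    case _ => false
  }
}
```

For example, TypeStringCheck.isValid("struct<_1:string>") accepts the type string used in the R snippet, while "struct<_1>" (no field type) is rejected. Splitting on every comma would wrongly reject nested types such as map<string,map<string,integer>>, which is why the sketch only splits on commas outside angle brackets.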
> [1] https://github.com/apache/spark/blob/d1f6c64c4b763c05d6d79ae5497f298dc3835f3e/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L93-L131
> [2] https://github.com/apache/spark/blob/1472cac4bb31c1886f82830778d34c4dd9030d7a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L36-L40
> [3] https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/schema.R#L129-L187

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)