Spark 1.3 SQL Type Parser Changes?
In Spark 1.2 I used to be able to do this:

scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
res30: org.apache.spark.sql.catalyst.types.DataType = StructType(List(StructField(int,LongType,true)))

That is, the name of a column can be a keyword like "int". This is no longer the case in 1.3:

data-pipeline-shell> HiveTypeHelper.toDataType("struct<int:bigint>")
org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8] failure: ``>'' expected but `int' found

struct<int:bigint>
       ^
        at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
        at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
        at org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)

Note HiveTypeHelper is simply an object I load in to expose HiveMetastoreTypes since it was made private. See https://gist.github.com/nitay/460b41ed5fd7608507f5

This is actually a pretty big problem for us, as we have a bunch of legacy tables with column names like "timestamp". They work fine in 1.2, but now everything throws in 1.3.

Any thoughts?

Thanks,
- Nitay
Founder & CTO
Re: Spark 1.3 SQL Type Parser Changes?
Thanks for reporting. This was a result of a change to our DDL parser that resulted in types becoming reserved words. I've filed a JIRA and will investigate if this is something we can fix.

https://issues.apache.org/jira/browse/SPARK-6250

On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe <ni...@actioniq.co> wrote:
> ...
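The effect of types becoming reserved words can be illustrated with a toy sketch (this is not Spark's actual DDLParser; the object name, keyword list, and error text here are assumptions for illustration): a field whose name matches a type keyword can no longer be parsed as an identifier.

```scala
// Toy illustration of the reserved-word problem, NOT Spark's real parser.
object ToyDdl {
  // Assumed subset of the type keywords the 1.3 parser reserves.
  val typeKeywords = Set("int", "bigint", "string", "timestamp")

  // Parse a single "name:type" field, rejecting reserved-word names the
  // way a keyword-first grammar would.
  def parseField(field: String): Either[String, (String, String)] =
    field.split(":") match {
      case Array(name, _) if typeKeywords(name) =>
        Left(s"failure: identifier expected but `$name' found")
      case Array(name, tpe) =>
        Right(name -> tpe)
      case _ =>
        Left("malformed field")
    }
}
```

Here `parseField("int:bigint")` yields a `Left`, mirroring the DDLException above, while a non-keyword name like `parseField("id:bigint")` parses fine.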
Re: Spark 1.3 SQL Type Parser Changes?
Hi Nitay,

Can you try using backticks to quote the column name? Like org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<`int`:bigint>")?

Thanks,

Yin

On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust <mich...@databricks.com> wrote:
> ...
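The backtick workaround can be automated when the type strings come from existing metastore tables. A minimal Scala sketch (the `TypeStringQuoter` helper, its reserved-word list, and the regex are assumptions for illustration, not Spark API) that backtick-quotes any field name colliding with a type keyword before the string reaches the parser:

```scala
// Hypothetical helper: backtick-quote reserved-word field names in a
// Hive-style struct type string so a keyword-reserving parser accepts them.
object TypeStringQuoter {
  // Assumed subset of the words the 1.3 DDL parser reserves.
  val reserved = Set("int", "bigint", "string", "double", "timestamp")

  def quoteFields(typeStr: String): String =
    // Match each "name:" pair right after '<' or ',' inside struct<...>
    // and wrap reserved names in backticks.
    "([,<])\\s*(\\w+)\\s*:".r.replaceAllIn(typeStr, m => {
      val name   = m.group(2)
      val quoted = if (reserved(name.toLowerCase)) s"`$name`" else name
      s"${m.group(1)}$quoted:"
    })
}
```

For example, `TypeStringQuoter.quoteFields("struct<int:bigint>")` produces `struct<`int`:bigint>`, the quoted form suggested above; names that are not reserved words pass through unchanged.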