Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Nitay Joffe
In Spark 1.2 I used to be able to do this:

scala
org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType(structint:bigint)
res30: org.apache.spark.sql.catalyst.types.DataType =
StructType(List(StructField(int,LongType,true)))

That is, the name of a column can be a keyword like int. This is no
longer the case in 1.3:

data-pipeline-shell HiveTypeHelper.toDataType(structint:bigint)
org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
failure: ``'' expected but `int' found

structint:bigint
   ^
at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
at
org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
at
org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)

Note HiveTypeHelper is simply an object I load in to expose
HiveMetastoreTypes since it was made private. See
https://gist.github.com/nitay/460b41ed5fd7608507f5
https://app.relateiq.com/r?c=chrome_gmailurl=https%3A%2F%2Fgist.github.com%2Fnitay%2F460b41ed5fd7608507f5t=AFwhZf262cJFT8YSR54ZotvY2aTmpm_zHTSKNSd4jeT-a6b8q-yMXQ-BqEX9-Ym54J1bkDFiFOXyRKsNxXoDGIh7bhqbBVKsGGq6YTJIfLZxs375XXPdS13KHsE_3Lffk4UIFkRFZ_7c

This is actually a pretty big problem for us as we have a bunch of legacy
tables with column names like timestamp. They work fine in 1.2, but now
everything throws in 1.3.

Any thoughts?

Thanks,
- Nitay
Founder  CTO


Re: Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Michael Armbrust
Thanks for reporting.  This was a result of a change to our DDL parser that
resulted in types becoming reserved words.  I've filled a JIRA and will
investigate if this is something we can fix.
https://issues.apache.org/jira/browse/SPARK-6250

On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe ni...@actioniq.co wrote:

 In Spark 1.2 I used to be able to do this:

 scala
 org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType(structint:bigint)
 res30: org.apache.spark.sql.catalyst.types.DataType =
 StructType(List(StructField(int,LongType,true)))

 That is, the name of a column can be a keyword like int. This is no
 longer the case in 1.3:

 data-pipeline-shell HiveTypeHelper.toDataType(structint:bigint)
 org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
 failure: ``'' expected but `int' found

 structint:bigint
^
 at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
 at
 org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
 at
 org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)

 Note HiveTypeHelper is simply an object I load in to expose
 HiveMetastoreTypes since it was made private. See
 https://gist.github.com/nitay/460b41ed5fd7608507f5
 https://app.relateiq.com/r?c=chrome_gmailurl=https%3A%2F%2Fgist.github.com%2Fnitay%2F460b41ed5fd7608507f5t=AFwhZf262cJFT8YSR54ZotvY2aTmpm_zHTSKNSd4jeT-a6b8q-yMXQ-BqEX9-Ym54J1bkDFiFOXyRKsNxXoDGIh7bhqbBVKsGGq6YTJIfLZxs375XXPdS13KHsE_3Lffk4UIFkRFZ_7c

 This is actually a pretty big problem for us as we have a bunch of legacy
 tables with column names like timestamp. They work fine in 1.2, but now
 everything throws in 1.3.

 Any thoughts?

 Thanks,
 - Nitay
 Founder  CTO




Re: Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Yin Huai
Hi Nitay,

Can you try using backticks to quote the column name? Like
org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType(
struct`int`:bigint)?

Thanks,

Yin

On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust mich...@databricks.com
wrote:

 Thanks for reporting.  This was a result of a change to our DDL parser
 that resulted in types becoming reserved words.  I've filled a JIRA and
 will investigate if this is something we can fix.
 https://issues.apache.org/jira/browse/SPARK-6250

 On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe ni...@actioniq.co wrote:

 In Spark 1.2 I used to be able to do this:

 scala
 org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType(structint:bigint)
 res30: org.apache.spark.sql.catalyst.types.DataType =
 StructType(List(StructField(int,LongType,true)))

 That is, the name of a column can be a keyword like int. This is no
 longer the case in 1.3:

 data-pipeline-shell HiveTypeHelper.toDataType(structint:bigint)
 org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
 failure: ``'' expected but `int' found

 structint:bigint
^
 at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
 at
 org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
 at
 org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)

 Note HiveTypeHelper is simply an object I load in to expose
 HiveMetastoreTypes since it was made private. See
 https://gist.github.com/nitay/460b41ed5fd7608507f5
 https://app.relateiq.com/r?c=chrome_gmailurl=https%3A%2F%2Fgist.github.com%2Fnitay%2F460b41ed5fd7608507f5t=AFwhZf262cJFT8YSR54ZotvY2aTmpm_zHTSKNSd4jeT-a6b8q-yMXQ-BqEX9-Ym54J1bkDFiFOXyRKsNxXoDGIh7bhqbBVKsGGq6YTJIfLZxs375XXPdS13KHsE_3Lffk4UIFkRFZ_7c

 This is actually a pretty big problem for us as we have a bunch of legacy
 tables with column names like timestamp. They work fine in 1.2, but now
 everything throws in 1.3.

 Any thoughts?

 Thanks,
 - Nitay
 Founder  CTO