In Spark 1.2 I used to be able to do this:
scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
res30: org.apache.spark.sql.catalyst.types.DataType = StructType(List(StructField(int,LongType,true)))
That is, the name of a column can be a keyword like "int". This is no
longer the case in 1.3:
data-pipeline-shell> HiveTypeHelper.toDataType("struct<int:bigint>")
org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8] failure: ``>'' expected but `int' found

struct<int:bigint>
       ^
    at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
    at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
    at org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)
Note that HiveTypeHelper is simply an object I load in to expose
HiveMetastoreTypes, since it was made private. See
https://gist.github.com/nitay/460b41ed5fd7608507f5
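For anyone who does not want to open the gist, a helper of this kind might look roughly like the sketch below. This is an assumption about its shape, not a copy of the gist: it assumes HiveMetastoreTypes is declared private[hive], so the object is placed in the org.apache.spark.sql.hive package to get access, and it has to be compiled against the Spark 1.3 jars.

```scala
// Sketch only: assumes HiveMetastoreTypes is private[hive], so this object
// must live in the same package in order to call it.
package org.apache.spark.sql.hive

// Spark 1.3 location of DataType (1.2 used org.apache.spark.sql.catalyst.types).
import org.apache.spark.sql.types.DataType

object HiveTypeHelper {
  // Delegates straight to the package-private parser for Hive type strings.
  def toDataType(metastoreType: String): DataType =
    HiveMetastoreTypes.toDataType(metastoreType)
}
```

With this on the classpath, HiveTypeHelper.toDataType("struct<a:bigint>") should go through the same code path as the failing call above, which makes it easy to compare a non-keyword field name against "int" or "timestamp".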
This is actually a pretty big problem for us as we have a bunch of legacy
tables with column names like "timestamp". They work fine in 1.2, but now
everything throws in 1.3.
Any thoughts?
Thanks,
- Nitay
Founder & CTO