Hmm... another issue I found with this approach is that ANALYZE TABLE ...
COMPUTE STATISTICS fails to attach the statistics metadata to the table, so
later broadcast joins and the like will fail...

Any idea how to fix this issue?

Jianshi

On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Very interesting: the line doing the drop table throws an exception.
> After removing it, everything works.
>
> Jianshi
>
> On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> wrote:
>
>> Here's the solution I got after talking with Liancheng:
>>
>> 1) use backquotes `..` to wrap up names containing illegal characters
>>
>>     val rdd = parquetFile(file)
>>     val schema = rdd.schema.fields.map(f =>
>>       s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
>>
>>     val ddl_13 = s"""
>>       |CREATE EXTERNAL TABLE $name (
>>       |  $schema
>>       |)
>>       |STORED AS PARQUET
>>       |LOCATION '$file'
>>       """.stripMargin
>>
>>     sql(ddl_13)
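(To make the backquoting step concrete outside a Spark shell, here is a
standalone sketch. The field names, the table name, the location, and the
hand-written type strings are all assumptions standing in for the values that
HiveMetastoreTypes.toMetastoreType would produce, since that needs Spark on
the classpath.)

```scala
// Standalone sketch of step 1: backquote each Pig-style field name so the
// "::" inside it parses in Hive DDL. Names and types here are hypothetical.
object BackquoteSketch {
  case class Field(name: String, metastoreType: String)

  def ddlFor(table: String, location: String, fields: Seq[Field]): String = {
    // Wrap every name in backquotes; "::" is illegal in a bare identifier
    val schema = fields.map(f => s"`${f.name}` ${f.metastoreType}").mkString(",\n  ")
    s"""CREATE EXTERNAL TABLE $table (
       |  $schema
       |)
       |STORED AS PARQUET
       |LOCATION '$location'""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq(Field("sorted::id", "bigint"), Field("sorted::cre_ts", "string"))
    println(ddlFor("pmt", "/tmp/pmt.parquet", fields))
  }
}
```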
>>
>> 2) create a new StructType and call applySchema to generate a new
>> SchemaRDD; I had to drop the table and re-register it:
>>
>>     val t = table(name)
>>     val newSchema = StructType(t.schema.fields.map(s =>
>>       s.copy(name = s.name.replaceAll(".*?::", ""))))
>>     sql(s"drop table $name")
>>     applySchema(t, newSchema).registerTempTable(name)
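(The renaming regex in step 2 can be exercised on its own; the sample names
below are assumptions following Pig's relation::field naming convention.)

```scala
// Sketch of the prefix-stripping regex from step 2: ".*?::" matches each
// "relation::" prefix non-greedily, so repeated prefixes are all removed.
val pigNames = Seq("sorted::id", "sorted::cre_ts", "a::b::c", "plain")
val cleaned  = pigNames.map(_.replaceAll(".*?::", ""))
// cleaned == Seq("id", "cre_ts", "c", "plain")
```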
>>
>> I'm testing it for now.
>>
>> Thanks for the help!
>>
>>
>> Jianshi
>>
>> On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I had to use Pig for some preprocessing and to generate Parquet files
>>> for Spark to consume.
>>>
>>> However, due to a limitation of Pig, the generated schema contains Pig's
>>> relation identifiers as field-name prefixes,
>>>
>>> e.g.
>>> sorted::id, sorted::cre_ts, ...
>>>
>>> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>>>
>>>   create external table pmt (
>>>     sorted::id bigint
>>>   )
>>>   stored as parquet
>>>   location '...'
>>>
>>> Obviously it didn't work. I also tried removing the sorted:: identifier,
>>> but then the resulting rows contained only nulls.
>>>
>>> Any idea how to create a table in HiveContext from these Parquet files?
>>>
>>> Thanks,
>>> Jianshi
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>
>>
>>
>>
>
>
>
>



