Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

Jianshi Huang Sat, 06 Dec 2014 05:13:31 -0800

Interestingly, the line that does the drop table throws an exception. After
removing it, everything works.


Jianshi

On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Here's the solution I got after talking with Liancheng:
>
> 1) Wrap every column name in backquotes (`..`) so that illegal characters
> such as :: are accepted:
>
>     // parquetFile and sql are HiveContext methods (e.g. after import hiveContext._)
>     val rdd = parquetFile(file)
>     val schema = rdd.schema.fields
>       .map(f => s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
>       .mkString(",\n")
>
>     val ddl_13 = s"""
>       |CREATE EXTERNAL TABLE $name (
>       |  $schema
>       |)
>       |STORED AS PARQUET
>       |LOCATION '$file'
>       """.stripMargin
>
>     sql(ddl_13)
>
> 2) Create a new schema with the Pig prefixes stripped from the column names
> and call applySchema to generate a new SchemaRDD. I had to drop the table and
> register the result as a temp table:
>
>     val t = table(name)
>     // ".*?::" removes everything up to each "::", e.g. sorted::id -> id
>     val newSchema = StructType(t.schema.fields.map(s =>
>       s.copy(name = s.name.replaceAll(".*?::", ""))))
>     sql(s"drop table $name")
>     applySchema(t, newSchema).registerTempTable(name)
>
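The string handling in the two steps above can be checked without Spark. In this sketch the `Field` case class is a hypothetical stand-in for Spark's `StructField`, with the Hive type already given as a string (the role `HiveMetastoreTypes.toMetastoreType` plays above); the `sorted::cre_ts` type and the table location are made up for illustration.

```scala
// Hypothetical stand-in for Spark SQL's StructField: a column name plus the
// Hive type string that HiveMetastoreTypes.toMetastoreType would produce.
case class Field(name: String, hiveType: String)

object PigSchemaFix {
  // Step 1: backquote every column name so that names like "sorted::id"
  // are legal in the CREATE EXTERNAL TABLE statement.
  def externalTableDdl(table: String, location: String, fields: Seq[Field]): String = {
    val columns = fields.map(f => s"  `${f.name}` ${f.hiveType}").mkString(",\n")
    s"""CREATE EXTERNAL TABLE $table (
       |$columns
       |)
       |STORED AS PARQUET
       |LOCATION '$location'""".stripMargin
  }

  // Step 2: remove Pig relation prefixes with the same lazy regex as in the
  // thread; ".*?::" matches up to each "::", so "a::b::c" becomes "c".
  def stripPigPrefix(name: String): String = name.replaceAll(".*?::", "")

  def main(args: Array[String]): Unit = {
    // "sorted::id bigint" comes from the thread; the cre_ts type is assumed.
    val fields = Seq(Field("sorted::id", "bigint"), Field("sorted::cre_ts", "string"))
    println(externalTableDdl("pmt", "/path/to/parquet", fields))
    println(fields.map(f => stripPigPrefix(f.name)).mkString(", "))
  }
}
```

Backquoting keeps the names exactly as they appear in the Parquet file; stripping the prefixes afterwards only renames the columns of the registered SchemaRDD, not the data itself.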
> I'm testing it for now.
>
> Thanks for the help!
>
>
> Jianshi
>
> On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I had to use Pig for some preprocessing and to generate Parquet files for
>> Spark to consume.
>>
>> However, due to a Pig limitation, the generated schema contains Pig's
>> relation identifier as a prefix of each column name, e.g.
>>
>> sorted::id, sorted::cre_ts, ...
>>
>> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>>
>>   create external table pmt (
>>     sorted::id bigint
>>   )
>>   stored as parquet
>>   location '...'
>>
>> Obviously that didn't work. I also tried removing the sorted:: prefix,
>> but then the resulting rows contain only nulls.
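A likely explanation for the nulls (not confirmed in this thread) is that the Parquet reader resolves table columns against the file's columns by name: a table column called `id` has no counterpart in a file whose column is `sorted::id`, so every value comes back null. The toy model below illustrates only that name-based lookup; `NameBasedLookup`, `Record`, and `project` are hypothetical names, not Hive or Parquet APIs.

```scala
object NameBasedLookup {
  // Hypothetical model of one Parquet record: physical column name -> value.
  type Record = Map[String, Any]

  // Resolve each requested table column by exact name, as a name-based
  // reader would; a column with no matching field in the file yields null.
  def project(record: Record, tableColumns: Seq[String]): Seq[Any] =
    tableColumns.map(col => record.getOrElse(col, null))

  def main(args: Array[String]): Unit = {
    val record: Record = Map("sorted::id" -> 42L)
    println(project(record, Seq("sorted::id"))) // name matches the file
    println(project(record, Seq("id")))         // prefix removed: no match
  }
}
```

This is why keeping the original names (backquoted) in the DDL and renaming on the Spark side avoids the nulls.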
>>
>> Any idea how to create a table in HiveContext from these Parquet files?
>>
>> Thanks,
>> Jianshi
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>



