Very interesting: the line doing the drop table throws an exception. After removing it, everything works.
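For reference, here is a minimal sketch of the two name-handling tricks from the quoted message below, written in plain Scala so it runs without Spark. `Field`, `PigPrefix`, `strip` and `quote` are hypothetical names made up for illustration (in the real code the fields come from `rdd.schema.fields`), and the `string` type for `cre_ts` is an assumption, not something taken from the actual schema.

```scala
// Hypothetical stand-in for a Spark schema field; only name and type text matter here.
final case class Field(name: String, hiveType: String)

object PigPrefix {
  // Same regex as in the quoted message: removes every "alias::" qualifier
  // that Pig prepends, leaving only the bare column name.
  def strip(name: String): String = name.replaceAll(".*?::", "")

  // Wraps a column name in backquotes so that characters such as "::"
  // can appear in a CREATE EXTERNAL TABLE column list.
  def quote(name: String): String = s"`$name`"

  // Builds the column list of the DDL from the fields, one column per line.
  def columnList(fields: Seq[Field]): String =
    fields.map(f => s"${quote(f.name)} ${f.hiveType}").mkString(",\n")

  def main(args: Array[String]): Unit = {
    // Assumed example fields; the type of cre_ts is a guess for illustration.
    val fields = Seq(Field("sorted::id", "bigint"), Field("sorted::cre_ts", "string"))
    println(columnList(fields))              // column list for approach 1
    println(fields.map(f => strip(f.name)))  // renamed columns for approach 2
  }
}
```

With approach 2, the stripped names would go into the new `StructType` passed to `applySchema`, and `registerTempTable(name)` is then called without the preceding `drop table`.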
Jianshi

On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Here's the solution I got after talking with Liancheng:
>
> 1) using backquote `..` to wrap up all illegal characters
>
>     val rdd = parquetFile(file)
>     val schema = rdd.schema.fields.map(f => s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
>
>     val ddl_13 = s"""
>       |CREATE EXTERNAL TABLE $name (
>       |  $schema
>       |)
>       |STORED AS PARQUET
>       |LOCATION '$file'
>       """.stripMargin
>
>     sql(ddl_13)
>
> 2) create a new Schema and do applySchema to generate a new SchemaRDD, had
> to drop and register table
>
>     val t = table(name)
>     val newSchema = StructType(t.schema.fields.map(s => s.copy(name = s.name.replaceAll(".*?::", ""))))
>     sql(s"drop table $name")
>     applySchema(t, newSchema).registerTempTable(name)
>
> I'm testing it for now.
>
> Thanks for the help!
>
> Jianshi
>
> On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I had to use Pig for some preprocessing and to generate Parquet files for
>> Spark to consume.
>>
>> However, due to Pig's limitation, the generated schema contains Pig's
>> identifier
>>
>> e.g.
>> sorted::id, sorted::cre_ts, ...
>>
>> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>>
>>     create external table pmt (
>>       sorted::id bigint
>>     )
>>     stored as parquet
>>     location '...'
>>
>> Obviously it didn't work, I also tried removing the identifier sorted::,
>> but the resulting rows contain only nulls.
>>
>> Any idea how to create a table in HiveContext from these Parquet files?
>>
>> Thanks,
>> Jianshi
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/