[
https://issues.apache.org/jira/browse/PARQUET-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258692#comment-14258692
]
Dmitriy commented on PARQUET-155:
---------------------------------
>>> What you could try is to write a simple map-only job that reads with Avro
>>> and writes with Parquet-avro. Then you wouldn't have to do any conversion
If I do this, then I need to define the Parquet-backed table myself, am I
right? This is not a very good approach for me, because my real schema is very
big. Or is there some other way to create a Hive table from existing Parquet
files? I searched for this, but found only two approaches: "create as select"
and defining the table explicitly when creating it.
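
If the columns do have to be declared by hand, one option for a large schema
is to generate the DDL from the Avro schema instead of typing it. Below is a
minimal Python sketch of that idea, not a tested solution. The helper names
hive_type and create_table_ddl are hypothetical, and the mapping of a
multi-branch union to struct<member0:...,member1:...> is an assumption about
how parquet-avro lays out unions; it has not been checked against Hive 0.13.

```python
import json

# Avro primitive type -> Hive type name.
PRIMITIVES = {
    "string": "string", "int": "int", "long": "bigint", "float": "float",
    "double": "double", "boolean": "boolean", "bytes": "binary",
}


def hive_type(avro_type, named=None):
    """Return a Hive type string for an Avro type given as parsed JSON.

    Limitations of this sketch: recursive records and namespace-qualified
    type references are not handled.
    """
    named = {} if named is None else named
    if isinstance(avro_type, list):  # Avro union
        branches = [t for t in avro_type if t != "null"]
        if len(branches) == 1:
            # ["null", T]: Hive columns are nullable, so T is enough.
            return hive_type(branches[0], named)
        # ASSUMPTION: one optional struct member per branch, named
        # member0, member1, ... to mirror parquet-avro's union layout.
        members = ",".join(
            "member%d:%s" % (i, hive_type(t, named))
            for i, t in enumerate(branches)
        )
        return "struct<%s>" % members
    if isinstance(avro_type, str):
        if avro_type in PRIMITIVES:
            return PRIMITIVES[avro_type]
        return named[avro_type]  # reference to an earlier named type
    kind = avro_type["type"]
    if kind == "record":
        fields = ",".join(
            "%s:%s" % (f["name"], hive_type(f["type"], named))
            for f in avro_type["fields"]
        )
        named[avro_type["name"]] = "struct<%s>" % fields
        return named[avro_type["name"]]
    if kind == "array":
        return "array<%s>" % hive_type(avro_type["items"], named)
    if kind == "map":
        return "map<string,%s>" % hive_type(avro_type["values"], named)
    if kind == "enum":
        named[avro_type["name"]] = "string"
        return "string"
    if kind == "fixed":
        named[avro_type["name"]] = "binary"
        return "binary"
    return hive_type(kind, named)  # e.g. {"type": "string"}


def create_table_ddl(schema, table, location):
    """Build a CREATE EXTERNAL TABLE statement from a top-level Avro record."""
    named = {}
    columns = ",\n".join(
        "  %s %s" % (f["name"], hive_type(f["type"], named))
        for f in schema["fields"]
    )
    return (
        "CREATE EXTERNAL TABLE %s (\n%s\n)\nSTORED AS PARQUET\nLOCATION '%s'"
        % (table, columns, location)
    )
```

For the TestRecord schema quoted below, create_table_ddl(schema,
"parquet_table", "/path/to/parquet/files") would declare a single column
objectLink of type
struct<member0:struct<obj1VisitorId:string>,member1:struct<obj2VisitorId:string>>.
The location is a placeholder, and whether Hive then reads the files correctly
depends on the Parquet files actually using that layout.
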
> Hive Avro to Parquet table conversion
> -------------------------------------
>
> Key: PARQUET-155
> URL: https://issues.apache.org/jira/browse/PARQUET-155
> Project: Parquet
> Issue Type: Bug
> Reporter: Dmitriy
>
> Hi.
> I have the following Avro schema:
> {code}
> {
> "namespace" : "com.example.test",
> "type" : "record",
> "name" : "TestRecord",
> "fields" : [{"name" : "objectLink", "type" : [
> {"type": "record", "name" : "TestObj1", "fields" :
> [{"name":"obj1VisitorId","type":["null","string"]}] },
> {"type": "record", "name" : "TestObj2", "fields" :
> [{"name":"obj2VisitorId","type":["null","string"]}]}
> ]
> }],
> "doc" : "event for test purposes"
> }
> {code}
> Using this schema I can create Avro objects, and I am also able to create a
> table backed by Avro in Hive. But when I try to create a table backed by
> Parquet by running
> CREATE TABLE parquet_table
> STORED AS parquet
> AS SELECT * FROM avro_table
> I get
> SemanticException java.lang.UnsupportedOperationException: Unknown field
> type: uniontype<struct<obj1visitorid:string>,struct<obj2visitorid:string>>
> Is there a way to convert such structures so that they can be stored in a
> Hive table backed by Parquet? This is a simple example, but I have a big data
> structure described in Avro, so I can't convert it manually. I also have data
> that is already stored in Avro and needs to be loaded into a table backed by
> Parquet. Is there any way to do this?
> I'm using Hive 0.13.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)