[jira] [Commented] (SQOOP-1393) Import data from database to Hive as Parquet files

Pratik Khadloya (JIRA) Fri, 12 Sep 2014 17:30:06 -0700

    [ https://issues.apache.org/jira/browse/SQOOP-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132376#comment-14132376 ]


Pratik Khadloya commented on SQOOP-1393:
----------------------------------------

I got past the metadata error by running the import through HCatalog.

{code}
bin/sqoop import -jt myjt:xxxx --connect jdbc:mysql://mydbserver.net/mydb \
  --username myuser --password mypwd --query "SELECT ... WHERE \$CONDITIONS" \
  --num-mappers 1 --hcatalog-storage-stanza "STORED AS PARQUET" \
  --create-hcatalog-table --hcatalog-table abc2
{code}

But since I am using Hive 0.13, I get the following error, which indicates that 
MapredParquetOutputFormat should not be used with Hive 0.13, as it has native 
support for Parquet files.
{code}
14/09/12 20:16:55 INFO mapred.JobClient: Task Id : attempt_201409022012_0543_m_000000_2, Status : FAILED
java.lang.RuntimeException: Should never be used
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:77)
        at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:103)
        at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:653)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
{code}

Is any code change planned to support Hive 0.13?

> Import data from database to Hive as Parquet files
> --------------------------------------------------
>
>                 Key: SQOOP-1393
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1393
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Richard
>             Fix For: 1.4.6
>
>         Attachments: patch.diff, patch_v2.diff, patch_v3.diff
>
>
> Importing data into Hive as Parquet files can be separated into two steps:
> 1. Import an individual table from an RDBMS to HDFS as a set of Parquet files.
> 2. Import the data into Hive by generating and executing a CREATE TABLE 
> statement that defines the data's layout in Hive as a Parquet-format table.
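
The two steps in the issue description could be sketched roughly as follows. This is only a dry-run illustration that prints the commands rather than running them. It assumes a Sqoop build that already provides the --as-parquetfile option and a Hive version with native STORED AS PARQUET support. The target directory, the column list (id, name), and the use of an EXTERNAL table are hypothetical choices made for the example, not something taken from the patch.

```shell
#!/bin/sh
# Dry-run sketch: prints the two commands instead of executing them.
# TABLE reuses the table name from the comment; TARGET_DIR and the
# column list (id INT, name STRING) are hypothetical placeholders.
TABLE="abc2"
TARGET_DIR="/user/myuser/${TABLE}_parquet"

# Step 1: import one table from the RDBMS to HDFS as Parquet files.
# Assumes the Sqoop build supports --as-parquetfile; -P prompts for the password.
step1_cmd() {
  printf '%s\n' "sqoop import --connect jdbc:mysql://mydbserver.net/mydb --username myuser -P --table ${TABLE} --as-parquetfile --target-dir ${TARGET_DIR} --num-mappers 1"
}

# Step 2: define the data's layout in Hive with a CREATE TABLE statement.
# EXTERNAL plus LOCATION is an assumption; the patch may generate a managed table.
step2_cmd() {
  printf '%s\n' "hive -e \"CREATE EXTERNAL TABLE ${TABLE} (id INT, name STRING) STORED AS PARQUET LOCATION '${TARGET_DIR}'\""
}

step1_cmd
step2_cmd
```

Printing the commands first makes it easy to review the generated CREATE TABLE statement before anything touches the cluster.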



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
