I haven't used parquet-cascading. But basically, an INT96 is just a 12-byte binary value. If you can declare an int96 type in the schema, then you can write the 12 bytes using binary methods. Hive tries to match the table schema (timestamp) against the parquet schema (int96); if both match, it reads the data as binary.
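[Editor's note: the "12-byte binary" point above can be sketched concretely. The layout below — 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day number — is the convention Hive and Impala use for INT96 timestamps; the class and method names here are illustrative, not from any library.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: encode an epoch-millisecond UTC timestamp into the
// 12-byte INT96 layout Hive/Impala use for parquet timestamps.
public class Int96Encoder {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2440588;

    // Assumes a non-negative epoch timestamp for simplicity.
    static byte[] toInt96(long epochMillis) {
        long day = epochMillis / TimeUnit.DAYS.toMillis(1);
        long millisOfDay = epochMillis - day * TimeUnit.DAYS.toMillis(1);
        long nanosOfDay = TimeUnit.MILLISECONDS.toNanos(millisOfDay);
        int julianDay = (int) (day + JULIAN_DAY_OF_EPOCH);
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)  // first 8 bytes: nanoseconds within the day
                .putInt(julianDay)    // last 4 bytes: Julian day number
                .array();
    }

    public static void main(String[] args) {
        byte[] epoch = toInt96(0L); // 1970-01-01T00:00:00Z
        System.out.println(epoch.length); // 12: the full INT96 payload
    }
}
```

Writing these 12 bytes through a binary-typed field only helps if the parquet schema also declares the column as int96; otherwise Hive still sees plain binary.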
- Sergio

On Wed, Aug 5, 2015 at 11:04 PM, Santlal J Gupta <santlal.gu...@bitwiseglobal.com> wrote:

> Hi,
>
> Int96 is not supported in cascading parquet. It supports Int32 and
> Int64, which is why I used binary instead of Int96.
>
> Thanks,
> Santlal J. Gupta
>
> -----Original Message-----
> From: Sergio Pena [mailto:sergio.p...@cloudera.com]
> Sent: Wednesday, August 5, 2015 11:00 PM
> To: dev@hive.apache.org
> Subject: Re: issue while reading parquet file in hive
>
> Hi Santlal,
>
> Hive uses the parquet int96 type to write and read timestamps. The error
> is probably because of that. You can try int96 instead of binary.
>
> - Sergio
>
> On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta <
> santlal.gu...@bitwiseglobal.com> wrote:
>
> > Hello,
> >
> > I have the following issue.
> >
> > I have created a parquet file through cascading parquet and want to
> > load it into a hive table. My data file contains data of type timestamp.
> >
> > Cascading parquet does not support the timestamp data type, so while
> > creating the parquet file I declared the field as binary. After
> > generating the parquet file, it loaded successfully into hive.
> >
> > While creating the hive table I declared the column type as timestamp.
> > Code:
> >
> >     package com.parquet.TimestampTest;
> >
> >     import cascading.flow.FlowDef;
> >     import cascading.flow.hadoop.HadoopFlowConnector;
> >     import cascading.pipe.Pipe;
> >     import cascading.scheme.Scheme;
> >     import cascading.scheme.hadoop.TextDelimited;
> >     import cascading.tap.SinkMode;
> >     import cascading.tap.Tap;
> >     import cascading.tap.hadoop.Hfs;
> >     import cascading.tuple.Fields;
> >     import parquet.cascading.ParquetTupleScheme;
> >
> >     public class GenrateTimeStampParquetFile {
> >         static String inputPath = "target/input/timestampInputFile1";
> >         static String outputPath = "target/parquetOutput/TimestampOutput";
> >
> >         public static void main(String[] args) {
> >             write();
> >         }
> >
> >         private static void write() {
> >             Fields field = new Fields("timestampField").applyTypes(String.class);
> >             Scheme sourceSch = new TextDelimited(field, false, "\n");
> >
> >             Fields outputField = new Fields("timestampField");
> >             Scheme sinkSch = new ParquetTupleScheme(field, outputField,
> >                     "message TimeStampTest{optional binary timestampField ;}");
> >
> >             Tap source = new Hfs(sourceSch, inputPath);
> >             Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
> >
> >             Pipe pipe = new Pipe("Hive timestamp");
> >
> >             FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
> >
> >             new HadoopFlowConnector().connect(fd).complete();
> >         }
> >     }
> >
> > Input file: timestampInputFile1
> >
> >     timestampField
> >     1988-05-25 15:15:15.254
> >     1987-05-06 14:14:25.362
> >
> > After running the code, the following files are generated.
> >
> > Output:
> >
> > 1. part-00000-m-00000.parquet
> > 2. _SUCCESS
> > 3. _metadata
> > 4. _common_metadata
> >
> > I have created a table in hive to load the part-00000-m-00000.parquet
> > file, using the following queries:
> >
> >     hive> create table test3(timestampField timestamp) stored as parquet;
> >     hive> load data local inpath
> >         '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
> >     hive> select * from test3;
> >
> > After running the above commands I got the following output:
> >
> >     OK
> >     SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> >     SLF4J: Defaulting to no-operation (NOP) logger implementation
> >     SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> >     Failed with exception
> >     java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> >     java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable
> >     cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
> >
> > So please help me solve this problem.
> >
> > Currently I am using:
> >
> > Hive 1.1.0-cdh5.4.2
> > Cascading 2.5.1
> > parquet-format-2.2.0
> >
> > Thanks
> > Santlal J. Gupta
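[Editor's note: for reference, this is a sketch of the inverse operation — turning a 12-byte INT96 value (8 little-endian bytes of nanoseconds-of-day, then 4 little-endian bytes of Julian day number, the Hive/Impala convention) back into an epoch timestamp. It is roughly what Hive computes when the parquet schema really is int96; the names below are illustrative, not Hive's actual code. It also shows why the thread's error occurs: a binary-annotated column is handed to Hive as BytesWritable, which cannot be cast to the TimestampWritable the timestamp column type expects.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: decode a Hive/Impala-style INT96 timestamp value.
public class Int96Decoder {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2440588;

    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();  // first 8 bytes
        long julianDay = buf.getInt();    // last 4 bytes
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_EPOCH;
        return TimeUnit.DAYS.toMillis(daysSinceEpoch)
                + TimeUnit.NANOSECONDS.toMillis(nanosOfDay);
    }

    public static void main(String[] args) {
        // 1970-01-02T01:00:00Z encoded by hand: one hour of nanos, Julian day 2440589.
        byte[] value = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(TimeUnit.HOURS.toNanos(1))
                .putInt(2440589)
                .array();
        System.out.println(toEpochMillis(value)); // 90000000
    }
}
```

If the parquet-cascading version in use cannot emit int96, the simpler workaround on the Hive side is to declare the column as string or binary so the schemas match, and convert at query time.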