I haven't used parquet-cascading. But basically, an INT96 is just a 12-byte binary value. If you can declare an int96 type in the schema, then you can write the 12 bytes using binary methods. Hive tries to match the table schema (timestamp) against the parquet schema (int96); if both match, it reads the data as binary.
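[Editor's note: the "12-byte binary" point above can be sketched concretely. The layout below — 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day number — is the convention Hive and Impala use for INT96 timestamps; the class and method names here are illustrative, not from any library.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: encode an epoch-millisecond UTC timestamp into the
// 12-byte INT96 layout Hive/Impala use for parquet timestamps.
public class Int96Encoder {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2440588;

    // Assumes a non-negative epoch timestamp for simplicity.
    static byte[] toInt96(long epochMillis) {
        long day = epochMillis / TimeUnit.DAYS.toMillis(1);
        long millisOfDay = epochMillis - day * TimeUnit.DAYS.toMillis(1);
        long nanosOfDay = TimeUnit.MILLISECONDS.toNanos(millisOfDay);
        int julianDay = (int) (day + JULIAN_DAY_OF_EPOCH);
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)  // first 8 bytes: nanoseconds within the day
                .putInt(julianDay)    // last 4 bytes: Julian day number
                .array();
    }

    public static void main(String[] args) {
        byte[] epoch = toInt96(0L); // 1970-01-01T00:00:00Z
        System.out.println(epoch.length); // 12: the full INT96 payload
    }
}
```

Writing these 12 bytes through a binary-typed field only helps if the parquet schema also declares the column as int96; otherwise Hive still sees plain binary.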
- Sergio

On Wed, Aug 5, 2015 at 11:04 PM, Santlal J Gupta <santlal.gu...@bitwiseglobal.com> wrote:

> Hi,
>
> Int96 is not supported in cascading parquet. It supports Int32 and
> Int64, which is why I used binary instead of Int96.
>
> Thanks,
> Santlal J. Gupta
>
> -----Original Message-----
> From: Sergio Pena [mailto:sergio.p...@cloudera.com]
> Sent: Wednesday, August 5, 2015 11:00 PM
> To: dev@hive.apache.org
> Subject: Re: issue while reading parquet file in hive
>
> Hi Santlal,
>
> Hive uses the parquet int96 type to write and read timestamps. The error
> is probably because of that. You can try int96 instead of binary.
>
> - Sergio
>
> On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta <
> santlal.gu...@bitwiseglobal.com> wrote:
>
> > Hello,
> >
> > I have the following issue.
> >
> > I have created a parquet file through cascading parquet and want to
> > load it into a hive table. My data file contains data of type timestamp.
> >
> > Cascading parquet does not support the timestamp data type, so while
> > creating the parquet file I declared the field as binary. After
> > generating the parquet file, it loaded successfully into hive.
> >
> > While creating the hive table I declared the column type as timestamp.
> > Code:
> >
> >     package com.parquet.TimestampTest;
> >
> >     import cascading.flow.FlowDef;
> >     import cascading.flow.hadoop.HadoopFlowConnector;
> >     import cascading.pipe.Pipe;
> >     import cascading.scheme.Scheme;
> >     import cascading.scheme.hadoop.TextDelimited;
> >     import cascading.tap.SinkMode;
> >     import cascading.tap.Tap;
> >     import cascading.tap.hadoop.Hfs;
> >     import cascading.tuple.Fields;
> >     import parquet.cascading.ParquetTupleScheme;
> >
> >     public class GenrateTimeStampParquetFile {
> >         static String inputPath = "target/input/timestampInputFile1";
> >         static String outputPath = "target/parquetOutput/TimestampOutput";
> >
> >         public static void main(String[] args) {
> >             write();
> >         }
> >
> >         private static void write() {
> >             Fields field = new Fields("timestampField").applyTypes(String.class);
> >             Scheme sourceSch = new TextDelimited(field, false, "\n");
> >
> >             Fields outputField = new Fields("timestampField");
> >             Scheme sinkSch = new ParquetTupleScheme(field, outputField,
> >                     "message TimeStampTest{optional binary timestampField ;}");
> >
> >             Tap source = new Hfs(sourceSch, inputPath);
> >             Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
> >
> >             Pipe pipe = new Pipe("Hive timestamp");
> >
> >             FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
> >
> >             new HadoopFlowConnector().connect(fd).complete();
> >         }
> >     }
> >
> > Input file: timestampInputFile1
> >
> >     timestampField
> >     1988-05-25 15:15:15.254
> >     1987-05-06 14:14:25.362
> >
> > After running the code, the following files are generated.
> >
> > Output:
> >
> > 1. part-00000-m-00000.parquet
> > 2. _SUCCESS
> > 3. _metadata
> > 4. _common_metadata
> >
> > I have created a table in hive to load the part-00000-m-00000.parquet
> > file, using the following queries:
> >
> >     hive> create table test3(timestampField timestamp) stored as parquet;
> >     hive> load data local inpath
> >         '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
> >     hive> select * from test3;
> >
> > After running the above commands I got the following output:
> >
> >     OK
> >     SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> >     SLF4J: Defaulting to no-operation (NOP) logger implementation
> >     SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> >     Failed with exception
> >     java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> >     java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable
> >     cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
> >
> > So please help me solve this problem.
> >
> > Currently I am using:
> >
> > Hive 1.1.0-cdh5.4.2
> > Cascading 2.5.1
> > parquet-format-2.2.0
> >
> > Thanks
> > Santlal J. Gupta
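[Editor's note: for reference, this is a sketch of the inverse operation — turning a 12-byte INT96 value (8 little-endian bytes of nanoseconds-of-day, then 4 little-endian bytes of Julian day number, the Hive/Impala convention) back into an epoch timestamp. It is roughly what Hive computes when the parquet schema really is int96; the names below are illustrative, not Hive's actual code. It also shows why the thread's error occurs: a binary-annotated column is handed to Hive as BytesWritable, which cannot be cast to the TimestampWritable the timestamp column type expects.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: decode a Hive/Impala-style INT96 timestamp value.
public class Int96Decoder {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2440588;

    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();  // first 8 bytes
        long julianDay = buf.getInt();    // last 4 bytes
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_EPOCH;
        return TimeUnit.DAYS.toMillis(daysSinceEpoch)
                + TimeUnit.NANOSECONDS.toMillis(nanosOfDay);
    }

    public static void main(String[] args) {
        // 1970-01-02T01:00:00Z encoded by hand: one hour of nanos, Julian day 2440589.
        byte[] value = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(TimeUnit.HOURS.toNanos(1))
                .putInt(2440589)
                .array();
        System.out.println(toEpochMillis(value)); // 90000000
    }
}
```

If the parquet-cascading version in use cannot emit int96, the simpler workaround on the Hive side is to declare the column as string or binary so the schemas match, and convert at query time.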