Hello,

I have the following issue.

I created a Parquet file with Cascading (parquet-cascading) and want to load it
into a Hive table. The file loads successfully, but when I query the table it
returns NULL instead of the actual data. The code is below.

package com.parquet.TimestampTest;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class GenrateTimeStampParquetFile {
     static String inputPath = "target/input/timestampInputFile";
     static String outputPath = "target/parquetOutput/TimestampOutput";

     public static void main(String[] args) {

           write();
     }

     private static void write() {
           // Read each input line as a single String field; the first line
           // of the input file is treated as a header.

           Fields field = new Fields("timestampField").applyTypes(String.class);
           Scheme sourceSch = new TextDelimited(field, true, "\n");

           Fields outputField = new Fields("timestampField");

           Scheme sinkSch = new ParquetTupleScheme(field, outputField,
                     "message TimeStampTest{optional binary timestampField ;}");

           Tap source = new Hfs(sourceSch, inputPath);
           Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);

           Pipe pipe = new Pipe("Hive timestamp");

           FlowDef fd = FlowDef.flowDef()
                     .addSource(pipe, source)
                     .addTailSink(pipe, sink);

           new HadoopFlowConnector().connect(fd).complete();
     }
}

Input file:

timestampInputFile

timestampField
1988-05-25 15:15:15.254
1987-05-06 14:14:25.362
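
As a sanity check on the input values themselves, here is a minimal sketch
(separate from the Cascading job) showing that the two sample lines parse with
java.sql.Timestamp.valueOf, which accepts the yyyy-[m]m-[d]d hh:mm:ss[.f...]
form. The class name TimestampFormatCheck and the parse helper are made up for
illustration only.

```java
import java.sql.Timestamp;

// Hypothetical helper class, not part of the Cascading job.
public class TimestampFormatCheck {

    // Parses one line of the input file. Timestamp.valueOf throws
    // IllegalArgumentException if the text is not in
    // yyyy-[m]m-[d]d hh:mm:ss[.f...] form.
    static Timestamp parse(String line) {
        return Timestamp.valueOf(line.trim());
    }

    public static void main(String[] args) {
        // The two data rows from timestampInputFile (header line excluded).
        String[] samples = {
            "1988-05-25 15:15:15.254",
            "1987-05-06 14:14:25.362"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Both values parse without an exception, so the text of the input rows does not
look like the source of the problem.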

After running the code, the following files are generated.
Output:
1. part-00000-m-00000.parquet
2. _SUCCESS
3. _metadata
4. _common_metadata

I created a table in Hive and loaded the part-00000-m-00000.parquet file into it.
The file loads successfully, but reading the table returns NULL values.

I used the following commands.

hive> create table timestampTest (timestampField timestamp);

hive> load data local inpath 
'/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table 
timestampTest;
Loading data to table parquet_timestamp_test.timestamptest
Table parquet_timestamp_test.timestamptest stats: [numFiles=1, totalSize=296]
OK
Time taken: 0.508 seconds

hive> select * from timestamptest;
OK
NULL
NULL
NULL
Time taken: 0.104 seconds, Fetched: 3 row(s)
