Hi Stephen,

I created ticket https://issues.apache.org/jira/browse/PARQUET-406 to
track your issue. We'll take a look, try to track down the cause, and get
back to you.

Thanks, and let us know if you have any other questions.
Reuben

On Mon, Dec 14, 2015 at 12:22 PM, Stephen Bly <b...@foursquare.com> wrote:

> Greetings Parquet developers. I am trying to create my own custom
> InputFormat for reading Parquet tables in Hive. This is how I create the
> table:
>
> CREATE EXTERNAL TABLE api_hit_parquet_test
> ROW FORMAT SERDE 'com.foursquare.hadoop.hive.serde.RecordV2SerDe'
> WITH SERDEPROPERTIES ('serialization.class' = 'com.foursquare.logs.gen.ApiHit')
> STORED AS
>   INPUTFORMAT 'com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION '/user/bly/api_hit_parquet'
> TBLPROPERTIES ('thrift.parquetfile.input.format.thrift.class' = 'com.foursquare.logs.gen.ApiHit')
>
> The table is created successfully, and I can verify that the schema is
> correct by running DESCRIBE FORMATTED on it. However, a simple SELECT *
> on the table fails. Concretely, the two statements are:
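>
> DESCRIBE FORMATTED api_hit_parquet_test;  -- schema looks correct
> SELECT * FROM api_hit_parquet_test;       -- fails
>
> The SELECT produces the following stack trace: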
>
> java.io.IOException: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1657)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> Caused by: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
>         at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:280)
>         at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.createValue(DeprecatedInputFormatWrapper.java:297)
>         at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat$$anon$1.<init>(HiveThriftParquetInputFormat.scala:47)
>         at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat.getRecordReader(HiveThriftParquetInputFormat.scala:46)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:667)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
>         ... 9 more
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://hadoop-alidoro-nn-vip/user/bly/api_hit_parquet/part-m-00000.parquet
>         at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>         at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>         at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:271)
>         ... 15 more
> Caused by: java.lang.NullPointerException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.parquet.hadoop.util.ContextUtil.invoke(ContextUtil.java:264)
>         at org.apache.parquet.hadoop.util.ContextUtil.incrementCounter(ContextUtil.java:273)
>         at org.apache.parquet.hadoop.util.counters.mapreduce.MapReduceCounterAdapter.increment(MapReduceCounterAdapter.java:38)
>         at org.apache.parquet.hadoop.util.counters.BenchmarkCounter.incrementTotalBytes(BenchmarkCounter.java:78)
>         at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:497)
>         at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
>         at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>         ... 17 more
>
> I have spent some time following this stack trace, and it appears that the
> error lies in the Counter code, which is odd because I don’t do anything
> with that. Is there some way I need to initialize counters?
>
> To be specific, I have found that MapReduceCounterAdapter is being created
> with a null parameter. Here is the constructor:
>
>   public MapReduceCounterAdapter(Counter adaptee) {
>     this.adaptee = adaptee;
>   }
>
> So adaptee is passed in as null, and a method is invoked on it later,
> which causes my NullPointerException.
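>
> To illustrate, here is a self-contained sketch of the failing path as I
> understand it (the Hadoop Counter is reduced to a stand-in interface, and
> the class names are mine, not Parquet's):
>
> // Sketch: the adapter stores whatever it is given, so the NPE only
> // surfaces later, the first time increment() is called.
> public class NullCounterSketch {
>   interface Counter { void increment(long value); }  // stand-in for org.apache.hadoop.mapreduce.Counter
>
>   static final class Adapter {  // stand-in for MapReduceCounterAdapter
>     private final Counter adaptee;
>     Adapter(Counter adaptee) { this.adaptee = adaptee; }       // no null check
>     void increment(long value) { adaptee.increment(value); }   // NPE when adaptee == null
>   }
>
>   public static void main(String[] args) {
>     Adapter adapter = new Adapter(null);  // getCounter(...) returned null
>     adapter.increment(1);                 // throws NullPointerException, as in my trace
>   }
> }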
>
> The adaptee argument comes from this method in ContextUtil:
>
>   public static Counter getCounter(TaskInputOutputContext context,
>                                    String groupName, String counterName) {
>     return (Counter) invoke(GET_COUNTER_METHOD, context, groupName, counterName);
>   }
>
> I am really quite stuck. Has anyone else had problems with this? Is there
> some code I need to add to get counters to work properly?
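>
> For what it's worth, the only workaround I can think of is a null-object
> guard along these lines (just a sketch; SafeCounters/getCounterOrLocal are
> names I made up, and I'm assuming Hadoop 2's GenericCounter with its
> (name, displayName) constructor is on the classpath), but I'd rather
> understand why getCounter returns null in the first place:
>
> import org.apache.hadoop.mapreduce.Counter;
> import org.apache.hadoop.mapreduce.TaskInputOutputContext;
> import org.apache.hadoop.mapreduce.counters.GenericCounter;
>
> // Sketch: never hand a null Counter to the adapter; fall back to a local,
> // unreported counter so increments still work but go nowhere.
> final class SafeCounters {
>   private SafeCounters() {}
>
>   static Counter getCounterOrLocal(TaskInputOutputContext<?, ?, ?, ?> context,
>                                    String groupName, String counterName) {
>     Counter c = (context == null) ? null : context.getCounter(groupName, counterName);
>     return (c != null) ? c : new GenericCounter(counterName, counterName);
>   }
> }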
