On 12/04/2014 11:53 AM, Yan Qi wrote:
Hi Ryan,

When I set both the read schema and the request schema to the one with only
4 fields (i.e., a subset of the file schema, Profile.getClassSchema()), I
got the following error, though:

14/12/04 11:48:01 INFO mapred.JobClient: Task Id :
attempt_201410141621_22583_m_000000_1, Status : FAILED
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in
file hdfs://had.ca:9000/tmp/avro/2014_10_14/part-00000
         at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
         at
parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
         at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
         at
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
         at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.Cl
attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO:
parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will
read a total of 1000001 records.
attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO:
parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
attempt_201410141621_22583_m_000000_2: Dec 4, 2014 11:47:56 AM INFO:
parquet.hadoop.InternalParquetRecordReader: block read in memory in 338 ms.
row count = 603147
attempt_201410141621_22583_m_000000_2: SLF4J: Failed to load class
"org.slf4j.impl.StaticLoggerBinder".
attempt_201410141621_22583_m_000000_2: SLF4J: Defaulting to no-operation
(NOP) logger implementation
attempt_201410141621_22583_m_000000_2: SLF4J: See
http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

I am wondering if I set the schema correctly. Can you give me some
suggestions?

Thanks,
Yan

Can you send the full log from the task that failed? It looks like the exception was cut off; only the first part shows up in the `hadoop` command output.

Without all the information, I'm guessing that "java.lang.Cl" is a ClassCastException. That would happen if your read schema doesn't have the necessary java-class properties that cause Avro to instantiate your specific object rather than a GenericData.Record object.
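To illustrate the failure mode (this is a stand-in sketch, not Parquet code: `HashMap` plays the role of GenericData.Record, and `Profile` is a placeholder for your generated specific class):

```java
import java.util.HashMap;

public class CastDemo {
    // Placeholder for the Avro specific-record class the mapper expects
    static class Profile { }

    public static void main(String[] args) {
        // The reader actually produced a generic container, not a Profile...
        Object record = new HashMap<String, Object>();
        try {
            // ...so the mapper's implicit cast to the specific type blows up
            Profile p = (Profile) record;
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The mapper's generics hide the cast, but it is there at runtime, which is why the error surfaces inside nextKeyValue() rather than in your own code.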

I recommend taking the schema you are using as the read schema and generating a specific object class from it (call it PartialProfile or something). Then you can use that stripped-down specific object just as you were using Profile before, which avoids this issue.
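A stripped-down schema for that generated class might look like the sketch below. The record name PartialProfile and the field names here are placeholders; the real field definitions must match the corresponding fields in your Profile schema exactly:

```json
{
  "type": "record",
  "name": "PartialProfile",
  "namespace": "com.example",
  "fields": [
    {"name": "field1", "type": "string"},
    {"name": "field2", "type": ["null", "long"], "default": null}
  ]
}
```

Compiling this with avro-tools (`java -jar avro-tools.jar compile schema partial_profile.avsc src/`) generates a PartialProfile specific class, and PartialProfile.getClassSchema() can then be passed to both AvroReadSupport.setRequestedProjection and AvroReadSupport.setAvroReadSchema so that Avro resolves records to your specific class instead of GenericData.Record.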


--
Ryan Blue
Software Engineer
Cloudera, Inc.
