[ 
https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270779#comment-14270779
 ] 

Amareshwari Sriramadasu commented on HIVE-9324:
-----------------------------------------------

After doing a code walkthrough, here is what I found.

In JoinOperator, whenever any key has more values than BLOCKSIZE (hardcoded to 
25000), the values are spilled to a file on disk, and the spill uses the 
SequenceFile format.
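
For context, here is a minimal sketch of that spill behaviour (not the actual 
RowContainer code; the class and method names below are made up for 
illustration): rows for a key are buffered in memory until the block reaches 
BLOCKSIZE, after which the block is flushed to a spill file on disk.
{noformat}
// Illustrative sketch only -- not Hive's RowContainer implementation.
// Rows are kept in an in-memory block until BLOCKSIZE is reached, then the
// block is written out to a spill file and the in-memory block is reused.
import java.util.ArrayList;
import java.util.List;

class SpillingRowBufferSketch<ROW> {
  private static final int BLOCKSIZE = 25000;   // hardcoded threshold noted above
  private final List<ROW> currentBlock = new ArrayList<ROW>();

  void addRow(ROW row) {
    if (currentBlock.size() >= BLOCKSIZE) {
      spillBlockToDisk(currentBlock);           // flush the full block to disk
      currentBlock.clear();
    }
    currentBlock.add(row);                      // keep the row in memory
  }

  private void spillBlockToDisk(List<ROW> block) {
    // In Hive this goes through the spill TableDesc shown below
    // (SequenceFile + LazyBinarySerDe); omitted in this sketch.
  }
}
{noformat}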

Here is the table description used for the spill (from 
org.apache.hadoop.hive.ql.exec.JoinUtil.java):
{noformat}
      TableDesc tblDesc = new TableDesc(
          SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
          Utilities.makeProperties(
          org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, ""
          + Utilities.ctrlaCode,
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames
          .toString(),
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES,
          colTypes.toString(),
          serdeConstants.SERIALIZATION_LIB,LazyBinarySerDe.class.getName()));
      spillTableDesc[tag] = tblDesc;
{noformat}
From the exception:
{noformat}
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
        at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
        ... 13 more
{noformat}

I see that the key being read from the SequenceFile is an RCFile$KeyBuffer; I 
don't know why. I also couldn't figure out why the read went wrong.
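
One way to confirm which key/value classes the spilled file was actually 
written with is to dump its SequenceFile header (a debugging sketch only; the 
file path is a placeholder, not an actual Hive spill location):
{noformat}
// Debugging sketch: print the key/value class names recorded in the header of a
// suspect spill file. The path argument is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class DumpSpillHeader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    Path spillFile = new Path(args[0]);         // e.g. a copied RowContainer spill file
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, spillFile, conf);
    try {
      System.out.println("key class  : " + reader.getKeyClassName());
      System.out.println("value class: " + reader.getValueClassName());
    } finally {
      reader.close();
    }
  }
}
{noformat}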

Following is the code snippet from SequenceFile.java where the exception we are 
hitting is thrown:
{noformat}
2417     public synchronized Object next(Object key) throws IOException {
2418       if (key != null && key.getClass() != getKeyClass()) {
2419         throw new IOException("wrong key class: "+key.getClass().getName()
2420                               +" is not "+keyClass);
2421       }
2422 
2423       if (!blockCompressed) {
2424         outBuf.reset();
2425 
2426         keyLength = next(outBuf);
2427         if (keyLength < 0)
2428           return null;
2429 
2430         valBuffer.reset(outBuf.getData(), outBuf.getLength());
2431 
2432         key = deserializeKey(key);
2433         valBuffer.mark(0);
2434         if (valBuffer.getPosition() != keyLength)
2435           throw new IOException(key + " read " + valBuffer.getPosition()
2436                                 + " bytes, should read " + keyLength);
{noformat}
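
So the check at lines 2434-2435 fires whenever the key deserializer consumes a 
different number of bytes than the key length recorded in the file. For 
illustration only (this class is made up, not something from Hive or Hadoop), a 
Writable key whose readFields() reads fewer bytes than its write() produced 
would trigger exactly this kind of message:
{noformat}
// Illustration only -- a made-up Writable, not a class from Hive or Hadoop.
// If readFields() consumes fewer bytes than write() produced, the check at
// SequenceFile.java:2434 fails and next() throws
//   "<key> read 1 bytes, should read 27264"
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ShortReadingKey implements Writable {
  public void write(DataOutput out) throws IOException {
    out.write(new byte[27264]);   // 27264 key bytes recorded in the file
  }

  public void readFields(DataInput in) throws IOException {
    in.readByte();                // only 1 byte consumed on deserialization,
                                  // so valBuffer.getPosition() == 1 != keyLength
  }
}
{noformat}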

> Reduce side joins failing with IOException from RowContainer.nextBlock
> ----------------------------------------------------------------------
>
>                 Key: HIVE-9324
>                 URL: https://issues.apache.org/jira/browse/HIVE-9324
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.1
>            Reporter: Amareshwari Sriramadasu
>
> We are seeing some reduce-side join MapReduce jobs failing with the following
> exception:
> {noformat}
> 2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
>       at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:416)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2014-12-14 16:58:51,334 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
>       at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:416)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>       ... 12 more
> Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>       ... 13 more
> {noformat}



