[ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270779#comment-14270779 ]
Amareshwari Sriramadasu commented on HIVE-9324:
-----------------------------------------------

After a code walkthrough, here is what I found.

In JoinOperator, whenever a key has more values than BLOCKSIZE (hardcoded to 25000), the values are spilled to a file on disk, and the spill uses the SequenceFile format. Here is the table description for the spill (from org.apache.hadoop.hive.ql.exec.JoinUtil.java):

{noformat}
TableDesc tblDesc = new TableDesc(
    SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
    Utilities.makeProperties(
        org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, "" + Utilities.ctrlaCode,
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames.toString(),
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES, colTypes.toString(),
        serdeConstants.SERIALIZATION_LIB, LazyBinarySerDe.class.getName()));
spillTableDesc[tag] = tblDesc;
{noformat}

From the exception:

{noformat}
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
	... 13 more
{noformat}

I see that the key read from the SequenceFile is an RCFile$KeyBuffer; I don't know why. I also couldn't figure out why the read went wrong.
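To make the message itself concrete, here is a minimal stand-alone sketch of the kind of length check that produces "read 1 bytes, should read 27264". It uses only java.io, not Hadoop; the class name KeyLengthCheckSketch, the record layout (an int length followed by the key bytes) and the helper names are assumptions made for illustration. It does not identify the root cause of this issue. It only shows that such a message appears when the key deserializer consumes fewer bytes than the length recorded for the key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: a simplified stand-in, not the Hadoop SequenceFile implementation.
public class KeyLengthCheckSketch {

    // Writes one record using an assumed layout: int keyLength, then the key bytes.
    public static byte[] writeRecord(int keyLength) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(keyLength);
        out.write(new byte[keyLength]);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads one record with a "deserializer" that consumes bytesConsumedByReader bytes,
    // then verifies that it consumed exactly the recorded key length.
    public static void readRecord(byte[] record, int bytesConsumedByReader) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        int keyLength = in.readInt();
        int before = in.available();
        in.readFully(new byte[bytesConsumedByReader]); // stands in for key deserialization
        int consumed = before - in.available();
        if (consumed != keyLength) {
            throw new IOException("read " + consumed + " bytes, should read " + keyLength);
        }
    }

    // Returns "ok" when the check passes, otherwise the exception message.
    public static String check(int keyLength, int bytesConsumedByReader) {
        try {
            readRecord(writeRecord(keyLength), bytesConsumedByReader);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Deserializer agrees with the writer about the key layout.
        System.out.println(check(27264, 27264)); // prints "ok"
        // Deserializer expects a different layout and stops after one byte.
        System.out.println(check(27264, 1));     // prints "read 1 bytes, should read 27264"
    }
}
```

In the real exception the message is additionally prefixed with the key object's toString(), which is where RCFile$KeyBuffer@42610e8 comes from.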

Following is the code snippet from SequenceFile.java for the exception we are hitting:

{noformat}
2417    public synchronized Object next(Object key) throws IOException {
2418      if (key != null && key.getClass() != getKeyClass()) {
2419        throw new IOException("wrong key class: "+key.getClass().getName()
2420                              +" is not "+keyClass);
2421      }
2422
2423      if (!blockCompressed) {
2424        outBuf.reset();
2425
2426        keyLength = next(outBuf);
2427        if (keyLength < 0)
2428          return null;
2429
2430        valBuffer.reset(outBuf.getData(), outBuf.getLength());
2431
2432        key = deserializeKey(key);
2433        valBuffer.mark(0);
2434        if (valBuffer.getPosition() != keyLength)
2435          throw new IOException(key + " read " + valBuffer.getPosition()
2436                                + " bytes, should read " + keyLength);
{noformat}

> Reduce side joins failing with IOException from RowContainer.nextBlock
> ----------------------------------------------------------------------
>
>                 Key: HIVE-9324
>                 URL: https://issues.apache.org/jira/browse/HIVE-9324
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.1
>            Reporter: Amareshwari Sriramadasu
>
> We are seeing some reduce side join mapreduce jobs failing with following exception :
> {noformat}
> 2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2014-12-14 16:58:51,334 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	... 12 more
> Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	... 13 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)