[ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270779#comment-14270779 ]
Amareshwari Sriramadasu commented on HIVE-9324:
-----------------------------------------------
After a code walkthrough, here is what I found:
In JoinOperator, whenever a key has more values than BLOCKSIZE (hardcoded to
25000), the values are spilled to a file on disk, and the spill file uses the
SequenceFile format.
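To make the spill behaviour concrete, here is a minimal sketch of the pattern, assuming a simplified container of string rows: it keeps up to `blockSize` rows in memory, appends each full block to a temporary file, and on read-back returns the spilled rows before the in-memory tail. `SpillingRowBuffer` and its methods are hypothetical names, and the length-prefixed file format is an assumption for illustration only; the spill described here goes through a SequenceFile using the table description from `JoinUtil` instead.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified row buffer that spills full blocks to disk. Not Hive's RowContainer. */
public class SpillingRowBuffer implements Closeable {
    private final int blockSize;  // rows kept in memory before spilling (Hive's BLOCKSIZE is 25000)
    private final List<String> current = new ArrayList<>();
    private final File spillFile;
    private final DataOutputStream spillOut;
    private int spilledRows = 0;

    public SpillingRowBuffer(int blockSize) throws IOException {
        this.blockSize = blockSize;
        this.spillFile = Files.createTempFile("spill", ".tmp").toFile();
        this.spillFile.deleteOnExit();
        this.spillOut = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(spillFile)));
    }

    public void add(String row) throws IOException {
        current.add(row);
        if (current.size() >= blockSize) {
            spillBlock();
        }
    }

    // Append the in-memory block to the spill file as length-prefixed UTF-8 rows, then clear it.
    private void spillBlock() throws IOException {
        for (String row : current) {
            byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
            spillOut.writeInt(bytes.length);
            spillOut.write(bytes);
        }
        spilledRows += current.size();
        current.clear();
    }

    public int spilledRows() {
        return spilledRows;
    }

    // Return all rows: spilled rows first (in write order), then the rows still in memory.
    public List<String> readAll() throws IOException {
        spillOut.flush();
        List<String> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(spillFile)))) {
            for (int i = 0; i < spilledRows; i++) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                rows.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        rows.addAll(current);
        return rows;
    }

    @Override
    public void close() throws IOException {
        spillOut.close();
        spillFile.delete();
    }

    public static void main(String[] args) throws IOException {
        try (SpillingRowBuffer buf = new SpillingRowBuffer(3)) {
            for (int i = 0; i < 7; i++) {
                buf.add("row" + i);
            }
            System.out.println(buf.spilledRows());  // 6
            System.out.println(buf.readAll());      // [row0, row1, row2, row3, row4, row5, row6]
        }
    }
}
```

The demo uses a block size of 3 so that seven rows produce two spilled blocks and one row that stays in memory.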
Here is the table descriptor used for the spill file (from
org.apache.hadoop.hive.ql.exec.JoinUtil.java):
{noformat}
// Spill files are written with HiveSequenceFileOutputFormat, read back with
// SequenceFileInputFormat, and each row is serialized with LazyBinarySerDe.
TableDesc tblDesc = new TableDesc(
    SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
    Utilities.makeProperties(
        org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, "" + Utilities.ctrlaCode,
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames.toString(),
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES, colTypes.toString(),
        serdeConstants.SERIALIZATION_LIB, LazyBinarySerDe.class.getName()));
spillTableDesc[tag] = tblDesc;
{noformat}
From the exception:
{noformat}
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
	... 13 more
{noformat}
I see that the key deserialized from the SequenceFile is an RCFile$KeyBuffer
(the exception message is built from the deserialized key object), and I don't
know why. I also couldn't figure out why the read went wrong.
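The key class a SequenceFile reader deserializes is not chosen by the caller; it comes from the class name recorded in the file header. A minimal diagnostic sketch, assuming the standard SequenceFile header layout for version 4 and later (the magic bytes `SEQ`, a one-byte version, then the key and value class names, each written as a Hadoop variable-length byte count followed by UTF-8 bytes), could print which classes a file on disk actually declares. `SeqHeaderPeek` is a hypothetical helper written against plain `java.io`, not a Hadoop API; with Hadoop on the classpath, `SequenceFile.Reader.getKeyClassName()` reports the same information.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic: reads the magic, version and class names from a SequenceFile header. */
public class SeqHeaderPeek {

    /** The header fields this sketch extracts. */
    public static final class Header {
        public final String magic;
        public final int version;
        public final String keyClass;    // null if the header was not parsed
        public final String valueClass;  // null if the header was not parsed

        Header(String magic, int version, String keyClass, String valueClass) {
            this.magic = magic;
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    // Decodes a Hadoop variable-length long (the encoding written by WritableUtils.writeVLong).
    public static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;  // values in [-112, 127] are stored in the single byte itself
        }
        boolean negative = first < -120;
        int extraBytes = negative ? (-120 - first) : (-112 - first);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    // Reads a string written as a variable-length byte count followed by UTF-8 bytes.
    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static Header read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        String m = new String(magic, StandardCharsets.US_ASCII);
        int version = in.readUnsignedByte();
        // Only "SEQ" headers of version 4 or later store the class names in this encoding.
        if (!m.equals("SEQ") || version < 4) {
            return new Header(m, version, null, null);
        }
        String keyClass = readString(in);
        String valueClass = readString(in);
        return new Header(m, version, keyClass, valueClass);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = read(in);
            System.out.println(h.magic + h.version + " key=" + h.keyClass + " value=" + h.valueClass);
        }
    }
}
```

Run it as `java SeqHeaderPeek <path-to-spill-file>`. Comparing the reported key class with the one the spill writer is expected to use may help narrow down where the unexpected `RCFile$KeyBuffer` comes from.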
The following snippet from SequenceFile.java is where the exception we are
hitting is thrown:
{noformat}
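// SequenceFile.Reader.next(Object key), SequenceFile.java lines 2417-2436 (excerpt)
public synchronized Object next(Object key) throws IOException {
  if (key != null && key.getClass() != getKeyClass()) {
    throw new IOException("wrong key class: " + key.getClass().getName()
        + " is not " + keyClass);
  }

  if (!blockCompressed) {
    outBuf.reset();

    // Read the next raw record (key and value bytes) into outBuf;
    // returns the key length, or -1 at end of file.
    keyLength = next(outBuf);
    if (keyLength < 0)
      return null;

    valBuffer.reset(outBuf.getData(), outBuf.getLength());

    // Deserialize the key from the start of the buffer, using the key class from the file header.
    key = deserializeKey(key);
    valBuffer.mark(0);
    // Line 2435 in the stack trace: the key deserializer did not consume exactly keyLength bytes.
    if (valBuffer.getPosition() != keyLength)
      throw new IOException(key + " read " + valBuffer.getPosition()
          + " bytes, should read " + keyLength);
    // ... (rest of the method not quoted)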
{noformat}
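The check that throws at SequenceFile.java:2435 compares two numbers: the key length recorded in the record and the number of bytes the key deserializer actually consumed. Below is a hypothetical, self-contained model of that check, assuming the uncompressed SequenceFile record layout (4-byte record length, 4-byte key length, key bytes, value bytes). It shows one way to trigger the message: bytes written for one key type are decoded with a deserializer for another, so the consumed-byte count no longer matches. `RecordCheck`, `KeyDecoder` and both decoders are illustrative names, not Hadoop APIs, and this is not a claim about the actual root cause here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of SequenceFile's "read N bytes, should read M" key-length check. */
public class RecordCheck {

    /** Decodes a key from the stream and returns a printable form of it. */
    interface KeyDecoder {
        String decode(DataInputStream in) throws IOException;
    }

    // Decoder matching writeRecord below: 4-byte length followed by UTF-8 bytes.
    static final KeyDecoder LENGTH_PREFIXED_STRING = in -> {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new String(b, StandardCharsets.UTF_8);
    };

    // Decoder for a different, hypothetical key type that consumes only one byte.
    static final KeyDecoder ONE_BYTE_FLAG = in -> "flag(" + in.readByte() + ")";

    /** Writes one record: record length, key length, key bytes, value bytes. */
    static byte[] writeRecord(String key, byte[] value) throws IOException {
        ByteArrayOutputStream keyBytes = new ByteArrayOutputStream();
        DataOutputStream keyOut = new DataOutputStream(keyBytes);
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        keyOut.writeInt(k.length);
        keyOut.write(k);
        keyOut.flush();

        ByteArrayOutputStream record = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(record);
        out.writeInt(keyBytes.size() + value.length);  // record length
        out.writeInt(keyBytes.size());                 // key length
        keyBytes.writeTo(out);
        out.write(value);
        out.flush();
        return record.toByteArray();
    }

    /** Reads one record's key and verifies the decoder consumed exactly keyLength bytes. */
    static String readKey(byte[] record, KeyDecoder decoder) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        int recordLength = in.readInt();
        int keyLength = in.readInt();
        byte[] keyAndValue = new byte[recordLength];
        in.readFully(keyAndValue);

        // Decode the key from the start of the key/value buffer and count the bytes consumed.
        ByteArrayInputStream buf = new ByteArrayInputStream(keyAndValue);
        String key = decoder.decode(new DataInputStream(buf));
        int consumed = keyAndValue.length - buf.available();
        if (consumed != keyLength) {
            throw new IOException(key + " read " + consumed + " bytes, should read " + keyLength);
        }
        return key;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = writeRecord("join-key", new byte[] {1, 2, 3});
        System.out.println(readKey(record, LENGTH_PREFIXED_STRING));  // join-key
        try {
            readKey(record, ONE_BYTE_FLAG);
        } catch (IOException e) {
            System.out.println(e.getMessage());  // flag(0) read 1 bytes, should read 12
        }
    }
}
```

The mismatched decoder reads only the first byte of the 4-byte length prefix, so it reports 1 byte consumed against a recorded key length of 12 (4 length bytes plus the 8 bytes of "join-key").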
> Reduce side joins failing with IOException from RowContainer.nextBlock
> ----------------------------------------------------------------------
>
> Key: HIVE-9324
> URL: https://issues.apache.org/jira/browse/HIVE-9324
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.13.1
> Reporter: Amareshwari Sriramadasu
>
> We are seeing some reduce-side join MapReduce jobs fail with the following
> exception:
> {noformat}
> 2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2014-12-14 16:58:51,334 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	... 12 more
> Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	... 13 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)