[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017229#comment-13017229 ]
Krishna Kumar commented on HIVE-2065:
-------------------------------------

The minor version is needed so that we can still read 6.0 files correctly. To recap, 6.0 files have an incorrect record length stored on disk, and while reading them we make the necessary recalculations to fix it up, whereas files from 6.1 onwards have the correct record length stored on disk.

[PS. I had suggested bumping up the sequence file version to 7 in a comment above, but I think a minor version is a better idea. The layout itself is still 'kinda sorta' version-6-compatible. For all we know, there may be a sequence file version 7, and then sequence file version 7 and rc file version 7 would be divergent.]

[PPS. For the sake of completeness of documentation, here are the reasons why the layout, even after the current patch, is still short of complete version-6 compatibility:
[a] The KeyBuffer, denoted as the key class, is unable to read or write itself from/to the disk stream, because reading/writing the 4-byte key contents length field and the compression/decompression are done by the reader/writer and not by the KeyBuffer class.
[b] The ValueBuffer, the value class, must be compressed as a unit to be compatible with the sequence file reader/writer, but it is actually compressed as several units.]

> RCFile issues
> -------------
>
>                 Key: HIVE-2065
>                 URL: https://issues.apache.org/jira/browse/HIVE-2065
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
> {code}
>       int keyLength = key.getSize();
>       if (keyLength < 0) {
>         throw new IOException("negative length keys not allowed: " + key);
>       }
>       out.writeInt(keyLength + valueLength); // total record length
>       out.writeInt(keyLength); // key portion length
>       if (!isCompressed()) {
>         out.writeInt(keyLength);
>         key.write(out); // key
>       } else {
>         keyCompressionBuffer.reset();
>         keyDeflateFilter.resetState();
>         key.write(keyDeflateOut);
>         keyDeflateOut.flush();
>         keyDeflateFilter.finish();
>         int compressedKeyLen = keyCompressionBuffer.getLength();
>         out.writeInt(compressedKeyLen);
>         out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>       }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the
> next field to record length, not the uncompressed key length.

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
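
To illustrate items 2 and 3 of the issue description, here is a minimal sketch of the ordering they ask for: compress the key first, then write a record length based on the on-disk (compressed) key size, with the compressed key length as the very next field. This is not the attached patch. The class and method names (`RecordHeaderSketch`, `compress`, `writeRecordHeader`) are hypothetical, `java.util.zip` stands in for the Hadoop codec streams used by RCFile, and the uncompressed key length field is left out because where it ends up is not stated in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch, not RCFile code: shows only the order of operations.
class RecordHeaderSketch {

    // Assumption: plain DEFLATE stands in for RCFile's key compression codec.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(buf)) {
            deflate.write(raw);
        } // closing the stream finishes the deflater and flushes into buf
        return buf.toByteArray();
    }

    // Writes record length, key length, and key bytes, using on-disk sizes.
    static void writeRecordHeader(DataOutputStream out, byte[] key,
                                  int valueLength, boolean compressed)
            throws IOException {
        // Compress BEFORE writing any length, so the lengths match the disk.
        byte[] onDiskKey = compressed ? compress(key) : key;

        out.writeInt(onDiskKey.length + valueLength); // total record length
        out.writeInt(onDiskKey.length);               // key length as stored
        out.write(onDiskKey);                         // key bytes
    }

    public static void main(String[] args) throws IOException {
        byte[] key = new byte[1000]; // all zeros, so it compresses well
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeRecordHeader(new DataOutputStream(sink), key, 50, true);
        System.out.println("bytes written for header and key: " + sink.size());
    }
}
```

With this ordering the record length can no longer overstate the record for compressed files, since it is computed only after the compressed key size is known.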