[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569237#comment-13569237 ]
Kevin Wilfong commented on HIVE-3874:
-------------------------------------

@Owen: Regarding some of the issues I've seen:

1) In the add method in DynamicByteArray, the line which updates remaining seems a little off, and it causes an NPE if newLength > chunkSize / 2. I think it should be remaining -= size.

2) I had trouble reading a column of only null values; I saw division by zero exceptions in a couple of methods of DynamicByteArray.

I wrote up a possible fix here https://reviews.facebook.net/D8361 but I'm not sure if it's the right fix.

If you want to wait, I can file formal JIRAs for these later instead.

> Create a new Optimized Row Columnar file format for Hive
> --------------------------------------------------------
>
>                 Key: HIVE-3874
>                 URL: https://issues.apache.org/jira/browse/HIVE-3874
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to
> enable push-down filters to skip entire row groups.
> * the types of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira