-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45238/#review126974
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 1205 - 1206)
<https://reviews.apache.org/r/45238/#comment190098>

    IMO, this should not be made an option for the user. It should be enabled as 
long as the index is enabled. If for some reason this turns out to be wrong, disk 
range computation should fall back to the old code path that uses WORST_CASE_SLOP 
(2 x buffer size). It would be better to have a config for the fallback path than 
one for tracking lengths.
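    
    Roughly the fallback shape I have in mind, as a minimal sketch. The names 
are illustrative, not the existing reader code; the only assumption is that the 
old path pads the estimated range end by 2 x the compression buffer size:
    
    // Illustrative sketch only; not the actual reader code path.
    final class DiskRangeEndEstimator {
      private static final int WORST_CASE_SLOP_FACTOR = 2;

      static long estimateRangeEnd(long estimatedEnd, long fileLength,
                                   int bufferSize, boolean lengthsUsable) {
        if (lengthsUsable) {
          // New path: exact end computed from the tracked CB lengths.
          return estimatedEnd;
        }
        // Fallback: old path, over-read by up to two compression buffers.
        return Math.min(fileLength,
            estimatedEnd + (long) WORST_CASE_SLOP_FACTOR * bufferSize);
      }
    }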



orc/src/java/org/apache/orc/OrcConf.java (lines 100 - 102)
<https://reviews.apache.org/r/45238/#comment190102>

    Same as my comment above.



orc/src/java/org/apache/orc/TypeDescription.java (line 384)
<https://reviews.apache.org/r/45238/#comment190103>

    Is this related?



orc/src/java/org/apache/orc/impl/OutStream.java (line 323)
<https://reviews.apache.org/r/45238/#comment190393>

    Is this related? It doesn't seem to be used anywhere.



orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java (line 191)
<https://reviews.apache.org/r/45238/#comment190830>

    By flush, does this mean writing to the OutStream? What if numLiterals 
straddles compression buffers (the first CB is flushed and closed and the second 
CB contains some literals)? Is that still considered flushed?



orc/src/java/org/apache/orc/impl/SerializationUtils.java (line 1138)
<https://reviews.apache.org/r/45238/#comment190831>

    Can you reuse the SerializationUtils.readFully() method here? You would have 
to make that method public.



orc/src/java/org/apache/orc/impl/WriterImpl.java (line 592)
<https://reviews.apache.org/r/45238/#comment190835>

    I think it will be easier if we maintain a Map<String, LengthsBuilder> that 
maps a stream name to the proto builder holding that stream's lengths. The 
callback interface can be reportCompressionBuffer(String name, int cbSize). When 
the callback is invoked, the length can be appended to the lengths builder for 
the corresponding stream. When flushing the stripe, we can use the same stream 
name to check whether the stream is suppressed. With that we can also avoid the 
index adjustment; see the rough sketch below.
    
    I am just looking for ways to simplify the state maintenance here. Thoughts?
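    
    Very rough sketch of the bookkeeping I am picturing. LengthsBuilder and 
reportCompressionBuffer are just the names from this comment, not existing APIs; 
in the real patch LengthsBuilder would be the proto builder that accumulates the 
per-stream compression buffer lengths:
    
    import java.util.*;
    import java.util.function.BiConsumer;

    final class StripeLengthTracker {
      /** Stand-in for the proto builder of lengths. */
      static final class LengthsBuilder {
        private final List<Integer> cbSizes = new ArrayList<>();
        void addLength(int cbSize) { cbSizes.add(cbSize); }
        List<Integer> build() { return Collections.unmodifiableList(cbSizes); }
      }

      private final Map<String, LengthsBuilder> lengthsByStream = new HashMap<>();

      /** Callback invoked each time a stream finalizes a compression buffer. */
      void reportCompressionBuffer(String streamName, int cbSize) {
        lengthsByStream.computeIfAbsent(streamName, k -> new LengthsBuilder())
            .addLength(cbSize);
      }

      /**
       * At stripe flush, streams are looked up by the same name; suppressed
       * streams are simply skipped, so no index adjustment is needed.
       */
      void flushStripe(Set<String> suppressedStreams,
                       BiConsumer<String, List<Integer>> sink) {
        for (Map.Entry<String, LengthsBuilder> e : lengthsByStream.entrySet()) {
          if (!suppressedStreams.contains(e.getKey())) {
            sink.accept(e.getKey(), e.getValue().build());
          }
        }
        lengthsByStream.clear();
      }
    }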



orc/src/protobuf/orc_proto.proto (line 85)
<https://reviews.apache.org/r/45238/#comment190096>

    If this is a required field, it will break backward compatibility, won't it?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java (line 279)
<https://reviews.apache.org/r/45238/#comment190840>

    I don't see any changes that bypass this when the lengths are available. Is 
that going into a separate JIRA?


- Prasanth_J


On March 23, 2016, 7:08 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45238/
> -----------------------------------------------------------
> 
> (Updated March 23, 2016, 7:08 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c14df20 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapOptionsProcessor.java
>  c292b37 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
>  eb251a8 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
>  82187bd 
>   orc/src/java/org/apache/orc/OrcConf.java 6fcbb72 
>   orc/src/java/org/apache/orc/OrcFile.java 3945a5d 
>   orc/src/java/org/apache/orc/TypeDescription.java bd900ac 
>   orc/src/java/org/apache/orc/impl/BitFieldWriter.java aa5f886 
>   orc/src/java/org/apache/orc/impl/IntegerWriter.java 419054f 
>   orc/src/java/org/apache/orc/impl/OutStream.java 81662cc 
>   orc/src/java/org/apache/orc/impl/RunLengthByteWriter.java 09108b2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriter.java 3e5f2e2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java fab2801 
>   orc/src/java/org/apache/orc/impl/SerializationUtils.java c1162e4 
>   orc/src/java/org/apache/orc/impl/WriterImpl.java 6497ecf 
>   orc/src/protobuf/orc_proto.proto f4935b4 
>   orc/src/test/org/apache/orc/impl/TestBitFieldReader.java e4c6f6b 
>   orc/src/test/org/apache/orc/impl/TestBitPack.java f2d3d64 
>   orc/src/test/org/apache/orc/impl/TestInStream.java 9e65345 
>   orc/src/test/org/apache/orc/impl/TestIntegerCompressionReader.java 399f35e 
>   orc/src/test/org/apache/orc/impl/TestOutStream.java e9614d5 
>   orc/src/test/org/apache/orc/impl/TestRunLengthByteReader.java a14bef1 
>   orc/src/test/org/apache/orc/impl/TestRunLengthIntegerReader.java 28239ba 
>   ql/src/java/org/apache/hadoop/hive/llap/DebugUtils.java ea626d7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezJobMonitor.java 67f9da8 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecOrcFileDump.java 
> d5d1370 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9c2f88f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/JsonFileDump.java 00de545 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java 8a73948 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 
> 96af96a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 29b51ec 
>   ql/src/test/queries/clientpositive/orc_lengths.q PRE-CREATION 
>   ql/src/test/results/clientpositive/orc_lengths.q.out PRE-CREATION 
>   
> storage-api/src/java/org/apache/hadoop/hive/common/io/encoded/EncodedColumnBatch.java
>  ddba889 
> 
> Diff: https://reviews.apache.org/r/45238/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>
