siddharthteotia commented on a change in pull request #5256: Derive num docs 
per chunk from max column value length for varbyte raw index creator
URL: https://github.com/apache/incubator-pinot/pull/5256#discussion_r409999951
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java
 ##########
 @@ -27,15 +28,21 @@
 
 
 public class SingleValueVarByteRawIndexCreator extends BaseSingleValueRawIndexCreator {
-  private static final int NUM_DOCS_PER_CHUNK = 1000; // TODO: Auto-derive this based on metadata.
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024*1024;
 
   private final VarByteChunkSingleValueWriter _indexWriter;
 
   public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressorFactory.CompressionType compressionType,
       String column, int totalDocs, int maxLength)
       throws IOException {
     File file = new File(baseIndexDir, column + V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
-    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, NUM_DOCS_PER_CHUNK, maxLength);
+    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, getNumDocsPerChunk(maxLength), maxLength);
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
 
 Review comment:
   It can be. The call to super() in the constructor of 
VarByteChunkSingleValueWriter makes things slightly awkward, since you have to 
call this function twice (as part of the call to super). I think the 
constructors of VarByteChunkSingleValueWriter and its base class can be 
refactored a little to make this logic private to the writer.
   
   I have a follow-up coming up for the TODO mentioned in the PR description. 
I will do this refactoring as part of that.
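
   The refactoring suggested above could be sketched roughly as follows. This 
is only an illustration, not the actual Pinot code: the class names, the 
two-argument base constructor, the 4-byte per-entry offset, and the floor of 
one doc per chunk are all assumptions made here, since the body of 
getNumDocsPerChunk is not shown in the hunk. The idea is that a private 
constructor receives the derived value once and reuses it for both super() 
arguments, so the derivation stays private to the writer.

```java
// Hypothetical sketch; names and constructor shapes are assumptions, not Pinot's API.
public final class WriterRefactorSketch {

  /** Stand-in for the writer's base class; only the two sizing arguments are modeled. */
  public static class BaseChunkWriter {
    public final int numDocsPerChunk;
    public final long chunkSize;

    protected BaseChunkWriter(int numDocsPerChunk, long chunkSize) {
      this.numDocsPerChunk = numDocsPerChunk;
      this.chunkSize = chunkSize;
    }
  }

  /** Stand-in for the var-byte writer; callers pass only the longest entry length. */
  public static final class VarByteWriter extends BaseChunkWriter {
    // Taken from the diff: target upper bound on the chunk size (1 MB).
    private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
    // Assumption: each entry also costs one 4-byte offset in the chunk header.
    private static final int ASSUMED_OFFSET_BYTES_PER_ENTRY = Integer.BYTES;

    public VarByteWriter(int maxLength) {
      // Derive the value once, then delegate to the private constructor.
      this(deriveNumDocsPerChunk(maxLength), maxLength);
    }

    private VarByteWriter(int numDocsPerChunk, int maxLength) {
      // The derived value is reused for both arguments without recomputing it.
      super(numDocsPerChunk, (long) numDocsPerChunk * bytesPerEntry(maxLength));
    }

    private static long bytesPerEntry(int maxLength) {
      // long arithmetic avoids overflow for very large maxLength values.
      return (long) maxLength + ASSUMED_OFFSET_BYTES_PER_ENTRY;
    }

    private static int deriveNumDocsPerChunk(int maxLength) {
      // Fit as many worst-case entries as the target allows, but at least one.
      return (int) Math.max(TARGET_MAX_CHUNK_SIZE / bytesPerEntry(maxLength), 1L);
    }
  }

  public static void main(String[] args) {
    VarByteWriter writer = new VarByteWriter(100);
    System.out.println(writer.numDocsPerChunk + " docs per chunk, " + writer.chunkSize + " bytes");
  }
}
```

   With this shape the index creator would no longer need a public static 
helper, and a test could check the derived value through the writer itself.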

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
