ege-st commented on code in PR #11776:
URL: https://github.com/apache/pinot/pull/11776#discussion_r1366019084
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##########
@@ -273,6 +280,45 @@ public void build()
handlePostCreation();
}
+ public void buildByColumn(IndexSegment indexSegment)
+ throws Exception {
+ // Count the number of documents and gather per-column statistics
+ LOGGER.debug("Start building StatsCollector!");
+ buildIndexCreationInfo();
+ LOGGER.info("Finished building StatsCollector!");
+ LOGGER.info("Collected stats for {} documents", _totalDocs);
+
+ try {
+ // Initialize the index creation using the per-column statistics information
+ // TODO: _indexCreationInfoMap holds the reference to all unique values on heap (ColumnIndexCreationInfo ->
+ // ColumnStatistics) throughout the segment creation. Find a way to release the memory early.
+ _indexCreator.init(_config, _segmentIndexCreationInfo, _indexCreationInfoMap, _dataSchema, _tempIndexDir);
+
+ // Build the indexes
+ LOGGER.info("Start building Index by column");
+
+ TreeSet<String> columns = _dataSchema.getPhysicalColumnNames();
+
+ // TODO: Eventually pull the doc Id sorting logic out of Record Reader so that all row oriented logic can be
+ // removed from this code.
+ int[] sortedDocIds = ((PinotSegmentRecordReader) _recordReader).getSortedDocIds();
+ for (String col : columns) {
+ _indexCreator.indexColumn(col, sortedDocIds, indexSegment);
+ }
+ } catch (Exception e) {
+ _indexCreator.close(); // TODO: Why is this only closed on an exception?
+ throw e;
+ } finally {
+ _recordReader.close();
Review Comment:
The `_recordReader` is created in the `init` method, so I wanted to make sure
it also gets closed on this path. I held off on fully refactoring `init` to
keep this change small.
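
For anyone following along, here is a minimal sketch of the cleanup order in the hunk above. `FakeReader`, `FakeCreator`, and `ReaderCloseSketch` are hypothetical stand-ins written for illustration, not Pinot classes, and the failing column name `"bad"` is made up. It only shows that the `finally` block closes the reader on both the success and the failure path, while the creator is closed only when indexing throws.

```java
import java.util.List;

// Hypothetical stand-ins; none of these types exist in Pinot.
class ReaderCloseSketch {

  /** Stand-in for a record reader created elsewhere (e.g. in an init step). */
  static class FakeReader implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Stand-in for an index creator that fails on one made-up column name. */
  static class FakeCreator implements AutoCloseable {
    boolean closed = false;

    void indexColumn(String column) {
      if (column.equals("bad")) {
        throw new IllegalStateException("cannot index column: " + column);
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Mirrors the try/catch/finally shape of the diff: creator closed on error, reader always. */
  static void buildByColumn(FakeReader reader, FakeCreator creator, List<String> columns) {
    try {
      for (String column : columns) {
        creator.indexColumn(column);
      }
    } catch (Exception e) {
      creator.close(); // only reached when indexing fails
      throw e;
    } finally {
      reader.close(); // reached on both success and failure
    }
  }

  public static void main(String[] args) {
    FakeReader okReader = new FakeReader();
    FakeCreator okCreator = new FakeCreator();
    buildByColumn(okReader, okCreator, List.of("a", "b"));
    System.out.println("success path: reader closed = " + okReader.closed
        + ", creator closed = " + okCreator.closed);

    FakeReader failReader = new FakeReader();
    FakeCreator failCreator = new FakeCreator();
    try {
      buildByColumn(failReader, failCreator, List.of("a", "bad"));
    } catch (IllegalStateException e) {
      System.out.println("failure path: reader closed = " + failReader.closed
          + ", creator closed = " + failCreator.closed);
    }
  }
}
```

If `init` were refactored later so that this method owned the reader, a try-with-resources statement would give the same guarantee with less code, but that is outside the scope of this change.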
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]