aishikbh commented on code in PR #12220:
URL: https://github.com/apache/pinot/pull/12220#discussion_r1452475077
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -107,62 +106,62 @@ public SegmentMapper(List<RecordReaderFileConfig> recordReaderFileConfigs,
LOGGER.info("Initialized mapper with {} record readers, output dir: {}, timeHandler: {}, partitioners: {}",
_recordReaderFileConfigs.size(), _mapperOutputDir, _timeHandler.getClass(),
Arrays.stream(_partitioners).map(p -> p.getClass().toString()).collect(Collectors.joining(",")));
+
+ // initialize adaptive writer.
+ _adaptiveSizeBasedWriter =
+ new AdaptiveSizeBasedWriter(processorConfig.getSegmentConfig().getIntermediateFileSizeThreshold());
}
/**
* Reads the input records and generates partitioned generic row files into the mapper output directory.
* Records for each partition are put into a directory of the partition name within the mapper output directory.
*/
- public Map<String, GenericRowFileManager> map()
+ public Map<String, GenericRowFileManager> map(int totalRecordReaderSize)
Review Comment:
For optimisation purposes, `SegmentMapper` works only on a sublist of the
original list of RecordReaderFileConfigs. We therefore use the global total
count to infer the overall index of the record reader currently being
processed, purely for logging purposes.
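To make the index inference concrete, here is a minimal sketch. It is not the code in this PR: the class `GlobalIndexSketch` and the helper `globalIndex` are hypothetical names, and it assumes the sublist handed to the mapper is the tail of the original list (i.e. the readers that have not been processed yet).

```java
import java.util.List;

class GlobalIndexSketch {
  /**
   * Hypothetical helper, not part of Pinot.
   * Assumption: the sublist is the tail of the original list, so the first
   * (totalCount - subListSize) readers were handled in earlier iterations.
   * Returns a 1-based position in the original list, suitable for log lines.
   */
  static int globalIndex(int totalCount, int subListSize, int localIndex) {
    return totalCount - subListSize + localIndex + 1;
  }

  public static void main(String[] args) {
    // Stand-ins for the full list of record reader configs.
    List<String> all = List.of("f0", "f1", "f2", "f3", "f4");
    // The mapper only sees the readers that are still unprocessed.
    List<String> remaining = all.subList(2, all.size());
    for (int i = 0; i < remaining.size(); i++) {
      System.out.printf("Processing record reader %d of %d: %s%n",
          globalIndex(all.size(), remaining.size(), i), all.size(), remaining.get(i));
    }
  }
}
```

With only the sublist size available, the log could report "1 of 3" at best; passing the global total is what lets each line show its position in the full list.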
Saw comments about logs so consolidating the response here :D
We need to pass the global count somehow, or else we lose the granularity of
the logging. The other option is to log in `SegmentProcessorFramework`, but
then the logging becomes sparse, i.e. we only get a log line when the mapper
phase terminates. Should we use a different way to pass the total count? Or
would the logging in `SegmentProcessorFramework` be enough? What do you
suggest?
Just as a reference, here is how the logging currently looks in the UI (it
will be similar in the debug logs as well):
<img width="706" alt="Screenshot 2024-01-11 at 11 40 32 PM"
src="https://github.com/apache/pinot/assets/15700987/5c2109d8-8d84-4195-aa7c-13cdb04520a9">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]