rajagopr commented on code in PR #10874:
URL: https://github.com/apache/pinot/pull/10874#discussion_r1226682325
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -122,32 +144,23 @@ public Map<String, GenericRowFileManager> map()
private Map<String, GenericRowFileManager> doMap()
throws Exception {
Consumer<Object> observer = _processorConfig.getProgressObserver();
- int totalCount = _recordReaders.size();
+ int totalCount = _recordReaderFileConfigs.size();
int count = 1;
GenericRow reuse = new GenericRow();
- for (RecordReader recordReader : _recordReaders) {
- observer.accept(String.format("Doing map phase on data from RecordReader (%d out of %d)", count++, totalCount));
- while (recordReader.hasNext()) {
- reuse = recordReader.next(reuse);
-
- // TODO: Add ComplexTypeTransformer here. Currently it is not idempotent so cannot add it
-
- if (reuse.getValue(GenericRow.MULTIPLE_RECORDS_KEY) != null) {
- //noinspection unchecked
- for (GenericRow row : (Collection<GenericRow>) reuse.getValue(GenericRow.MULTIPLE_RECORDS_KEY)) {
- GenericRow transformedRow = _recordTransformer.transform(row);
- if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
- writeRecord(transformedRow);
- }
- }
- } else {
- GenericRow transformedRow = _recordTransformer.transform(reuse);
- if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
- writeRecord(transformedRow);
- }
- }
-
- reuse.clear();
+ boolean inited = false;
+ for (RecordReaderFileConfig recordReaderFileConfig : _recordReaderFileConfigs) {
+ RecordReader recordReader = recordReaderFileConfig._recordReader;
+ if (recordReader == null) {
+ recordReader =
+ RecordReaderFactory.getRecordReader(recordReaderFileConfig._fileFormat, recordReaderFileConfig._dataFile,
+ recordReaderFileConfig._fieldsToRead, recordReaderFileConfig._recordReaderConfig);
+ inited = true;
+ }
+ mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
Review Comment:
Shouldn't this be inside a `try {...} finally {...}` block to ensure the record
readers are closed?
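
To illustrate the concern, here is a minimal, self-contained sketch. `FakeReader` and `process` are hypothetical stand-ins (not Pinot's `RecordReader` or `mapAndTransformRow`); the only point is that a reader opened inside the loop still gets closed when the mapping step throws.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOnFailureSketch {
  // Hypothetical stand-in for a record reader; not Pinot's RecordReader.
  static class FakeReader implements AutoCloseable {
    final String name;
    boolean closed = false;

    FakeReader(String name) {
      this.name = name;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Hypothetical stand-in for the mapping step; fails for the reader named "bad".
  static void process(FakeReader reader) {
    if (reader.name.equals("bad")) {
      throw new IllegalStateException("failed while mapping " + reader.name);
    }
  }

  // Opens one reader per name, processes it, and closes it in a finally block.
  // Returns every reader that was opened so the caller can inspect it.
  static List<FakeReader> run(String... names) {
    List<FakeReader> opened = new ArrayList<>();
    try {
      for (String name : names) {
        FakeReader reader = new FakeReader(name);
        opened.add(reader);
        try {
          process(reader);
        } finally {
          // Runs on both the normal and the exceptional path.
          reader.close();
        }
      }
    } catch (IllegalStateException e) {
      // Swallowed here only so the sketch can report the state afterwards.
    }
    return opened;
  }

  public static void main(String[] args) {
    for (FakeReader reader : run("ok", "bad", "later")) {
      System.out.println(reader.name + " closed=" + reader.closed);
    }
  }
}
```

Without the `finally`, the reader named "bad" would stay open once `process` throws.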
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -97,10 +104,25 @@ public SegmentMapper(List<RecordReader> recordReaders, SegmentProcessorConfig pr
}
// Time partition + partition from partitioners
_partitionsBuffer = new String[numPartitioners + 1];
+
+ if (_recordReaders != null) {
+ Preconditions.checkState(_recordReaderFileConfigs == null, "Cannot populate both "
+ + "ReaderReaders and RecordReaderFileConfigs");
Review Comment:
nit: typo, `ReaderReaders` should be `RecordReaders`
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -122,32 +144,23 @@ public Map<String, GenericRowFileManager> map()
private Map<String, GenericRowFileManager> doMap()
throws Exception {
Consumer<Object> observer = _processorConfig.getProgressObserver();
- int totalCount = _recordReaders.size();
+ int totalCount = _recordReaderFileConfigs.size();
int count = 1;
GenericRow reuse = new GenericRow();
- for (RecordReader recordReader : _recordReaders) {
- observer.accept(String.format("Doing map phase on data from RecordReader (%d out of %d)", count++, totalCount));
- while (recordReader.hasNext()) {
- reuse = recordReader.next(reuse);
-
- // TODO: Add ComplexTypeTransformer here. Currently it is not idempotent so cannot add it
-
- if (reuse.getValue(GenericRow.MULTIPLE_RECORDS_KEY) != null) {
- //noinspection unchecked
- for (GenericRow row : (Collection<GenericRow>) reuse.getValue(GenericRow.MULTIPLE_RECORDS_KEY)) {
- GenericRow transformedRow = _recordTransformer.transform(row);
- if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
- writeRecord(transformedRow);
- }
- }
- } else {
- GenericRow transformedRow = _recordTransformer.transform(reuse);
- if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
- writeRecord(transformedRow);
- }
- }
-
- reuse.clear();
+ boolean inited = false;
+ for (RecordReaderFileConfig recordReaderFileConfig : _recordReaderFileConfigs) {
+ RecordReader recordReader = recordReaderFileConfig._recordReader;
+ if (recordReader == null) {
+ recordReader =
+ RecordReaderFactory.getRecordReader(recordReaderFileConfig._fileFormat, recordReaderFileConfig._dataFile,
+ recordReaderFileConfig._fieldsToRead, recordReaderFileConfig._recordReaderConfig);
+ inited = true;
Review Comment:
We can get rid of this variable (`inited`):
```
try {
  mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
} finally {
  recordReader.close();
}
```
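
A possible variant, assuming the reader type is `Closeable`/`AutoCloseable`: try-with-resources. The sketch below uses a hypothetical `FlakyReader` (not a Pinot class) whose `close()` also fails, to show the one behavioral difference worth knowing. With plain `try`/`finally`, an exception from `close()` replaces the original mapping failure; with try-with-resources, the original failure propagates and the `close()` failure is attached as a suppressed exception.

```java
public class SuppressedCloseSketch {
  // Hypothetical reader whose close() always fails; for illustration only.
  static class FlakyReader implements AutoCloseable {
    @Override
    public void close() {
      throw new IllegalStateException("close failed");
    }
  }

  // Hypothetical stand-in for the mapping step; always fails.
  static void process(FlakyReader reader) {
    throw new IllegalArgumentException("map failed");
  }

  // try/finally: the exception thrown by close() masks the mapping failure.
  static RuntimeException withFinally() {
    FlakyReader reader = new FlakyReader();
    try {
      try {
        process(reader);
      } finally {
        reader.close();
      }
    } catch (RuntimeException e) {
      return e;
    }
    return null;
  }

  // try-with-resources: the mapping failure is kept as the primary exception,
  // and the close() failure is recorded via addSuppressed.
  static RuntimeException withResources() {
    try (FlakyReader reader = new FlakyReader()) {
      process(reader);
    } catch (RuntimeException e) {
      return e;
    }
    return null;
  }

  public static void main(String[] args) {
    RuntimeException a = withFinally();
    RuntimeException b = withResources();
    System.out.println("try/finally surfaced: " + a.getMessage());
    System.out.println("try-with-resources surfaced: " + b.getMessage()
        + " (suppressed: " + b.getSuppressed().length + ")");
  }
}
```

Whether this fits here depends on who owns the reader: it only makes sense if every reader reaching this loop should be closed by it.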
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]