wgtmac commented on a change in pull request #952:
URL: https://github.com/apache/orc/pull/952#discussion_r779370588
##########
File path: java/core/src/java/org/apache/orc/impl/WriterImpl.java
##########
@@ -683,20 +683,22 @@ public void addUserMetadata(String name, ByteBuffer value) {
@Override
public void addRowBatch(VectorizedRowBatch batch) throws IOException {
+ InternalVectorizedRowBatch internalBatch = InternalVectorizedRowBatch.encapsulation(batch);
+ int batchSize = internalBatch.size();
try {
// If this is the first set of rows in this stripe, tell the tree writers
// to prepare the stripe.
- if (batch.size != 0 && rowsInStripe == 0) {
+ if (batchSize != 0 && rowsInStripe == 0) {
treeWriter.prepareStripe(stripes.size() + 1);
}
if (buildIndex) {
// Batch the writes up to the rowIndexStride so that we can get the
// right size indexes.
int posn = 0;
- while (posn < batch.size) {
- int chunkSize = Math.min(batch.size - posn,
+ while (posn < batchSize) {
+ int chunkSize = Math.min(batchSize - posn,
rowIndexStride - rowsInIndex);
- treeWriter.writeRootBatch(batch, posn, chunkSize);
+ treeWriter.writeRootBatch(internalBatch, posn, chunkSize);
Review comment:
I think the root treeWriter can handle selected nested columns very
well. Let me demonstrate the idea with this patch:
https://github.com/wgtmac/orc/commit/42ee957be4113af62ccb59319d861fb9462a7c1e
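
For readers following the hunk above, here is a minimal standalone sketch of the chunking loop it touches: a batch is split so that no chunk crosses a `rowIndexStride` boundary. It is not taken from the linked patch. `ChunkSketch` and `chunkSizes` are hypothetical names, and the counter updates after each chunk (advancing `posn`, resetting `rowsInIndex` at a stride boundary) are assumptions, since the hunk does not show that part of `addRowBatch`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of ORC.
class ChunkSketch {

  // Returns the chunk sizes the loop would hand to the root tree writer
  // for one batch. Assumes rowIndexStride > 0 and
  // 0 <= rowsInIndex < rowIndexStride; otherwise a chunk could be empty
  // and the loop would not advance.
  static List<Integer> chunkSizes(int batchSize, int rowIndexStride,
                                  int rowsInIndex) {
    List<Integer> chunks = new ArrayList<>();
    int posn = 0;
    while (posn < batchSize) {
      // Same expression as in the diff: take what is left of the batch,
      // capped by the room left in the current index stride.
      int chunkSize = Math.min(batchSize - posn,
                               rowIndexStride - rowsInIndex);
      chunks.add(chunkSize);
      // Assumed bookkeeping: advance the position and the stride counter.
      posn += chunkSize;
      rowsInIndex += chunkSize;
      if (rowsInIndex >= rowIndexStride) {
        // Assumption: a real writer would record an index entry here.
        rowsInIndex = 0;
      }
    }
    return chunks;
  }

  public static void main(String[] args) {
    // 2500 rows, stride of 1000, 400 rows already in the current stride.
    System.out.println(chunkSizes(2500, 1000, 400)); // prints [600, 1000, 900]
  }
}
```

The point of the sketch is that only `batchSize` drives the loop, which is why the diff can swap `batch.size` for a size computed once up front without changing the chunk boundaries.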