Hi team, apologies for the last email; I believe I sent it too early. I'd like to better understand the ORC reference guide in the docs and want to check whether I'm understanding a few things correctly.
I understand that with the *VectorizedRowBatch* approach we write in batches of 1024 rows by default, and the *ColumnVectors* hold the data for each respective column. I have a couple of questions on this:

*1)* When I look at the composition of an ORC file, I see the stripes are roughly 250 MB. Does each stripe in the output ORC file contain *N* *VectorizedRowBatch*es?

*2)* With respect to adding row batches to the writer (i.e. *orcWriter.addRowBatch(batch)*), do I write multiple batches to a single file? I assume that because 1024 rows is still a small amount of data, I would write N row batches (i.e. N calls to addRowBatch on the OrcWriter) until some higher-level criterion is satisfied.

Thanks!
Ryan
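
P.S. To make the two questions concrete, here is a rough sketch of the arithmetic I have in mind. It does not call the ORC API: the class and method names are made up for illustration, the counter only stands in for batch.size, and the 100-byte average row size is a guess. It also ignores compression and encoding, so the per-stripe figure is at best an order-of-magnitude estimate.

```java
public class BatchSketch {
    // Assumed to match the default batch size of 1024 rows mentioned above.
    static final int BATCH_SIZE = 1024;

    // Counts how many times addRowBatch would be called for totalRows rows:
    // once each time the batch fills, plus once for a final partial batch.
    static long addRowBatchCalls(long totalRows) {
        long calls = 0;
        int size = 0; // stands in for batch.size
        for (long row = 0; row < totalRows; row++) {
            size++;
            if (size == BATCH_SIZE) {
                calls++;  // where orcWriter.addRowBatch(batch) would go
                size = 0; // where batch.reset() would go
            }
        }
        if (size != 0) {
            calls++; // flush the last, partially filled batch
        }
        return calls;
    }

    // Rough number of full batches that fit in one stripe, assuming every row
    // takes avgRowBytes and nothing is compressed (both are simplifications).
    static long batchesPerStripe(long stripeBytes, long avgRowBytes) {
        return stripeBytes / (avgRowBytes * BATCH_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(addRowBatchCalls(10_000));
        System.out.println(batchesPerStripe(250L * 1024 * 1024, 100));
    }
}
```

If this mental model is right, a 250 MB stripe would hold on the order of thousands of batches, and the stripe boundary would be decided by the writer rather than by my calls to addRowBatch.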
