Hi team,
apologies for the last email; I believe I sent it too early. I'm interested in
better understanding the ORC reference guide in the docs and wanted to
clarify some things to see if I'm understanding correctly.

I understand that with the *VectorizedRowBatch* approach, we write in chunks
of 1024 rows by default, and the *ColumnVectors* hold this data for each
respective column.

I have a couple of questions on this:

*1)* When I look at the composition of an ORC file, I see that the stripes
are roughly 250 MB each. Are there *N* *VectorizedRowBatch* instances per
stripe in the output ORC file?

*2)* With respect to adding row batches to the writer (i.e.,
*orcWriter.addRowBatch(batch)*), do I have multiple batches in a single
file? Since 1024 rows is still a small amount of data, I assume I would
write N row batches (i.e., N calls to addRowBatch on the OrcWriter) until
some higher-level criterion is satisfied.
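
To make the pattern in question 2 concrete, here is a minimal sketch of the
fill-and-flush loop I have in mind. It deliberately does not use the ORC
library: FakeWriter is a hypothetical stand-in that only records how many rows
each addRowBatch call receives, and writeRows is a name I made up for
illustration. It says nothing about how batches map to stripes, which is what
question 1 asks.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoopSketch {

    // Hypothetical stand-in for the writer; NOT the real ORC Writer class.
    // It only records the row count passed to each addRowBatch call.
    static class FakeWriter {
        final List<Integer> batchSizes = new ArrayList<>();

        void addRowBatch(int rowsInBatch) {
            batchSizes.add(rowsInBatch);
        }
    }

    // Fills a batch row by row and hands it to the writer whenever it is full,
    // then flushes whatever is left over at the end.
    static List<Integer> writeRows(int totalRows, int maxBatchSize) {
        FakeWriter writer = new FakeWriter();
        int size = 0; // rows currently buffered in the in-memory batch

        for (int r = 0; r < totalRows; r++) {
            size++; // "set" one row in the batch
            if (size == maxBatchSize) {
                writer.addRowBatch(size);
                size = 0; // reset the batch for reuse
            }
        }
        if (size != 0) {
            writer.addRowBatch(size); // final, partially filled batch
        }
        return writer.batchSizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = writeRows(10_000, 1024);
        System.out.println(sizes.size() + " calls, last batch has "
                + sizes.get(sizes.size() - 1) + " rows");
    }
}
```

With 10,000 rows and a batch size of 1024, this makes 10 calls: nine full
batches and one final batch of 784 rows, all going to the same writer.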

Thanks!
Ryan
