Where is the failure happening? If it is happening in the ORC writer code, there isn't a way to roll back a partially written row. Can I ask what kind of exception you are hitting? In the column (aka tree) writers, there shouldn't be much that can go wrong; they don't even write to the file handle, they just buffer in memory.
If the problem is in your own code, you should be able to use the selected vector in the VectorizedRowBatch to select just the other rows, so the failed row is skipped rather than undone.

..
Owen

On Fri, Sep 11, 2020 at 7:12 AM Ryan Schachte wrote:
> I'm writing a streaming application that converts incoming data into ORC in
> real time. One thing I'm implementing is a dead-letter queue that still
> allows me to continue the batch processing even if a single record fails.
>
> The caveat is that I want to remove the data that has been written so
> far if a failure occurs on, say, the 6th column out of 10 columns. For
> example:
>
> I write the following data:
>
> {
>   firstName: blah1,
>   lastName: blah2,
>   otherData: blah3
> }
>
> My question is: if I fail on otherData, I want to "roll back" the data in
> the column vectors at the current vectorPosition I'm iterating on. Is it as
> simple as setting colVector.isNull[vectorPosition] to true and setting
> colVector.noNulls to false? I originally wanted to go into the index for
> each column vector and overwrite it, but I don't see an easy way to do that.
>
> Cheers!!
> Ryan Schachte
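A minimal sketch of the selected-vector suggestion, written in plain Java so it runs without the ORC jars. Batch is a hypothetical, simplified stand-in that mirrors only the size, selected and selectedInUse fields of VectorizedRowBatch (following the Hive convention that, when selectedInUse is true, size counts the entries in selected), and convert is a hypothetical per-field conversion that throws on bad input. Each record goes into the next physical slot; a record that fails partway is left where it is but never added to selected, so nothing has to be undone column by column. By contrast, setting isNull[row] = true and noNulls = false would still write the row, just with null values. Whether your writer honors selectedInUse may depend on the ORC version, so check that before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectedRowsSketch {

    // Hypothetical stand-in for VectorizedRowBatch: only the fields this sketch needs.
    static final class Batch {
        final String[][] cols;         // hypothetical: one String[] per column
        final int[] selected;          // physical indices of rows to keep
        boolean selectedInUse = false; // when true, only selected[0..size) are live
        int size = 0;                  // live row count

        Batch(int numCols, int capacity) {
            cols = new String[numCols][capacity];
            selected = new int[capacity];
        }
    }

    // Hypothetical per-field conversion; a null field simulates a failure.
    static String convert(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("cannot convert field");
        }
        return raw.trim();
    }

    /** Fills the batch (assumes records fit its capacity); returns dead-letter records. */
    static List<String[]> fillBatch(Batch batch, List<String[]> records) {
        List<String[]> deadLetters = new ArrayList<>();
        int physicalRows = 0; // slots consumed, including failed rows
        int numSelected = 0;  // slots whose every column converted
        for (String[] record : records) {
            int row = physicalRows++;
            try {
                for (int c = 0; c < batch.cols.length; c++) {
                    batch.cols[c][row] = convert(record[c]);
                }
                batch.selected[numSelected++] = row; // row is complete: keep it
            } catch (IllegalArgumentException e) {
                // Partial values stay in place; the row is just never selected.
                deadLetters.add(record);
            }
        }
        batch.size = numSelected;
        batch.selectedInUse = numSelected < physicalRows; // only needed if something failed
        return deadLetters;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(3, 1024);
        List<String[]> records = List.of(
            new String[] {"blah1", "blah2", "blah3"},
            new String[] {"a", "b", null},          // fails on the last column
            new String[] {"x", "y", "z"});
        List<String[]> dead = fillBatch(batch, records);
        System.out.println("live rows: " + batch.size + ", dead letters: " + dead.size());
        // With the real classes you would now call writer.addRowBatch(batch), then batch.reset().
    }
}
```

The design choice here is to never rewind: every record consumes a slot, and the selection decides what is live, which keeps the error path to a single add to the dead-letter list.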
