I'm writing a streaming application that converts incoming data into ORC in
real time. One thing I'm implementing is a dead-letter queue that lets me
continue processing the batch even if a single record fails.
The caveat is that I want to remove the data that has been written so far
for that record if a failure occurs on, say, the 6th column out of 10. For
example:
I write the following data:
{
firstName: blah1,
lastName: blah2,
otherData: blah3
}
My question is: if I fail on otherData, I want to "roll back" the data
already written to the column vectors at the current vectorPosition I'm
iterating on. Is it as simple as setting colVector.isNull[vectorPosition] to
true and setting colVector.noNulls to false? I originally wanted to go into
each column vector at that index and overwrite the value, but I don't see an
easy way to do that.
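
To make the question concrete, here is a minimal sketch of the kind of rollback I have in mind. It does not use the real ORC classes: ColVector, Batch and appendRecord are hypothetical stand-ins that only mirror the isNull / noNulls fields mentioned above, and a null field simply stands in for "this column failed". It also assumes that a row only becomes part of the batch once the batch's size counter is advanced, which I'd want to verify against the actual ORC API.

```java
public class RollbackSketch {
    // Hypothetical stand-in for an ORC column vector (not the real class).
    static class ColVector {
        final String[] values;
        final boolean[] isNull;
        boolean noNulls = true;

        ColVector(int capacity) {
            values = new String[capacity];
            isNull = new boolean[capacity];
        }
    }

    // Hypothetical stand-in for a row batch: 'size' counts committed rows.
    static class Batch {
        final ColVector[] cols;
        int size = 0;

        Batch(int numCols, int capacity) {
            cols = new ColVector[numCols];
            for (int c = 0; c < numCols; c++) {
                cols[c] = new ColVector(capacity);
            }
        }
    }

    // Writes one record at the current position. Returns true if the row was
    // committed, false if it failed part-way and the slot was rolled back.
    static boolean appendRecord(Batch batch, String[] record) {
        int vectorPosition = batch.size;
        if (vectorPosition >= batch.cols[0].values.length) {
            throw new IllegalStateException("batch is full");
        }
        try {
            for (int c = 0; c < batch.cols.length; c++) {
                if (record[c] == null) {
                    // Simulated per-column conversion failure.
                    throw new IllegalArgumentException("bad value in column " + c);
                }
                batch.cols[c].values[vectorPosition] = record[c];
            }
        } catch (IllegalArgumentException e) {
            // Rollback: clear whatever was written to this slot and leave
            // 'size' untouched, so the next record reuses the same position.
            for (ColVector col : batch.cols) {
                col.values[vectorPosition] = null;
                col.isNull[vectorPosition] = false;
            }
            return false; // caller would route the record to the dead-letter queue
        }
        batch.size++; // the row only counts once every column succeeded
        return true;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(3, 4);
        appendRecord(batch, new String[] {"blah1", "blah2", "blah3"});
        appendRecord(batch, new String[] {"blah1", "blah2", null}); // fails on otherData
        System.out.println("committed rows: " + batch.size);
    }
}
```

In this sketch the failed record never needs isNull set to true or noNulls set to false, because the position is simply reused by the next record; whether the real column vectors behave the same way is exactly what I'm unsure about.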
Cheers!!
Ryan Schachte