I'm writing a streaming application that converts incoming data into ORC in
real time. One thing I'm implementing is a dead-letter queue, so that batch
processing can continue even if a single record fails.

The catch is that I want to discard the data already written for a record
if a failure occurs partway through it, say on the 6th of 10 columns. For
example:

I write the following data:

{
 firstName: blah1,
 lastName: blah2,
 otherData: blah3
}

If the write fails on otherData, I want to "roll back" the values already
placed in the column vectors at the vectorPosition I'm currently iterating
on. Is it as simple as setting colVector.isNull[vectorPosition] to true and
colVector.noNulls to false? I originally wanted to go to that index in each
column vector and overwrite the value, but I don't see an easy way to do that.
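
To make the two options concrete, here is a minimal sketch of what I mean.
StubColumn is a hypothetical stand-in I made up for illustration. It only
mirrors the isNull and noNulls fields mentioned above and is not the real
ORC/Hive column vector class. RollbackSketch, markRowNull, resetRow and
writeAll are likewise invented names, and a null field is just my way of
simulating a conversion failure.

```java
public class RollbackSketch {

    /** Hypothetical stand-in for a column vector (NOT the real ORC class). */
    public static class StubColumn {
        public final String[] values;
        public final boolean[] isNull;
        public boolean noNulls = true;

        public StubColumn(int capacity) {
            values = new String[capacity];
            isNull = new boolean[capacity];
        }
    }

    /** Option 1 (from the question): flag the slot as null in every column. */
    public static void markRowNull(StubColumn[] cols, int row) {
        for (StubColumn col : cols) {
            col.values[row] = null;
            col.isNull[row] = true;
            col.noNulls = false;
        }
    }

    /** Option 2: clear the slot so the next record can overwrite it. */
    public static void resetRow(StubColumn[] cols, int row) {
        for (StubColumn col : cols) {
            col.values[row] = null;
            col.isNull[row] = false;
        }
    }

    /**
     * Writes each record column by column and returns the number of rows kept.
     * The row position only advances after every column of a record succeeds,
     * so a failed record's slot is reused by the next record.
     * Assumes records.length does not exceed the column capacity.
     */
    public static int writeAll(StubColumn[] cols, String[][] records) {
        int row = 0;
        for (String[] record : records) {
            boolean ok = true;
            for (int c = 0; c < cols.length; c++) {
                if (record[c] == null) {   // simulated failure on this column
                    ok = false;
                    break;
                }
                cols[c].values[row] = record[c];
            }
            if (ok) {
                row++;                     // "commit" the row
            } else {
                resetRow(cols, row);       // discard the partial row
            }
        }
        return row;
    }
}
```

As far as I understand, option 1 would still leave an all-null row in the
output if that position is counted in the batch size, whereas option 2 keeps
the failed record out of the batch entirely. Whether either one is safe with
the real column vectors is what I'm unsure about.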

Cheers!!
Ryan Schachte
