I'm writing a streaming application that converts incoming data into ORC in real time. One thing I'm implementing is a dead-letter queue, so that batch processing can continue even if a single record fails.
The caveat is that I want to remove whatever has already been written for a record if a failure occurs partway through it, say on the 6th column out of 10. For example, suppose I write the following data:

{ firstName: blah1, lastName: blah2, otherData: blah3 }

If I fail on otherData, I want to "roll back" the values already written to the column vectors at the vectorPosition I'm currently iterating on. Is it as simple as setting colVector.isNull[vectorPosition] to true and setting colVector.noNulls to false?

I originally wanted to go to that index in each column vector and overwrite the value, but I don't see an easy way to do that.

Cheers!!
Ryan Schachte
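To make the scenario concrete, here is a minimal sketch of the per-record rollback being described. It does not use the ORC classes: `Column` and `Batch` are hypothetical stand-ins that only mirror the public `isNull` and `noNulls` fields of a column vector and the `size` field of a row batch, and `tryAppend` is an invented helper name. The sketch rests on an assumption that should be verified against the ORC writer, namely that only rows below the batch's `size` are written. Under that assumption, not advancing `size` lets the next record reuse the slot, whereas setting `isNull` alone would still keep the row, as a row of nulls.

```java
import java.util.ArrayList;
import java.util.List;

final class RollbackSketch {
    // Hypothetical stand-in for a column vector holding string values.
    static final class Column {
        final String[] values;
        final boolean[] isNull;
        boolean noNulls = true;

        Column(int capacity) {
            values = new String[capacity];
            isNull = new boolean[capacity];
        }

        void set(int row, String value) {
            values[row] = value;
            isNull[row] = false;
        }

        // Reset a slot so nothing from a failed record lingers in it.
        void clear(int row) {
            values[row] = null;
            isNull[row] = false;
        }
    }

    // Hypothetical stand-in for a row batch: columns plus a count of valid rows.
    static final class Batch {
        final Column[] cols;
        int size = 0;

        Batch(int numCols, int capacity) {
            cols = new Column[numCols];
            for (int c = 0; c < numCols; c++) {
                cols[c] = new Column(capacity);
            }
        }
    }

    // Tries to append one record. On failure the slot is cleared, the record
    // goes to the dead-letter list, and size is left unchanged.
    static boolean tryAppend(Batch batch, String[] record, List<String[]> deadLetters) {
        int row = batch.size;
        try {
            for (int c = 0; c < batch.cols.length; c++) {
                String value = record[c];
                if (value == null) {
                    // Simulated conversion failure for this field.
                    throw new IllegalArgumentException("bad field at column " + c);
                }
                batch.cols[c].set(row, value);
            }
            batch.size++; // commit the row only after every column succeeded
            return true;
        } catch (RuntimeException e) {
            for (Column col : batch.cols) {
                col.clear(row); // "roll back" the partially written slot
            }
            deadLetters.add(record);
            return false;
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch(3, 4);
        List<String[]> deadLetters = new ArrayList<>();
        tryAppend(batch, new String[] {"blah1", "blah2", "blah3"}, deadLetters);
        tryAppend(batch, new String[] {"blah1", "blah2", null}, deadLetters);
        System.out.println("rows=" + batch.size + " deadLetters=" + deadLetters.size());
    }
}
```

In this sketch the failed record never becomes visible because the row counter is only incremented once all columns have been filled, so the explicit clear is mostly a safeguard against stale values.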