[ https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991122#comment-14991122 ]
Josh Mahonin commented on PHOENIX-2367:
---------------------------------------

If there's no overhead at all, I think nixing the config option and changing the default is fine as well.

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-2367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2367
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Siddhi Mehta
>            Assignee: Siddhi Mehta
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage,
> similar to how the CSVBulkLoad tool has an option of ignoring bad rows. I
> did some work on the Apache Pig code that allows Storers to have a notion of
> customizable/configurable error handling (PIG-4704).
> I want to plug this behavior into PhoenixHbaseStorage and propose certain
> changes for the same.
>
> Current behavior/problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once the batch
> size is reached. If there are any client-side validation/syntactical errors,
> like data not fitting the column size, executeBatch() throws an exception and
> there is no way to retrieve the valid rows from the batch and retry them. We
> discard the whole batch or fail the job without error handling.
> With auto-commit set to false, execute() also serves the purpose of not
> making any RPC calls: it does a bunch of validation client side and adds the
> row to the client-side cache of mutations. Only on conn.commit() do we make
> an RPC call.
>
> Proposed change:
> To be able to use configurable error handling and ignore only the failed
> records instead of discarding the whole batch, I want to propose changing the
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a
> configuration to toggle between the two behaviors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
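The error-handling difference between the two strategies can be sketched without a Phoenix cluster. The following is a minimal, self-contained simulation, not Phoenix code: the hypothetical `validate` method stands in for the client-side checks `execute()` performs (e.g. data not fitting the column size), and the class/method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchVsPerRow {

    // Stand-in for client-side validation: a row is "invalid" here if it
    // exceeds a 5-character column size limit.
    static void validate(String row) {
        if (row.length() > 5) {
            throw new IllegalArgumentException("value too long: " + row);
        }
    }

    // executeBatch()-style: the first bad row fails the whole batch, and the
    // valid rows in it cannot be recovered and retried.
    static List<String> batchWrite(List<String> rows) {
        List<String> buffered = new ArrayList<>();
        try {
            for (String row : rows) {
                validate(row);
                buffered.add(row);
            }
        } catch (IllegalArgumentException e) {
            buffered.clear(); // whole batch discarded
        }
        return buffered;
    }

    // execute()-style with auto-commit off: each row is validated
    // individually, bad rows can be handed to configurable error handling,
    // and the good rows stay buffered client-side until the commit.
    static List<String> perRowWrite(List<String> rows) {
        List<String> buffered = new ArrayList<>();
        for (String row : rows) {
            try {
                validate(row);
                buffered.add(row);
            } catch (IllegalArgumentException e) {
                // skip only this record; a real writer would report it here
            }
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("ok1", "ok2", "way-too-long", "ok3");
        System.out.println("batch committed: " + batchWrite(rows).size());
        System.out.println("per-row committed: " + perRowWrite(rows).size());
    }
}
```

In the per-row variant the buffering mirrors what Phoenix does with auto-commit off: `execute()` only populates the client-side mutation cache, and the single RPC happens at `conn.commit()`, so switching from `executeBatch()` to `execute()` should not add round trips.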