[ https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddhi Mehta updated PHOENIX-2367:
----------------------------------
    Attachment: PHOENIX-2367.patch

[~giacomotaylor], [~jfernando_sfdc], [~prkommireddi], [~maghamraviki...@gmail.com]
Can you please review the change? I ran the phoenix-pig tests. Are there any other tests I should be running?

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-2367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2367
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Siddhi Mehta
>            Assignee: Siddhi Mehta
>         Attachments: PHOENIX-2367.patch
>
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage,
> similar to how the CSVBulkLoad tool has an option of ignoring bad rows. I
> did some work on the Apache Pig code that allows Storers to have a notion of
> customizable/configurable error handling (PIG-4704).
> I want to plug this behavior into PhoenixHbaseStorage and propose certain
> changes for the same.
>
> Current Behavior/Problem:
> PhoenixRecordWriter uses executeBatch() to process rows once the batch
> size is reached. If there are any client-side validation or syntactical
> errors, such as data not fitting the column size, executeBatch() throws an
> exception and there is no way to retrieve the valid rows from the batch and
> retry them. We either discard the whole batch or fail the job without error
> handling.
> With auto commit set to false, execute() also serves the purpose of not
> making any RPC calls: it does a bunch of validation on the client side and
> adds the row to the client-side cache of mutations.
> On conn.commit() we make an RPC call.
>
> Proposed Change
> To be able to use configurable error handling and ignore only the failed
> records instead of discarding the whole batch, I want to propose changing the
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a
> configuration to toggle between the two behaviors.
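
To make the proposal concrete, here is a minimal sketch of the per-row approach. It is not the attached patch and does not use Phoenix classes: RowExecutor, Committer, and writeAll are hypothetical names, standing in for a per-row PreparedStatement.execute() call and for Connection.commit() with auto commit off. The sketch assumes that a row failing client-side validation surfaces as a SQLException from the per-row call.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipBadRowsSketch {

    /** Hypothetical stand-in for a per-row PreparedStatement.execute(). */
    public interface RowExecutor<T> {
        void execute(T row) throws SQLException;
    }

    /** Hypothetical stand-in for Connection.commit(). */
    public interface Committer {
        void commit() throws SQLException;
    }

    /**
     * Executes rows one at a time and commits after every batchSize
     * successfully executed rows. A row whose execute() call throws is
     * recorded and skipped; rows already accepted are unaffected.
     *
     * @return the rows that were rejected
     */
    public static <T> List<T> writeAll(List<T> rows, int batchSize,
                                       RowExecutor<T> executor,
                                       Committer committer) throws SQLException {
        List<T> rejected = new ArrayList<>();
        int pending = 0;
        for (T row : rows) {
            try {
                executor.execute(row); // validation happens here, per row
                pending++;
            } catch (SQLException e) {
                rejected.add(row);     // only this row is dropped
                continue;
            }
            if (pending == batchSize) {
                committer.commit();    // flush the accepted rows
                pending = 0;
            }
        }
        if (pending > 0) {
            committer.commit();        // flush the final partial batch
        }
        return rejected;
    }

    public static void main(String[] args) throws SQLException {
        List<String> accepted = new ArrayList<>();
        // Simulated validation: values longer than 2 characters do not fit.
        List<String> rejected = writeAll(
                Arrays.asList("ab", "toolong", "cd"), 2,
                row -> {
                    if (row.length() > 2) {
                        throw new SQLException("value too long: " + row);
                    }
                    accepted.add(row);
                },
                () -> System.out.println("commit"));
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
    }
}
```

With executeBatch(), the same bad row would cause the whole batch to be lost. Here the error is caught at the row that caused it, so an error handler (for example the one from PIG-4704) could decide per record whether to skip it or fail the job.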