[ https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990554#comment-14990554 ]
Siddhi Mehta commented on PHOENIX-2367:
---------------------------------------

[~jmahonin] I believe this change adds no performance overhead.

[~giacomotaylor], [~maghamraviki...@gmail.com], [~jfernando_sfdc] Thoughts on this?

We can make this change configurable through a property. How about phoenix.record.writer.batch.execute, defaulting to true to preserve the existing behaviour?

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-2367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2367
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Siddhi Mehta
>            Assignee: Siddhi Mehta
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage,
> similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I
> did some work on the Apache Pig code that allows Storers to have a notion of
> Customizable/Configurable Errors (PIG-4704).
> I wanted to plug this behavior into PhoenixHbaseStorage and propose certain
> changes for the same.
>
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once the batch
> size is reached. If there are any client-side validation/syntactical errors,
> like data not fitting the column size, executeBatch() throws an exception and
> there is no way to retrieve the valid rows from the batch and retry them. We
> discard the whole batch or fail the job without error handling.
> With auto commit set to false, execute() also serves the purpose of not
> making any RPC calls; it does a bunch of validation client side and adds the
> row to the client-side mutation cache.
> On conn.commit() we make an RPC call.
>
> Proposed Change
> To be able to use configurable error handling and ignore only the failed
> records instead of discarding the whole batch, I want to propose changing the
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a
> configuration to toggle between the two behaviors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
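
To make the proposal concrete, here is a rough sketch of the toggle using plain JDBC. It is not the actual PhoenixRecordWriter code: the class name SkippingRecordWriter, the rejectedRows() accessor and the constructor arguments are hypothetical, and the boolean flag only stands in for the suggested phoenix.record.writer.batch.execute property. It assumes, as described in the issue, that with auto commit off execute() validates on the client and the RPC happens on commit().

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; names and structure are assumptions,
// not the real PhoenixRecordWriter.
final class SkippingRecordWriter {
    private final Connection conn;
    private final PreparedStatement stmt;
    private final long batchSize;
    // Stand-in for the proposed phoenix.record.writer.batch.execute property.
    private final boolean batchExecute;
    private final List<Object[]> rejected = new ArrayList<>();
    private long pending = 0;

    SkippingRecordWriter(Connection conn, PreparedStatement stmt,
                         long batchSize, boolean batchExecute) throws SQLException {
        this.conn = conn;
        this.stmt = stmt;
        this.batchSize = batchSize;
        this.batchExecute = batchExecute;
        conn.setAutoCommit(false); // nothing is sent until commit()
    }

    private void bind(Object[] row) throws SQLException {
        stmt.clearParameters();
        for (int i = 0; i < row.length; i++) {
            stmt.setObject(i + 1, row[i]); // JDBC parameters are 1-based
        }
    }

    void write(Object[] row) throws SQLException {
        if (batchExecute) {
            // Existing behaviour: queue the row; errors surface for the whole batch.
            bind(row);
            stmt.addBatch();
        } else {
            try {
                // Proposed behaviour: a failure here affects only this row.
                bind(row);
                stmt.execute();
            } catch (SQLException e) {
                rejected.add(row); // hand off to an error handler instead of failing
                return;
            }
        }
        if (++pending >= batchSize) {
            flush();
        }
    }

    private void flush() throws SQLException {
        if (batchExecute) {
            stmt.executeBatch();
        }
        conn.commit();
        pending = 0;
    }

    void close() throws SQLException {
        if (pending > 0) {
            flush();
        }
    }

    List<Object[]> rejectedRows() {
        return rejected;
    }
}
```

With the flag set to false, a row that fails validation is collected in rejectedRows() and the remaining rows are still committed; with the flag set to true, an exception from executeBatch() propagates to the caller, matching the current behaviour described in the issue.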