[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apekshit Sharma updated HBASE-13702:
------------------------------------
    Attachment: HBASE-13702-v3.patch

> ImportTsv: Add dry-run functionality and log bad rows
> -----------------------------------------------------
>
>                 Key: HBASE-13702
>                 URL: https://issues.apache.org/jira/browse/HBASE-13702
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Apekshit Sharma
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch,
> HBASE-13702.patch
>
>
> The ImportTsv job skips bad records by default (though it keeps a count).
> -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is
> encountered.
> Being able to easily determine which rows in an input are corrupted, rather
> than failing on one row at a time, seems like a good feature to have.
> Moreover, tools of this kind should have 'dry-run' functionality, which
> does a quick run of the tool without making any changes while still
> reporting any errors/warnings and success/failure.
> To identify corrupted rows, simply logging them should be enough. In the
> worst case, all rows will be logged and the size of the logs will be the same
> as the input size, which seems fine. However, the user might have to do some
> work figuring out where the logs are. Is there some link we can show to the
> user when the tool starts which can help them with that?
> For the dry run, we can simply use if-else to skip over writing out KVs, and
> any other mutations, if present.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
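To illustrate the last two points of the description (log every bad row, and use an if-else to skip writes in a dry run), here is a minimal sketch in plain Java. It is not taken from any of the attached patches and does not use the HBase API. The class name DryRunImportSketch, the Result holder, the "expected column count plus non-empty row key" validity check, and the in-memory list standing in for KV writes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; not the ImportTsv implementation. */
class DryRunImportSketch {

    /** Outcome of one pass over the input. */
    static final class Result {
        final List<String> written = new ArrayList<>();  // rows that reached the stand-in sink
        final List<String> badLines = new ArrayList<>(); // rows rejected by the check below
        int validRows;                                    // rows that parsed cleanly
    }

    /**
     * Parses each tab-separated line. In this sketch a line is "bad" if it does not
     * have exactly expectedColumns fields or if its first field (the row key) is empty.
     */
    static Result run(List<String> lines, int expectedColumns, boolean dryRun) {
        Result result = new Result();
        for (String line : lines) {
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
            boolean bad = fields.length != expectedColumns || fields[0].isEmpty();
            if (bad) {
                // Log the offending row and keep going, so that a single run
                // surfaces every bad row instead of failing on the first one.
                System.err.println("Bad line: " + line);
                result.badLines.add(line);
                continue;
            }
            result.validRows++;
            if (!dryRun) {
                // Stand-in for writing out KVs and other mutations;
                // skipped entirely when dryRun is true.
                result.written.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "row1\ta\tb", "broken-line", "row2\tc\td", "\tx\ty");
        Result dry = run(input, 3, true);
        System.out.println("dry run:  valid=" + dry.validRows
                + " bad=" + dry.badLines.size() + " written=" + dry.written.size());
        Result real = run(input, 3, false);
        System.out.println("real run: valid=" + real.validRows
                + " bad=" + real.badLines.size() + " written=" + real.written.size());
    }
}
```

Both modes go through the same parsing and logging path, so a dry run reports the same bad rows a real run would; only the write step is guarded by the flag.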