[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233417#comment-13233417 ]
Anoop Sam John commented on HBASE-5564:
---------------------------------------

Comment from Jesse Yates
{quote}
The question is, if you have a TSV file with the same row key, which value should be considered the most recent version? Should any of them - maybe that is actually a problem and we want to have a warning/error when that occurs?
{quote}

Do we need to handle this? The issue is the TreeSet used by PutSortReducer and KeyValueSortReducer, as mentioned by Laxman.

In normal data insertion using Puts, all the duplicate values go into the memstore (and finally into HFiles), and a scan retrieves the last one entered. In the bulk load case only the first entry is inserted, because the data structure (the TreeSet) discards duplicates. Is this a behaviour mismatch?

But this depends on which entry in the TSV file should be considered the most recent version. If we say that the last entry in the file is the most recent version.....

> Bulkload is discarding duplicate records
> ----------------------------------------
>
>                 Key: HBASE-5564
>                 URL: https://issues.apache.org/jira/browse/HBASE-5564
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>   Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>         Environment: HBase 0.92
>            Reporter: Laxman
>            Assignee: Laxman
>              Labels: bulkloader
>         Attachments: HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.patch
>
>
> Duplicate records are discarded when they exist in the same input file, and more specifically when they exist in the same split.
> Duplicate records are considered if the records are from different splits.
> Version under test: HBase 0.92
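
As a rough illustration of the TreeSet behaviour discussed in the comment, here is a minimal Java sketch. It does not use the HBase classes: `Cell`, `KEY_ONLY` and `viaTreeSet` are hypothetical stand-ins. It assumes that the comparator used by the reducers orders entries by key only (row, column, timestamp) without looking at the value, and that both duplicate records carry the same timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DuplicateCellDemo {

    /** Hypothetical stand-in for a KeyValue: a key (row, column, timestamp) plus a value. */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Assumption: ordering looks at the key only, never at the value. */
    static final Comparator<Cell> KEY_ONLY = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // newest timestamp first
    };

    /** Collects cells into a TreeSet, the way a sorting reducer might. */
    static List<Cell> viaTreeSet(List<Cell> input) {
        TreeSet<Cell> sorted = new TreeSet<>(KEY_ONLY);
        for (Cell cell : input) {
            // add() returns false and keeps the existing element
            // when the comparator reports an equal key.
            sorted.add(cell);
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
            new Cell("row1", "cf:q", 1L, "first"),
            new Cell("row1", "cf:q", 1L, "second"));
        List<Cell> kept = viaTreeSet(input);
        System.out.println("kept " + kept.size() + " of " + input.size()
            + " cells: " + kept.get(0).value);
    }
}
```

Under these assumptions the second record is silently dropped and the first one survives, which is the opposite of the "last write wins" result a scan would return after two ordinary Puts.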