[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laxman updated HBASE-5564:
--------------------------
Release Note:
1) Provision for using the existing timestamp (HBASE_TS_KEY)
2) Bug fix to use same timestamp across mappers.
Status: Patch Available (was: Open)
Attached the final patch for review and commit.
Changes from the previous patch:
1) Encoding issue
2) Proper handling for bad records (with invalid timestamp)
3) New unit tests to test the parser (with valid & invalid timestamp)
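
To illustrate items 2 and 3, here is a minimal sketch of how a record's
timestamp might be resolved and how a bad record might be rejected. It is not
the code from the patch: the class, method, and exception names are
hypothetical, and the convention that a negative column index means "no
timestamp column configured" is an assumption made for this example.

```java
public class TimestampSketch {

    /** Hypothetical exception for a record whose timestamp cannot be used. */
    static class BadRecordException extends RuntimeException {
        BadRecordException(String message) {
            super(message);
        }
    }

    /**
     * Returns the timestamp for one parsed record.
     *
     * @param columns       the fields of one input line, already split
     * @param tsColumnIndex index of the timestamp field, or -1 if none is configured
     * @param jobTimestamp  fallback timestamp shared by the whole job
     */
    static long resolveTimestamp(String[] columns, int tsColumnIndex, long jobTimestamp) {
        if (tsColumnIndex < 0) {
            // No timestamp column configured: fall back to the job-wide value.
            return jobTimestamp;
        }
        if (tsColumnIndex >= columns.length) {
            throw new BadRecordException("timestamp column is missing");
        }
        try {
            return Long.parseLong(columns[tsColumnIndex]);
        } catch (NumberFormatException e) {
            // Invalid timestamp: report the record as bad instead of loading it.
            throw new BadRecordException("invalid timestamp: " + columns[tsColumnIndex]);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveTimestamp(new String[] {"row1", "1234", "v"}, 1, 99L));
        System.out.println(resolveTimestamp(new String[] {"row1", "v"}, -1, 99L));
    }
}
```

A caller would typically catch the exception, count the record as bad, and
continue with the next line.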
Note: QA may report 2 new findbugs warnings. As explained earlier, these are
due to the use of the default encoding (String.getBytes, new String), which is
in line with the existing behavior.
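
For context on the encoding note, the sketch below contrasts the two styles.
It is an illustration only, not code from the patch. The class and method
names are hypothetical, and StandardCharsets assumes Java 7 or later, which
may not match the Java version the project targeted at the time.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    // Uses the JVM's default charset, so the resulting bytes can differ
    // between machines. This is the style static analysis tends to flag.
    static byte[] encodeDefault(String s) {
        return s.getBytes();
    }

    // Uses an explicit charset, so the resulting bytes are the same everywhere.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes with the same explicit charset used for encoding.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "r\u00e9sum\u00e9";
        byte[] utf8 = encodeUtf8(value);
        // Each accented character takes two bytes in UTF-8.
        System.out.println(utf8.length);
        System.out.println(decodeUtf8(utf8).equals(value));
    }
}
```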
> Bulkload is discarding duplicate records
> ----------------------------------------
>
> Key: HBASE-5564
> URL: https://issues.apache.org/jira/browse/HBASE-5564
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.96.0
> Environment: HBase 0.92
> Reporter: Laxman
> Assignee: Laxman
> Labels: bulkloader
> Fix For: 0.96.0
>
> Attachments: 5564.lint, HBASE-5564_trunk.1.patch,
> HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch,
> HBASE-5564_trunk.4_final.patch, HBASE-5564_trunk.patch
>
>
> Duplicate records are discarded when they exist in the same input file, and
> more specifically when they exist in the same split.
> Duplicate records are retained when they come from different splits.
> Version under test: HBase 0.92
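
One plausible reading of the behavior described above, consistent with the
release note about timestamps, is that copies of a record written with the
same timestamp collapse into one cell, while copies written with different
timestamps are kept as separate versions. The sketch below is a simplified
model of that idea, not HBase code: the class and method names are
hypothetical, and a cell is reduced to row, column, and timestamp.

```java
import java.util.HashSet;
import java.util.Set;

public class CellKeySketch {

    // Simplified cell coordinate: two writes with the same key overwrite each other.
    static String cellKey(String row, String column, long timestamp) {
        return row + "/" + column + "/" + timestamp;
    }

    // Counts how many distinct cells remain when the same (row, column) record
    // is written once per entry in timestamps.
    static int survivingCells(String row, String column, long[] timestamps) {
        Set<String> keys = new HashSet<>();
        for (long ts : timestamps) {
            keys.add(cellKey(row, column, ts));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Same timestamp for both copies: they collapse into one cell.
        System.out.println(survivingCells("row1", "cf:q", new long[] {1000L, 1000L}));
        // Different timestamps: both copies survive as separate versions.
        System.out.println(survivingCells("row1", "cf:q", new long[] {1000L, 1001L}));
    }
}
```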