[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records

stack (Updated) (JIRA) Tue, 27 Mar 2012 09:54:50 -0700

     [ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


stack updated HBASE-5564:
-------------------------

    Affects Version/s:     (was: 0.92.2)
                           (was: 0.90.7)
                           (was: 0.94.0)
         Hadoop Flags: Reviewed

Patch looks good.  Is this right:

{code}
+        return Long.parseLong(Base64.encodeBytes(lineBytes,
+            getColumnOffset(timestampKeyColumnIndex),
+            getColumnLength(timestampKeyColumnIndex)));
{code}

As I read it, this encodes the passed bytes into a Base64 String and then 
tries to parse that String as a long (it doesn't look like parseLong can 
interpret Base64-encoded longs).  Am I reading it wrong?
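
To make the concern concrete, here is a minimal sketch, not the patch itself. It assumes the timestamp column holds ASCII digits, and it uses java.util.Base64 as a stand-in for HBase's Base64.encodeBytes so that it runs without HBase on the classpath. The class name, method names, and sample line are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class TimestampParseSketch {

    // Same shape as the line in question: Base64-encode the column bytes,
    // then hand the encoded String to Long.parseLong.
    // java.util.Base64 is an assumed stand-in for HBase's Base64 utility.
    static long parseViaBase64(byte[] lineBytes, int offset, int length) {
        byte[] column = Arrays.copyOfRange(lineBytes, offset, offset + length);
        String encoded = Base64.getEncoder().encodeToString(column);
        return Long.parseLong(encoded);
    }

    // Alternative: treat the column bytes as text and parse the digits directly.
    static long parseViaString(byte[] lineBytes, int offset, int length) {
        String text = new String(lineBytes, offset, length, StandardCharsets.UTF_8);
        return Long.parseLong(text);
    }

    // Reports whether the Base64 path rejects the input.
    static boolean base64PathFails(byte[] lineBytes, int offset, int length) {
        try {
            parseViaBase64(lineBytes, offset, length);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated input line: row key, timestamp, value.
        byte[] line = "row1\t1332867290000\tvalue".getBytes(StandardCharsets.UTF_8);
        int offset = 5;   // first byte of the timestamp column
        int length = 13;  // number of digits in the timestamp

        System.out.println("direct parse: " + parseViaString(line, offset, length));
        System.out.println("Base64 path fails: " + base64PathFails(line, offset, length));
    }
}
```

If HBase's encoder behaves like the standard one here, the Base64 path throws NumberFormatException for digit input, because the encoded String contains letters.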

I was going to mark this as an incompatible change but, thinking on it, setting 
the timestamp once for the MR job rather than per mapper seems like a bug fix.

Please write a bit of a release note at least explaining the changed behavior.

If the above is right and I'm just reading it wrong, I will commit.  Let me know.
Thanks, Laxman.
                
> Bulkload is discarding duplicate records
> ----------------------------------------
>
>                 Key: HBASE-5564
>                 URL: https://issues.apache.org/jira/browse/HBASE-5564
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.96.0
>         Environment: HBase 0.92
>            Reporter: Laxman
>            Assignee: Laxman
>              Labels: bulkloader
>             Fix For: 0.96.0
>
>         Attachments: 5564.lint, HBASE-5564_trunk.1.patch, 
> HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch, 
> HBASE-5564_trunk.patch
>
>
> Duplicate records are discarded when they exist in the same input file, 
> and more specifically when they exist in the same split.
> Duplicate records are considered if the records are from different 
> splits.
> Version under test: HBase 0.92

