Hi all!

I have an MR job that imports content into HBase. Before importing, I have
to determine which content is new (the row key in HBase is the URI).
After that, I import only the new content into HBase.


Assume I have a large amount of content in HBase (> 1,000,000,000 URIs) and
1,000,000 URIs to import (a mix of new URIs and URIs that already exist in
HBase). How can I find the new URIs to import?


My current solution: I check whether each URI already exists in HBase to
find the new URIs. Something like:

           

            RowResult row = hTable.getRow(uri);
            if (row.isEmpty()) {
                // collect the new content (URI)
            }


With this solution, there is one lookup per URI, so if the number of URIs
is large, the total time spent on round trips to HBase is large :(
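
To make the cost concrete, here is a minimal, self-contained sketch of the same per-URI check. It does not talk to HBase: the `rowExists` predicate is only a stand-in for the `hTable.getRow(uri)` test above, and the class name, method name, and `lookups` counter are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class NewUriFilter {
    // Number of existence checks issued so far (hypothetical counter).
    static int lookups = 0;

    // Returns the candidate URIs for which rowExists reports false.
    // rowExists is an assumed stand-in for the real HBase row lookup.
    static List<String> findNewUris(List<String> candidates, Predicate<String> rowExists) {
        List<String> fresh = new ArrayList<>();
        for (String uri : candidates) {
            lookups++; // in the real job, each check is one round trip to HBase
            if (!rowExists.test(uri)) {
                fresh.add(uri);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        // In-memory set playing the role of the rows already stored in HBase.
        Set<String> stored = new HashSet<>(
                Arrays.asList("http://a.example/1", "http://a.example/2"));
        List<String> candidates =
                Arrays.asList("http://a.example/2", "http://a.example/3");

        List<String> fresh = findNewUris(candidates, stored::contains);
        System.out.println(fresh + " after " + lookups + " lookups");
    }
}
```

The number of lookups equals the number of candidate URIs, so with 1,000,000 URIs to import this pattern issues 1,000,000 separate checks.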


Please suggest a better solution. :)


Thanks!


Best regards,

Nguyen.


