Then splitting according to user IDs is out of the question :-)

On Tue, Apr 7, 2015 at 8:12 AM, Юра <rvaniy....@gmail.com> wrote:
> There are 500 million distinct users...
>
> 2015-04-07 17:45 GMT+03:00 Ted Yu <yuzhih...@gmail.com>:
>
>> How many distinct users are stored in HBase?
>>
>> TableInputFormat produces splits where the number of splits matches the
>> number of regions in a table. You can write your own InputFormat which
>> splits according to user id.
>>
>> FYI
>>
>> On Tue, Apr 7, 2015 at 7:36 AM, Юра <rvaniy....@gmail.com> wrote:
>>
>>> Hello, guys!
>>>
>>> I am a newbie to Spark and would appreciate any advice or help.
>>> Here is the detailed question:
>>>
>>> http://stackoverflow.com/questions/29493472/does-spark-utilize-the-sorted-order-of-hbase-keys-when-using-hbase-as-data-sour
>>>
>>> Regards,
>>> Yury