Hi All,

I have a JavaPairRDD<Long, String> where each Long key has 4
String values associated with it. I want to run an HBase query to look
up each String in the RDD.
Each look-up will return around 7K integers, so for each key I will
have around 7K values. My input RDD is already more than a GB, and after
adding these results it will grow to around 50 GB, which I want to avoid.

My problem:
    <1, Test1>
    <1, test2>
    <1, test3>
    <1, test4>
    ...
Now I will query HBase for Test1, test2, test3 and test4 in parallel. Each
query will return around 2K integers, so about 8K integers in total.
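
To make the fan-out concrete, here is a minimal plain-Java sketch of the shape of the data. It does not use Spark or HBase: `fakeLookup` is a hypothetical stand-in for the HBase query, `countPerKey` is an illustrative helper name, and the figure of 2000 results per look-up is taken from the estimate above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LookupFanOut {
    // Hypothetical stand-in for the HBase query: returns a fixed number
    // of integers for one string. The values themselves are meaningless.
    static int[] fakeLookup(String value, int resultsPerLookup) {
        int[] out = new int[resultsPerLookup];
        for (int i = 0; i < resultsPerLookup; i++) {
            out[i] = value.hashCode() + i;
        }
        return out;
    }

    // Runs one look-up per (key, string) pair and counts how many
    // integers end up attached to each key.
    static Map<Long, Integer> countPerKey(List<Map.Entry<Long, String>> pairs,
                                          int resultsPerLookup) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<Long, String> pair : pairs) {
            int[] result = fakeLookup(pair.getValue(), resultsPerLookup);
            counts.merge(pair.getKey(), result.length, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>(1L, "Test1"));
        pairs.add(new AbstractMap.SimpleEntry<>(1L, "test2"));
        pairs.add(new AbstractMap.SimpleEntry<>(1L, "test3"));
        pairs.add(new AbstractMap.SimpleEntry<>(1L, "test4"));

        // 4 look-ups of 2000 integers each: key 1 ends up with 8000 integers.
        System.out.println(countPerKey(pairs, 2000));
    }
}
```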

So for each record I will have 1 * 8000 entries in my RDD. If I have
1 million records, that becomes 1 million * 8000 entries, which is huge to
process even using GroupBy.
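
As a rough back-of-the-envelope check of that size, the sketch below multiplies the numbers out. It assumes each result is stored as a 4-byte int and ignores any per-object or serialization overhead, so the real footprint would be larger. The class and method names are illustrative only.

```java
class ExpandedSizeEstimate {
    // Total number of integers after expanding every record.
    static long totalIntegers(long records, long integersPerRecord) {
        return Math.multiplyExact(records, integersPerRecord);
    }

    // Raw payload size, assuming 4 bytes per integer and no overhead.
    static long rawBytes(long totalIntegers) {
        return Math.multiplyExact(totalIntegers, (long) Integer.BYTES);
    }

    public static void main(String[] args) {
        long total = totalIntegers(1_000_000L, 8_000L);
        long bytes = rawBytes(total);
        // Prints: 8000000000 integers, about 32 GB raw
        System.out.println(total + " integers, about "
                + bytes / 1_000_000_000L + " GB raw");
    }
}
```

So 1 million records at 8000 integers each is 8 billion integers, or about 32 GB of raw int data before any overhead.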
