Re: Improving hbase read performance

shourabh rawat Wed, 18 Feb 2009 07:53:27 -0800

Here's what I'm doing...

This is my get function. It should retrieve entities in parallel by
submitting a separate task to a thread pool for each get.


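import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

    public String[] get(String tableName, String[] entityIDS)
            throws InterruptedException {
        // Up to 50 gets run at once; each task writes into its own array slot.
        ExecutorService threadExecutor = Executors.newFixedThreadPool(50);
        String[] contents = new String[entityIDS.length];
        for (int i = 0; i < entityIDS.length; i++) {
            threadExecutor.execute(new ReadThread(conf, tableName,
                    contents, entityIDS[i], i));
        }
        threadExecutor.shutdown();
        // Block until every task has finished instead of busy-spinning
        // on isTerminated(), which burns a whole CPU core while waiting.
        threadExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        return contents;
    }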
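For comparison, here is a minimal sketch of the same fan-out that hands each value back through a Future instead of a shared array and a custom Runnable. It is not the HBase API: `fetchOne` is a hypothetical stand-in for the per-key `table.get(...)` call, and the cap of 50 threads is simply carried over from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGet {
    // Runs one task per key and returns the results in the same order as keys.
    // fetchOne is a hypothetical stand-in for a single HBase get.
    static String[] getAll(String[] keys, Function<String, String> fetchOne)
            throws InterruptedException, ExecutionException {
        int threads = Math.max(1, Math.min(50, keys.length));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String key : keys) {
                tasks.add(() -> fetchOne.apply(key));
            }
            // invokeAll blocks until every task has completed, and the
            // returned futures are in the same order as the task list.
            List<Future<String>> futures = pool.invokeAll(tasks);
            String[] results = new String[keys.length];
            for (int i = 0; i < futures.size(); i++) {
                results[i] = futures.get(i).get(); // rethrows a task's failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] keys = {"row1", "row2", "row3"};
        String[] values = getAll(keys, k -> "value-of-" + k);
        System.out.println(String.join(",", values));
    }
}
```

A failed get surfaces as an ExecutionException from `getAll` rather than only as a log line inside the worker, which makes errors harder to miss.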


And here's the thread (the run() method of ReadThread):

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

    public void run() {
        long start = System.currentTimeMillis();
        try {
            Cell c = table.get(entityID, "content:");
            // Check the Cell itself: new String(...) never returns null,
            // and calling getValue() on a null Cell would throw.
            j[index] = (c == null) ? "NULL" : new String(c.getValue());
        } catch (IOException ex) {
            Logger.getLogger(ReadThread.class.getName()).log(Level.SEVERE,
                    null, ex);
        }
        System.out.println((System.currentTimeMillis() - start)
                + " ms taken to complete process " + index);
    }

I'm creating a new HTable instance for each such task.
Is this approach correct? Would I get better performance from it?
Will my get queries be executed in parallel by HBase?
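Instead of building a new table instance for every task, one instance per pool thread can be kept in a ThreadLocal and reused by every task that later runs on that thread. The sketch below assumes, as is commonly advised, that an HTable should not be shared between threads; `TableClient` is a hypothetical placeholder for HTable, not a real class, and its `get` does no I/O.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadTable {
    // Hypothetical stand-in for an HTable: assumed costly to create and not
    // safe to share across threads, so each worker thread gets its own.
    static class TableClient {
        static final AtomicInteger CREATED = new AtomicInteger();
        final String tableName;

        TableClient(String tableName) {
            this.tableName = tableName;
            CREATED.incrementAndGet();
        }

        String get(String row) {
            return tableName + "/" + row; // placeholder for a real read
        }
    }

    // Lazily creates one TableClient per thread on first use, then reuses it.
    static final ThreadLocal<TableClient> TABLE =
            ThreadLocal.withInitial(() -> new TableClient("mytable"));

    static String[] getAll(String[] rows, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        String[] out = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            final int idx = i;
            pool.execute(() -> out[idx] = TABLE.get().get(rows[idx]));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] rows = new String[100];
        for (int i = 0; i < rows.length; i++) rows[i] = "row" + i;
        String[] values = getAll(rows, 4);
        // At most 4 clients exist, one per pool thread, despite 100 tasks.
        System.out.println(values[99] + " clients=" + TableClient.CREATED.get());
    }
}
```

With 100 tasks on 4 threads, at most 4 clients are ever constructed, whereas a new instance per task would build 100.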



On Wed, Feb 18, 2009 at 11:27 AM, shourabh rawat wrote:
> Does the number of region servers affect this performance?
>
> On Wed, Feb 18, 2009 at 11:23 AM, shourabh rawat wrote:
>> Hey,
>>
>>> What do you mean by the above when you say read sequentially? Are you
>>> scanning? (Getting a scanner and then nexting through your hbase table?)
>>
>> Well, let's say I have 10 keys stored in HBase and I want to
>> retrieve them.
>>
>> If I do the reads one by one, the total time is the sum of the 'get'
>> times of the individual keys. Could I do the same thing in parallel,
>> so that all the gets occur concurrently? Then the total time would be
>> the maximum of the times taken for the individual keys rather than
>> the sum.
>>
>>
>>> You will have to wait for hbase 0.20.0 or do as Erik suggests and put a
>>> cache in front of hbase.  What are you trying to do with hbase?  Serve a
>>> website?
>>
>> Yes, sort of, but I want to check performance without a cache
>> (random reads). Can I get performance in the range of 10 ms
>> with HBase?
>>
>>> Yeah, the RPC keeps a single connection per remote server but channel is
>>> shared by request and receive.  Testing in past, the more remote servers,
>>> the better, but even if a few only, concurrent HTables got better throughput
>>> than one running requests in series (the single connection is not fully
>>> occupied by requests and responses).
>>>
>>
>> So by a single connection, do you mean all the gets would be handled
>> sequentially (one by one) by HBase even when the requests come in
>> parallel (even when different HTable instances for the same table are
>> used)? Is there any way I can make them parallel?
>> The HBase master has one port that it specifies, and the other is the
>> port for HDFS (Hadoop). What can be done to increase the number of
>> connections, as you said?
>>
>>
>> Thanks for your help.
>>
>
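The sum-versus-max reasoning in the quoted message can be checked with a small simulation. This sketch assumes made-up per-key latencies and uses Thread.sleep as a hypothetical stand-in for one get; against a real cluster the parallel figure also depends on how much concurrency the region servers and the shared per-server RPC connection actually allow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SumVsMax {
    // Made-up per-key get latencies in milliseconds (sum 105, max 15).
    static final long[] LATENCIES_MS = {12, 8, 15, 9, 11, 7, 14, 10, 13, 6};

    // Hypothetical stand-in for one round trip to a region server.
    static void fakeGet(long ms) throws InterruptedException {
        Thread.sleep(ms);
    }

    // One get after another: elapsed time is about the sum of latencies.
    static long sequentialMs() throws InterruptedException {
        long start = System.nanoTime();
        for (long ms : LATENCIES_MS) fakeGet(ms);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // All gets at once: elapsed time is about the largest single latency.
    static long parallelMs() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(LATENCIES_MS.length);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (long ms : LATENCIES_MS) {
            tasks.add(() -> { fakeGet(ms); return null; });
        }
        long start = System.nanoTime();
        pool.invokeAll(tasks); // returns once every task has finished
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sequential ~" + sequentialMs() + " ms");
        System.out.println("parallel   ~" + parallelMs() + " ms");
    }
}
```

The sequential run takes at least 105 ms; the parallel run should land near 15 ms plus thread start-up overhead, though exact numbers vary by machine.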
