Re: newAPIHadoopRDD Multiple scan result return from HBase

Ted Yu Sun, 05 Apr 2015 14:46:58 -0700

bq. HBase scan operation like scan StartROW and EndROW in RDD?

I don't think an RDD supports the concept of a start row and an end row.


In HBase, please take a look at the following methods of Scan:

  public Scan setStartRow(byte[] startRow)

  public Scan setStopRow(byte[] stopRow)
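
For reference, Scan treats the start row as inclusive and the stop row as
exclusive, and HBase orders row keys as unsigned bytes, lexicographically.
If the whole table is already loaded into an RDD, the same check could be
applied as a filter predicate, and several ranges can be combined with a
simple "any range matches" test. Below is a minimal sketch in plain Java,
assuming Java 9+ for Arrays.compareUnsigned. The names RowRange, contains,
inAnyRange and bytes are hypothetical helpers for illustration; they are not
part of HBase or Spark.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an HBase or Spark class): one [startRow, stopRow) range.
final class RowRange {
    final byte[] startRow; // inclusive; an empty array means "from the first row"
    final byte[] stopRow;  // exclusive; an empty array means "to the last row"

    RowRange(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    // True if rowKey falls in [startRow, stopRow), comparing bytes as unsigned
    // values in lexicographic order, which is how HBase orders row keys.
    boolean contains(byte[] rowKey) {
        boolean atOrAfterStart = startRow.length == 0
                || Arrays.compareUnsigned(rowKey, startRow) >= 0;
        boolean beforeStop = stopRow.length == 0
                || Arrays.compareUnsigned(rowKey, stopRow) < 0;
        return atOrAfterStart && beforeStop;
    }

    // True if rowKey falls in at least one of the given ranges.
    static boolean inAnyRange(List<RowRange> ranges, byte[] rowKey) {
        for (RowRange range : ranges) {
            if (range.contains(rowKey)) {
                return true;
            }
        }
        return false;
    }

    // Convenience for building keys from text in examples.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

A predicate like this could be passed to a filter over the row keys of the
RDD. Note that it runs only after every row has been read into Spark, whereas
start and stop rows set on the Scan limit what the region servers read in the
first place.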

Cheers

On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> I have a 2 GB HBase table where the data is stored as key and value
> (only one column per key), and the keys are unique.
>
> What I am thinking is to load the complete HBase table into an RDD and then
> do operations like scans on the RDD rather than in HBase.
> Can I do an HBase scan operation, like a scan with StartROW and EndROW, in an RDD?
>
> The first step in my job will be to load the complete data into an RDD.
>
>
>
> On 6 April 2015 at 02:45, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> You do need to apply the patch since 0.96 doesn't have this feature.
>>
>> For JavaSparkContext.newAPIHadoopRDD, can you check region server
>> metrics to see where the overhead might be (compared to creating scan
>> and firing query using native client) ?
>>
>> Thanks
>>
>> On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele <gangele...@gmail.com>
>> wrote:
>>
>>> That's true. I checked MultiRowRangeFilter and it serves my need.
>>> Do I need to apply the patch for this, since I am using HBase version
>>> 0.96?
>>>
>>> Also, I have found that JavaSparkContext.newAPIHadoopRDD is slow compared
>>> to creating a scan and firing the query directly. Is there any reason?
>>>
>>>
>>>
>>>
>>> On 6 April 2015 at 01:57, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Looks like MultiRowRangeFilter would serve your need.
>>>>
>>>> See HBASE-11144.
>>>>
>>>> HBase 1.1 would be released in May.
>>>>
>>>> You can also backport it to the HBase release you're using.
>>>>
>>>> On Sat, Apr 4, 2015 at 8:45 AM, Jeetendra Gangele <gangele...@gmail.com
>>>> > wrote:
>>>>
>>>>> Here is the conf object I pass as the first parameter of the API.
>>>>> But here I want to pass multiple scans; that is, I have 4 criteria for
>>>>> START ROW and STOP ROW in the same table.
>>>>> Using the code below I can get the result for only one STARTROW and ENDROW.
>>>>>
>>>>> Configuration conf = DBConfiguration.getConf();
>>>>>
>>>>> // int scannerTimeout = (int) conf.getLong(
>>>>> //      HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
>>>>> // System.out.println("lease timeout on server is"+scannerTimeout);
>>>>>
>>>>> int scannerTimeout = (int) conf.getLong(
>>>>>     "hbase.client.scanner.timeout.period", -1);
>>>>> // conf.setLong("hbase.client.scanner.timeout.period", 60000L);
>>>>> conf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
>>>>> Scan scan = new Scan();
>>>>> scan.addFamily(FAMILY);
>>>>> FilterList filterList = new FilterList(Operator.MUST_PASS_ALL);
>>>>> filterList.addFilter(new KeyOnlyFilter());
>>>>> filterList.addFilter(new FirstKeyOnlyFilter());
>>>>> scan.setFilter(filterList);
>>>>>
>>>>> scan.setCacheBlocks(false);
>>>>> scan.setCaching(10);
>>>>> scan.setBatch(1000);
>>>>> scan.setSmall(false);
>>>>> conf.set(TableInputFormat.SCAN,
>>>>>     DatabaseUtils.convertScanToString(scan));
>>>>> return conf;
>>>>>
>>>>> On 4 April 2015 at 20:54, Jeetendra Gangele <gangele...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Can we get the results of multiple scans
>>>>>> from JavaSparkContext.newAPIHadoopRDD on HBase?
>>>>>>
>>>>>> This method's first parameter takes a configuration object, to which I
>>>>>> have added a filter. But how can I run multiple scans on the same table
>>>>>> while calling this API only once?
>>>>>>
>>>>>> regards
>>>>>> jeetendra
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
>
