bq. HBase scan operation like scan StartROW and EndROW in RDD?

I don't think an RDD supports the concept of a start row and an end row.
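Once the table has been loaded, a start/stop row range has to be applied by hand. The following is a minimal stdlib-only Java sketch. It assumes the rows sit in memory as byte-array keys and values in a sorted map, and the class and method names (`RowRangeScan`, `scan`, `keys`) are hypothetical, not part of HBase or Spark. It mimics the Scan semantics: the start row is inclusive, the stop row is exclusive, keys compare as unsigned bytes, and an empty stop row means "to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowRangeScan {

    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    static NavigableMap<byte[], byte[]> newTable() {
        return new TreeMap<>(ROW_ORDER);
    }

    // Mimics setStartRow/setStopRow on in-memory rows: start inclusive,
    // stop exclusive, empty stop row = "until the end of the table".
    static NavigableMap<byte[], byte[]> scan(NavigableMap<byte[], byte[]> table,
                                             byte[] startRow, byte[] stopRow) {
        if (stopRow.length == 0) {
            return table.tailMap(startRow, true);
        }
        if (ROW_ORDER.compare(startRow, stopRow) > 0) {
            return newTable(); // start after stop: this sketch simply returns no rows
        }
        return table.subMap(startRow, true, stopRow, false);
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the row keys of a result, in row-key order.
    static List<String> keys(Map<byte[], byte[]> rows) {
        List<String> out = new ArrayList<>();
        for (byte[] k : rows.keySet()) {
            out.add(new String(k, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> table = newTable();
        for (String k : new String[] {"row1", "row2", "row3", "row4", "row5"}) {
            table.put(b(k), b("value-" + k));
        }
        System.out.println(keys(scan(table, b("row2"), b("row4"))));  // [row2, row3]
        System.out.println(keys(scan(table, b("row4"), new byte[0]))); // [row4, row5]
    }
}
```

On a distributed RDD the same comparison would go into a `filter` over the (key, value) pairs. Unlike an HBase scan with start and stop rows, that still reads every row that was loaded.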
In HBase, please take a look at the following methods of Scan:

public Scan setStartRow(byte [] startRow)
public Scan setStopRow(byte [] stopRow)

Cheers

On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

> I have a 2GB HBase table where the data is stored as key and value
> (only one column per key), and the keys are unique.
>
> What I am thinking is to load the complete HBase table into an RDD and then do
> operations like scan in the RDD rather than in HBase.
> Can I do HBase scan operation like scan StartROW and EndROW in RDD?
>
> The first step in my job will be to load the complete data into the RDD.
>
> On 6 April 2015 at 02:45, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> You do need to apply the patch since 0.96 doesn't have this feature.
>>
>> For JavaSparkContext.newAPIHadoopRDD, can you check region server
>> metrics to see where the overhead might be (compared to creating a scan
>> and firing the query using the native client)?
>>
>> Thanks
>>
>> On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:
>>
>>> That's true, I checked the MultiRowRangeFilter and it serves my need.
>>> Do I need to apply the patch for this, since I am using HBase 0.96?
>>>
>>> Also, I have noticed that JavaSparkContext.newAPIHadoopRDD is slow
>>> compared to creating a scan and firing the query directly. Is there any reason?
>>>
>>> On 6 April 2015 at 01:57, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Looks like MultiRowRangeFilter would serve your need.
>>>>
>>>> See HBASE-11144.
>>>>
>>>> HBase 1.1 would be released in May.
>>>>
>>>> You can also backport it to the HBase release you're using.
>>>>
>>>> On Sat, Apr 4, 2015 at 8:45 AM, Jeetendra Gangele <gangele...@gmail.com> wrote:
>>>>
>>>>> Here is my conf object, which I pass as the first parameter of the API.
>>>>> But I want to pass multiple scans: I have 4 criteria for
>>>>> START ROW and STOP ROW in the same table.
>>>>> Using the code below, I can get the result for one STARTROW and ENDROW.
>>>>>
>>>>> Configuration conf = DBConfiguration.getConf();
>>>>>
>>>>> // int scannerTimeout = (int) conf.getLong(
>>>>> //     HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
>>>>> // System.out.println("lease timeout on server is" + scannerTimeout);
>>>>>
>>>>> int scannerTimeout = (int) conf.getLong(
>>>>>     "hbase.client.scanner.timeout.period", -1);
>>>>> // conf.setLong("hbase.client.scanner.timeout.period", 60000L);
>>>>> conf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
>>>>> Scan scan = new Scan();
>>>>> scan.addFamily(FAMILY);
>>>>> FilterList filterList = new FilterList(Operator.MUST_PASS_ALL);
>>>>> filterList.addFilter(new KeyOnlyFilter());
>>>>> filterList.addFilter(new FirstKeyOnlyFilter());
>>>>> scan.setFilter(filterList);
>>>>>
>>>>> scan.setCacheBlocks(false);
>>>>> scan.setCaching(10);
>>>>> scan.setBatch(1000);
>>>>> scan.setSmall(false);
>>>>> conf.set(TableInputFormat.SCAN,
>>>>>     DatabaseUtils.convertScanToString(scan));
>>>>> return conf;
>>>>>
>>>>> On 4 April 2015 at 20:54, Jeetendra Gangele <gangele...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Can we get the result of multiple scans
>>>>>> from JavaSparkContext.newAPIHadoopRDD from HBase?
>>>>>>
>>>>>> The first parameter of this method takes a configuration object where I have
>>>>>> added a filter. But how can I query multiple scans from the same table
>>>>>> while calling this API only once?
>>>>>>
>>>>>> regards
>>>>>> jeetendra
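The thread points to MultiRowRangeFilter (HBASE-11144) as a way to serve several start/stop pairs in one scan. The Java sketch below is a rough, stdlib-only illustration of that idea, not the HBase implementation. It sorts a list of [start, stop) ranges, merges the ones that overlap or touch, and then reads each merged range once from an in-memory sorted map. The `Range` class and the method names are hypothetical. MultiRowRangeFilter's own ranges carry inclusive/exclusive flags, whereas here the start is always inclusive and the stop always exclusive.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MultiRangeScan {

    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    // Hypothetical range type: start row inclusive, stop row exclusive.
    static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(byte[] start, byte[] stop) {
            if (ROW_ORDER.compare(start, stop) > 0) {
                throw new IllegalArgumentException("start row is after stop row");
            }
            this.start = start;
            this.stop = stop;
        }
    }

    // Sorts ranges by start row and merges overlapping or adjacent ones,
    // so every row is read at most once.
    static List<Range> merge(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort((x, y) -> ROW_ORDER.compare(x.start, y.start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && ROW_ORDER.compare(r.start, merged.get(lastIdx).stop) <= 0) {
                Range last = merged.get(lastIdx);
                byte[] stop = ROW_ORDER.compare(r.stop, last.stop) > 0 ? r.stop : last.stop;
                merged.set(lastIdx, new Range(last.start, stop));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    // Returns the keys of all rows that fall in any range, in row-key order.
    static List<String> scan(NavigableMap<byte[], byte[]> table, List<Range> ranges) {
        List<String> keys = new ArrayList<>();
        for (Range r : merge(ranges)) {
            for (byte[] key : table.subMap(r.start, true, r.stop, false).keySet()) {
                keys.add(new String(key, StandardCharsets.UTF_8));
            }
        }
        return keys;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> table = new TreeMap<>(ROW_ORDER);
        for (String k : new String[] {"a1", "a2", "b1", "b2", "c1", "c2", "d1"}) {
            table.put(b(k), b("value-" + k));
        }
        // [b1, c1) and [b2, c2) overlap and are merged into [b1, c2).
        List<Range> ranges = List.of(
                new Range(b("b1"), b("c1")),
                new Range(b("a1"), b("a2")),
                new Range(b("b2"), b("c2")));
        System.out.println(scan(table, ranges)); // [a1, b1, b2, c1]
    }
}
```

Merging first means that overlapping criteria (for example, two of the four start/stop pairs mentioned above) do not return the same row twice.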