Forgot to include user@.

Another email from Amit indicated that there is 1 region in his table. This wouldn't give you the benefit TableInputFormat is expected to deliver: TableInputFormat creates one input split per region, so a single-region table is scanned by a single task.
Please split your table into multiple regions. See
http://hbase.apache.org/book.html#d3593e6847 and related links.

Cheers

On Wed, Aug 6, 2014 at 6:41 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you try specifying some value (100, e.g.) for
> "hbase.mapreduce.scan.cachedrows" in your conf?
>
> bq. table contains 10lakh rows
>
> How many rows are there in the table?
>
> nit: Example uses classOf[TableInputFormat] instead of
> TableInputFormat.class.
>
> Cheers
>
>
> On Wed, Aug 6, 2014 at 5:54 AM, Amit Singh Hora <hora.a...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am trying to run a SQL query on HBase using a Spark job. So far I am
>> able to get the desired results, but as the data set size increases the
>> Spark job takes a long time. I believe I am doing something wrong; after
>> going through the documentation and videos discussing Spark performance,
>> it should not take more than a couple of seconds.
>>
>> PFB code snippet. The HBase table contains 10 lakh (1 million) rows.
>>
>> JavaPairRDD<ImmutableBytesWritable, Result> pairRdd = ctx
>>         .newAPIHadoopRDD(conf, TableInputFormat.class,
>>                 ImmutableBytesWritable.class,
>>                 org.apache.hadoop.hbase.client.Result.class).cache();
>>
>> JavaRDD<Person> people = pairRdd
>>         .map(new Function<Tuple2<ImmutableBytesWritable, Result>, Person>() {
>>             public Person call(Tuple2<ImmutableBytesWritable, Result> v1)
>>                     throws Exception {
>>                 System.out.println("comming");
>>                 Person person = new Person();
>>                 String key = Bytes.toString(v1._2.getRow());
>>                 key = key.substring(0, key.lastIndexOf("_"));
>>                 person.setCalling(Long.parseLong(key));
>>                 person.setCalled(Bytes.toLong(v1._2.getValue(
>>                         Bytes.toBytes("si"), Bytes.toBytes("called"))));
>>                 person.setTime(Bytes.toLong(v1._2.getValue(
>>                         Bytes.toBytes("si"), Bytes.toBytes("at"))));
>>                 return person;
>>             }
>>         });
>>
>> JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
>> schemaPeople.registerAsTable("people");
>>
>> // SQL can be run over RDDs that have been registered as tables.
>> JavaSchemaRDD teenagers = sqlCtx
>>         .sql("SELECT count(*) from people group by calling");
>> teenagers.printSchema();
>>
>> I am running Spark using the start-all.sh script with 2 workers.
>>
>> Any pointers will be of great help.
>>
>> Regards,
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Hbase-job-taking-long-time-tp11541.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
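The two suggestions in this thread (raising scan caching and pre-splitting the table into multiple regions) can be sketched roughly as below. This is a sketch against the HBase 0.98-era client API in use at the time of the thread, not Amit's actual setup: the table name "calls", the column family "si", and the split keys are assumptions made for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseScanTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // 1. Fetch more rows per RPC so the scan is not dominated by
        //    round trips -- the property Ted suggested setting to e.g. 100.
        conf.set(TableInputFormat.INPUT_TABLE, "calls");  // hypothetical table name
        conf.set("hbase.mapreduce.scan.cachedrows", "100");

        // 2. Pre-split the table at creation time so TableInputFormat
        //    produces one input split (and hence one Spark partition)
        //    per region, instead of a single split for the whole table.
        byte[][] splitKeys = {  // hypothetical boundaries for numeric row keys
            Bytes.toBytes("2000000000"),
            Bytes.toBytes("4000000000"),
            Bytes.toBytes("6000000000"),
            Bytes.toBytes("8000000000"),
        };
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("calls"));
        desc.addFamily(new HColumnDescriptor("si"));
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(desc, splitKeys);  // creates 5 regions up front
        admin.close();
    }
}
```

With five regions, the newAPIHadoopRDD call in Amit's snippet would yield five partitions, letting the two workers scan in parallel rather than funnel the whole table through one task.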