Hi,
Today I created a table with 3 regions and 2 JobTrackers, but the Spark job is
still taking a lot of time.
I also noticed one thing: the memory of the client was increasing linearly. Is
it that the Spark job first brings the complete data into memory?
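One possible explanation for the linear memory growth: the .cache() in the quoted code below asks Spark to keep every deserialized row pinned in memory (MEMORY_ONLY). A hedged sketch of an alternative, assuming the same pairRdd variable as in that snippet, is to persist with a storage level that can spill to disk:

```java
import org.apache.spark.storage.StorageLevel;

// Sketch only; pairRdd is the RDD built via newAPIHadoopRDD in the
// quoted snippet. MEMORY_AND_DISK_SER stores serialized partitions
// and spills to disk instead of growing the heap unboundedly.
pairRdd.persist(StorageLevel.MEMORY_AND_DISK_SER());
```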
On Thu, Aug 7, 2014 at 7:31 PM, Ted Yu [via Apache Spark User List]
ml-node+s1001560n11651...@n3.nabble.com wrote:
Forgot to include user@
Another email from Amit indicated that there is 1 region in his table.
This wouldn't give you the benefit TableInputFormat is expected to deliver.
Please split your table into multiple regions.
See http://hbase.apache.org/book.html#d3593e6847 and related links.
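As an illustration of the splitting suggested above, a table can be pre-split at creation time, or an existing table split, from the HBase shell (the table and family names here are placeholders, not taken from this thread):

```
hbase> create 'calls', 'si', SPLITS => ['2', '4', '6', '8']
hbase> split 'calls'
```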
Cheers
On Wed, Aug 6, 2014 at 6:41 AM, Ted Yu [hidden email] wrote:
Can you try specifying some value (100, e.g.) for
hbase.mapreduce.scan.cachedrows in your conf ?
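A minimal sketch of setting that property, assuming conf is the same Configuration object later passed to newAPIHadoopRDD:

```java
// Fetch 100 rows per scanner RPC instead of the default of 1.
// The value 100 is illustrative, not a recommendation.
conf.set("hbase.mapreduce.scan.cachedrows", "100");
```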
bq. table contains 10lakh rows
How many rows are there in the table ?
nit: Example uses classOf[TableInputFormat] instead of
TableInputFormat.class.
Cheers
On Wed, Aug 6, 2014 at 5:54 AM, Amit Singh Hora [hidden email] wrote:
Hi All,
I am trying to run a SQL query on HBase using a Spark job. Till now I am able
to get the desired results, but as the data set size increases the Spark job
is taking a long time.
I believe I am doing something wrong, as after going through documentation
and videos discussing Spark performance it should not take more than a
couple of seconds.
PFB code snippet.
The HBase table contains 10 lakh (1 million) rows.
JavaPairRDD<ImmutableBytesWritable, Result> pairRdd = ctx
        .newAPIHadoopRDD(conf,
                TableInputFormat.class,
                ImmutableBytesWritable.class,
                org.apache.hadoop.hbase.client.Result.class).cache();
JavaRDD<Person> people = pairRdd
        .map(new Function<Tuple2<ImmutableBytesWritable, Result>, Person>() {
            public Person call(Tuple2<ImmutableBytesWritable, Result> v1)
                    throws Exception {
                System.out.println("comming");
                Person person = new Person();
                String key = Bytes.toString(v1._2.getRow());
                key = key.substring(0, key.lastIndexOf("_"));
                person.setCalling(Long.parseLong(key));
                person.setCalled(Bytes.toLong(v1._2.getValue(
                        Bytes.toBytes("si"), Bytes.toBytes("called"))));
                person.setTime(Bytes.toLong(v1._2.getValue(
                        Bytes.toBytes("si"), Bytes.toBytes("at"))));
                return person;
            }
        });
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
schemaPeople.registerAsTable("people");
// SQL can be run over RDDs that have been registered as tables.
JavaSchemaRDD teenagers = sqlCtx
        .sql("SELECT count(*) from people group by calling");
teenagers.printSchema();
I am running Spark using the start-all.sh script with 2 workers.
Any pointers will be of great help.
Regards,
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Hbase-job-taking-long-time-tp11541.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.