Re: Is spark suitable for real time query

2015-07-28 Thread Petar Zecevic


You can try out a few tricks employed by folks at Lynx Analytics... 
Daniel Darabos gave some details at Spark Summit:

https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs


On 22.7.2015. 17:00, Louis Hust wrote:

My code is like below:
Map<String, String> t11opt = new HashMap<String, String>();
t11opt.put("url", DB_URL);
t11opt.put("dbtable", "t11");
DataFrame t11 = sqlContext.load("jdbc", t11opt);
t11.registerTempTable("t11");

...the same for t12, t21, t22


DataFrame t1 = t11.unionAll(t12);
t1.registerTempTable("t1");
DataFrame t2 = t21.unionAll(t22);
t2.registerTempTable("t2");
for (int i = 0; i < 10; i++) {
    System.out.println(new Date(System.currentTimeMillis()));
    DataFrame crossjoin = sqlContext.sql(
        "select txt from t1 join t2 on t1.id = t2.id");

    crossjoin.show();
    System.out.println(new Date(System.currentTimeMillis()));
}

Here t11, t12, t21 and t22 are all DataFrames loaded over JDBC from a
MySQL database running on the same machine as the Spark job.


But each loop iteration takes about 3 seconds. I do not know why it takes
so much time.





2015-07-22 19:52 GMT+08:00 Robin East robin.e...@xense.co.uk:


Here’s an example using spark-shell on my laptop:

sc.textFile("LICENSE").filter(_ contains "Spark").count

This takes less than a second the first time I run it and is
instantaneous on every subsequent run.

What code are you running?



On 22 Jul 2015, at 12:34, Louis Hust louis.h...@gmail.com wrote:

I did a simple test using Spark in standalone mode (not a cluster)
and found that a simple action takes a few seconds, even though the
data size is small, just a few rows.
So does every Spark job spend some time on initialization or preparation
work, no matter what the job is?
I mean, does the basic framework of a Spark job itself cost seconds?

2015-07-22 19:17 GMT+08:00 Robin East robin.e...@xense.co.uk:

Real-time is, of course, relative but you’ve mentioned
microsecond level. Spark is designed to process large amounts
of data in a distributed fashion. No distributed system I
know of could give any kind of guarantees at the microsecond
level.

Robin

 On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:

 Hi, all

 I am using the Spark jar in standalone mode, fetching data from
different MySQL instances and running some actions, but I found that
the time is at the second level.

 So I want to know whether a Spark job is suitable for real-time
queries at the microsecond level?









Re: Is spark suitable for real time query

2015-07-22 Thread Robin East
Real-time is, of course, relative but you’ve mentioned microsecond level. Spark 
is designed to process large amounts of data in a distributed fashion. No 
distributed system I know of could give any kind of guarantees at the 
microsecond level.

Robin

 On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:
 
 Hi, all
 
 I am using spark jar in standalone mode, fetch data from different mysql 
 instance and do some action, but i found the time is at second level.
 
 So i want to know if spark job is suitable for real time query which at 
 microseconds?





Is spark suitable for real time query

2015-07-22 Thread Louis Hust
Hi, all

I am using the Spark jar in standalone mode, fetching data from different
MySQL instances and running some actions, but I found that the time is at
the second level.

So I want to know whether a Spark job is suitable for real-time queries at
the microsecond level?


Re: Is spark suitable for real time query

2015-07-22 Thread Louis Hust
I did a simple test using Spark in standalone mode (not a cluster)
and found that a simple action takes a few seconds, even though the data
size is small, just a few rows.
So does every Spark job spend some time on initialization or preparation
work, no matter what the job is?
I mean, does the basic framework of a Spark job itself cost seconds?

2015-07-22 19:17 GMT+08:00 Robin East robin.e...@xense.co.uk:

 Real-time is, of course, relative but you’ve mentioned microsecond level.
 Spark is designed to process large amounts of data in a distributed
 fashion. No distributed system I know of could give any kind of guarantees
 at the microsecond level.

 Robin

  On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:
 
  Hi, all
 
  I am using spark jar in standalone mode, fetch data from different mysql
 instance and do some action, but i found the time is at second level.
 
  So i want to know if spark job is suitable for real time query which at
 microseconds?




Re: Is spark suitable for real time query

2015-07-22 Thread Igor Berman
You can use the Spark REST job server (or any other solution that provides a
long-running Spark context) so that you won't pay this bootstrap time on each
query.
In addition: if you have some RDD that you want your queries to be executed
on, you can cache this RDD in memory (depending on your cluster memory size)
so that you won't pay the cost of reading from disk.
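
The idea of a long-running context with cached data can be sketched in
plain Java. This is only an analogy and does not use the Spark API: the
CachedSource class, its method names and the loader are hypothetical names
made up for illustration. The holder lives as long as the application, loads
each dataset once, and serves later queries from memory. In Spark 1.x the
corresponding calls would be along the lines of DataFrame.cache() or
sqlContext.cacheTable("t1") on a context that is kept alive between queries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java analogy, NOT Spark API: a long-lived holder that loads each
// dataset once and answers later requests from memory.
final class CachedSource<T> {
    private final Map<String, T> cache = new HashMap<>();
    private int loads = 0;

    // Returns the dataset for `name`, calling the (expensive) loader only
    // when nothing has been cached under that name yet.
    synchronized T get(String name, Supplier<T> loader) {
        T data = cache.get(name);
        if (data == null) {
            data = loader.get();
            cache.put(name, data);
            loads++;
        }
        return data;
    }

    // Number of times a loader actually ran.
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        CachedSource<int[]> source = new CachedSource<>();
        for (int i = 0; i < 3; i++) {
            // Hypothetical loader standing in for a slow read from disk or JDBC.
            int[] rows = source.get("t1", () -> new int[] {1, 2, 3});
            System.out.println("query " + i + " saw " + rows.length + " rows");
        }
        System.out.println("loads performed: " + source.loadCount());
    }
}
```

Only the first query pays for the load; the other two reuse the in-memory
copy, which is the effect a cached RDD on a long-running context is meant
to give.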


On 22 July 2015 at 14:46, Louis Hust louis.h...@gmail.com wrote:

 I do a simple test using spark in standalone mode(not cluster),
  and found a simple action take a few seconds, the data size is small,
 just few rows.
 So each spark job will cost some time for init or prepare work no matter
 what the job is?
 I mean if the basic framework of spark job will cost seconds?

 2015-07-22 19:17 GMT+08:00 Robin East robin.e...@xense.co.uk:

 Real-time is, of course, relative but you’ve mentioned microsecond level.
 Spark is designed to process large amounts of data in a distributed
 fashion. No distributed system I know of could give any kind of guarantees
 at the microsecond level.

 Robin

  On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:
 
  Hi, all
 
  I am using spark jar in standalone mode, fetch data from different
 mysql instance and do some action, but i found the time is at second level.
 
  So i want to know if spark job is suitable for real time query which at
 microseconds?





Re: Is spark suitable for real time query

2015-07-22 Thread Louis Hust
My code is like below:
Map<String, String> t11opt = new HashMap<String, String>();
t11opt.put("url", DB_URL);
t11opt.put("dbtable", "t11");
DataFrame t11 = sqlContext.load("jdbc", t11opt);
t11.registerTempTable("t11");

...the same for t12, t21, t22



DataFrame t1 = t11.unionAll(t12);
t1.registerTempTable("t1");
DataFrame t2 = t21.unionAll(t22);
t2.registerTempTable("t2");
for (int i = 0; i < 10; i++) {
    System.out.println(new Date(System.currentTimeMillis()));
    DataFrame crossjoin = sqlContext.sql(
        "select txt from t1 join t2 on t1.id = t2.id");
    crossjoin.show();
    System.out.println(new Date(System.currentTimeMillis()));
}

Here t11, t12, t21 and t22 are all DataFrames loaded over JDBC from a MySQL
database running on the same machine as the Spark job.

But each loop iteration takes about 3 seconds. I do not know why it takes so
much time.
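
One note on the measurement itself: Date.toString() only prints whole
seconds, so printing a Date before and after each iteration gives a coarse
reading. A minimal sketch of a finer-grained timer is below. LoopTimer and
timeMillis are hypothetical names, not part of Spark, and the sleep is just
a stand-in for the query. It also makes it easier to see whether the first
iteration is slower than the later ones, which may happen if there are
one-off start-up costs.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Spark): times a block of work with
// System.nanoTime(), which is monotonic and finer-grained than printing Dates.
final class LoopTimer {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            // Stand-in workload; in the loop above this would be the
            // sqlContext.sql(...) call followed by crossjoin.show().
            long elapsed = timeMillis(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.println("iteration " + i + " took " + elapsed + " ms");
        }
    }
}
```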




2015-07-22 19:52 GMT+08:00 Robin East robin.e...@xense.co.uk:

 Here’s an example using spark-shell on my laptop:

 sc.textFile("LICENSE").filter(_ contains "Spark").count

 This takes less than a second the first time I run it and is instantaneous
 on every subsequent run.

 What code are you running?


 On 22 Jul 2015, at 12:34, Louis Hust louis.h...@gmail.com wrote:

 I do a simple test using spark in standalone mode(not cluster),
  and found a simple action take a few seconds, the data size is small,
 just few rows.
 So each spark job will cost some time for init or prepare work no matter
 what the job is?
 I mean if the basic framework of spark job will cost seconds?

 2015-07-22 19:17 GMT+08:00 Robin East robin.e...@xense.co.uk:

 Real-time is, of course, relative but you’ve mentioned microsecond level.
 Spark is designed to process large amounts of data in a distributed
 fashion. No distributed system I know of could give any kind of guarantees
 at the microsecond level.

 Robin

  On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:
 
  Hi, all
 
  I am using spark jar in standalone mode, fetch data from different
 mysql instance and do some action, but i found the time is at second level.
 
  So i want to know if spark job is suitable for real time query which at
 microseconds?






Re: Is spark suitable for real time query

2015-07-22 Thread Paolo Platter
Are you using jdbc server?

Paolo

Sent from my Windows Phone

From: Louis Hust louis.h...@gmail.com
Sent: 22/07/2015 13:47
To: Robin East robin.e...@xense.co.uk
Cc: user@spark.apache.org
Subject: Re: Is spark suitable for real time query

I do a simple test using spark in standalone mode(not cluster),
 and found a simple action take a few seconds, the data size is small, just few 
rows.
So each spark job will cost some time for init or prepare work no matter what 
the job is?
I mean if the basic framework of spark job will cost seconds?

2015-07-22 19:17 GMT+08:00 Robin East robin.e...@xense.co.uk:
Real-time is, of course, relative but you’ve mentioned microsecond level. Spark 
is designed to process large amounts of data in a distributed fashion. No 
distributed system I know of could give any kind of guarantees at the 
microsecond level.

Robin

 On 22 Jul 2015, at 11:14, Louis Hust louis.h...@gmail.com wrote:

 Hi, all

 I am using spark jar in standalone mode, fetch data from different mysql 
 instance and do some action, but i found the time is at second level.

 So i want to know if spark job is suitable for real time query which at 
 microseconds?