You can try out a few tricks employed by folks at Lynx Analytics...
Daniel Darabos gave some details at Spark Summit:
https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs
On 22.7.2015. 17:00, Louis Hust wrote:
My code is like below:
Map<String, String> t11opt = new HashMap<String, String>();
Real-time is, of course, relative, but you've mentioned the microsecond level.
Spark is designed to process large amounts of data in a distributed fashion. No
distributed system I know of could give any kind of guarantees at the
microsecond level.
Robin
On 22 Jul 2015, at 11:14, Louis Hust wrote:
Hi, all,
I am using the Spark jar in standalone mode, fetching data from different MySQL
instances and doing some actions, but I found the time is at the second level.
So I want to know whether a Spark job is suitable for real-time queries at the
microsecond level?
I did a simple test using Spark in standalone mode (not a cluster)
and found that a simple action takes a few seconds, even though the data size
is small, just a few rows.
So does each Spark job cost some time for init or prepare work, no matter
what the job is?
I mean, will the basic framework of a Spark job cost some fixed time on every
run?
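For illustration, a minimal sketch of this kind of timing test, assuming a
local standalone context (the app name and data here are illustrative, not
from the thread):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JobOverheadTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("overhead-test")
            .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Time a trivial action over a handful of rows; most of the
        // elapsed time is per-job scheduling and bootstrap overhead,
        // not actual computation.
        long start = System.nanoTime();
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("count=" + count + " took " + elapsedMs + " ms");
        sc.stop();
    }
}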
You can use the Spark REST job server (or any other solution that provides a
long-running Spark context) so that you won't pay this bootstrap time on each
query.
In addition: if you have some RDD that you want your queries to be executed
on, you can cache this RDD in memory (depends on your cluster memory).
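A minimal sketch of that suggestion, using the same Spark 1.x Java API as the
code below; it assumes a long-lived SQLContext named sqlContext (e.g. held by
a job server), and DB_URL and the table name are placeholders from the thread:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;

// Register the JDBC-backed table once in the long-running context.
Map<String, String> options = new HashMap<String, String>();
options.put("url", DB_URL);
options.put("dbtable", "t11");

DataFrame t11 = sqlContext.load("jdbc", options);
t11.registerTempTable("t11");
sqlContext.cacheTable("t11"); // keep the fetched rows in memory
t11.count();                  // force materialization once, up front

// Later queries read the cached copy instead of hitting MySQL:
DataFrame result = sqlContext.sql("SELECT count(*) FROM t11");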
My code is like below:
Map<String, String> t11opt = new HashMap<String, String>();
t11opt.put("url", DB_URL);
t11opt.put("dbtable", "t11");
DataFrame t11 = sqlContext.load("jdbc", t11opt);
t11.registerTempTable("t11");
...the same for the other tables, then run the query for real-time use.
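For illustration, once the other tables are registered the same way, a query
across the two MySQL instances could run against the temp tables (the name t12
and the id column are hypothetical, mirroring t11 above):

// Hypothetical: t12 registered the same way as t11 above.
DataFrame joined = sqlContext.sql(
    "SELECT a.*, b.* FROM t11 a JOIN t12 b ON a.id = b.id");
joined.show();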