Re: Is spark suitable for real time query
You can try out a few tricks employed by folks at Lynx Analytics... Daniel Darabos gave some details at Spark Summit:

https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs
Re: Is spark suitable for real time query
Real-time is, of course, relative, but you’ve mentioned microsecond level. Spark is designed to process large amounts of data in a distributed fashion. No distributed system I know of could give any kind of guarantees at the microsecond level.

Robin
Is spark suitable for real time query
Hi all,

I am using the Spark jar in standalone mode to fetch data from different MySQL instances and run some actions on it, but I found that the time taken is at the level of seconds. So I want to know whether a Spark job is suitable for real-time queries at the microsecond level?
Re: Is spark suitable for real time query
I ran a simple test using Spark in standalone mode (not a cluster) and found that a simple action takes a few seconds, even though the data size is small, just a few rows. So does every Spark job cost some time for initialization or preparation work, no matter what the job is? I mean, does the basic framework of a Spark job itself cost seconds?
Re: Is spark suitable for real time query
You can use the Spark REST job server (or any other solution that provides a long-running Spark context) so that you won't pay this bootstrap time on each query.

In addition, if you have an RDD that you want your queries to be executed on, you can cache this RDD in memory (depending on your cluster's memory size) so that you won't pay the cost of reading from disk.
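The load-once, query-many idea in this reply can be sketched in plain Java. This is only an analogy under stated assumptions, not the Spark or job-server API: `CachedSource` and its `loader` are hypothetical names, and the loader stands in for whatever slow read (disk, JDBC) the real job performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a dataset held in memory by a long-running process.
// Plain Java only; none of these names come from Spark.
final class CachedSource<T> {
    private final Supplier<List<T>> loader;   // the expensive read, e.g. from disk or JDBC
    private List<T> rows;                     // stays null until the first query

    CachedSource(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call only; later calls are served from memory. */
    synchronized List<T> rows() {
        if (rows == null) {
            rows = Collections.unmodifiableList(new ArrayList<>(loader.get()));
        }
        return rows;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CachedSource<String> source = new CachedSource<>(() -> {
            loads.incrementAndGet();          // count how often the slow path runs
            return Arrays.asList("a", "b", "c");
        });
        // Three "queries" against the same long-lived object trigger one load.
        for (int i = 0; i < 3; i++) {
            System.out.println(source.rows().size() + " rows, loads so far: " + loads.get());
        }
    }
}
```

In Spark itself the corresponding step would be calling `cache()` on the RDD or DataFrame inside a context that outlives individual queries; whether the data fits depends on the memory available, as noted above.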
Re: Is spark suitable for real time query
My code is like below:

    Map<String, String> t11opt = new HashMap<String, String>();
    t11opt.put("url", DB_URL);
    t11opt.put("dbtable", "t11");
    DataFrame t11 = sqlContext.load("jdbc", t11opt);
    t11.registerTempTable("t11");

    // ...the same for t12, t21, t22

    DataFrame t1 = t11.unionAll(t12);
    t1.registerTempTable("t1");
    DataFrame t2 = t21.unionAll(t22);
    t2.registerTempTable("t2");

    for (int i = 0; i < 10; i++) {
        System.out.println(new Date(System.currentTimeMillis()));
        DataFrame crossjoin = sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id");
        crossjoin.show();
        System.out.println(new Date(System.currentTimeMillis()));
    }

Here t11, t12, t21, and t22 are all DataFrames loaded over JDBC from a MySQL database that is local to the Spark job. But each loop iteration takes about 3 seconds. I do not know why it costs so much time.

2015-07-22 19:52 GMT+08:00 Robin East robin.e...@xense.co.uk:

Here’s an example using spark-shell on my laptop:

    sc.textFile("LICENSE").filter(_ contains "Spark").count

This takes less than a second the first time I run it and is instantaneous on every subsequent run.

What code are you running?
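Printing `new Date(...)` before and after each iteration only shows whole seconds in its default string form, and it does not separate the first run from later ones. A minimal sketch of a finer-grained measurement follows; `LatencyProbe` is a hypothetical helper written in plain Java, and the workload in `main` is a stand-in for the `crossjoin.show()` call from the thread.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical helper, not part of Spark: times repeated runs of the same query.
final class LatencyProbe {
    /** Runs the query `runs` times and returns each run's elapsed time in milliseconds. */
    static <T> double[] time(Supplier<T> query, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();   // monotonic clock with sub-millisecond units
            query.get();                      // the action being measured
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    /** Median of all runs after the first, so one-off warm-up cost is excluded. */
    static double steadyStateMedian(double[] millis) {
        if (millis.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double[] rest = Arrays.copyOfRange(millis, 1, millis.length);
        Arrays.sort(rest);
        int mid = rest.length / 2;
        return rest.length % 2 == 1 ? rest[mid] : (rest[mid - 1] + rest[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the thread this would wrap crossjoin.show().
        double[] t = time(() -> Arrays.stream(new int[1000]).sum(), 10);
        System.out.printf("first run: %.3f ms, steady-state median: %.3f ms%n",
                t[0], steadyStateMedian(t));
    }
}
```

Reporting the first run separately from the rest makes it easier to tell whether the seconds are spent on one-time setup or recur on every query.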
Re: Is spark suitable for real time query
Are you using the JDBC server?

Paolo