My code is like below:

    Map<String, String> t11opt = new HashMap<String, String>();
    t11opt.put("url", DB_URL);
    t11opt.put("dbtable", "t11");
    DataFrame t11 = sqlContext.load("jdbc", t11opt);
    t11.registerTempTable("t11");
    .......the same for t12, t21, t22

    DataFrame t1 = t11.unionAll(t12);
    t1.registerTempTable("t1");
    DataFrame t2 = t21.unionAll(t22);
    t2.registerTempTable("t2");
    for (int i = 0; i < 10; i++) {
        System.out.println(new Date(System.currentTimeMillis()));
        DataFrame crossjoin = sqlContext.sql(
                "select txt from t1 join t2 on t1.id = t2.id");
        crossjoin.show();
        System.out.println(new Date(System.currentTimeMillis()));
    }

where t11, t12, t21, t22 are all DataFrames loaded over JDBC from a MySQL database that is local to the Spark job. But each loop iteration takes about 3 seconds, and I do not know why it costs so much time.

2015-07-22 19:52 GMT+08:00 Robin East <robin.e...@xense.co.uk>:

> Here’s an example using spark-shell on my laptop:
>
>     sc.textFile("LICENSE").filter(_ contains "Spark").count
>
> This takes less than a second the first time I run it and is instantaneous
> on every subsequent run.
>
> What code are you running?
>
> On 22 Jul 2015, at 12:34, Louis Hust <louis.h...@gmail.com> wrote:
>
> I did a simple test using Spark in standalone mode (not cluster mode)
> and found that a simple action takes a few seconds, even though the data
> size is small, just a few rows. So does each Spark job cost some time for
> init or prepare work, no matter what the job is? I mean, does the basic
> framework of a Spark job cost seconds?
>
> 2015-07-22 19:17 GMT+08:00 Robin East <robin.e...@xense.co.uk>:
>
>> Real-time is, of course, relative, but you’ve mentioned microsecond
>> level. Spark is designed to process large amounts of data in a
>> distributed fashion. No distributed system I know of could give any kind
>> of guarantees at the microsecond level.
>>
>> Robin
>>
>> > On 22 Jul 2015, at 11:14, Louis Hust <louis.h...@gmail.com> wrote:
>> >
>> > Hi, all
>> >
>> > I am using a Spark jar in standalone mode, fetching data from
>> > different MySQL instances and doing some actions, but I found the
>> > time is at the second level.
>> >
>> > So I want to know whether a Spark job is suitable for real-time
>> > queries that need microsecond latency?
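For reference, a minimal sketch of one thing worth ruling out (assuming the Spark 1.3.x Java API and the same `t11`/`t12`/`t21`/`t22` DataFrames and `sqlContext` as in the code above): without `cache()`, every action inside the loop can re-read the JDBC tables from MySQL, so the per-iteration time includes the load, not just the join.

```java
import org.apache.spark.sql.DataFrame;

// Sketch only: cache() marks the unioned DataFrames for in-memory storage.
// The JDBC tables are fetched from MySQL when the first action materializes
// them; subsequent iterations read from memory, leaving mostly the fixed
// per-job scheduling overhead (typically well above microseconds).
DataFrame t1 = t11.unionAll(t12).cache();
t1.registerTempTable("t1");
DataFrame t2 = t21.unionAll(t22).cache();
t2.registerTempTable("t2");

for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id").show();
    System.out.println("iteration " + i + ": "
            + (System.currentTimeMillis() - start) + " ms");
}
```

Timing the first iteration against the rest would show how much of the 3 seconds is the JDBC load versus fixed job overhead.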