My code is like below:
            Map<String, String> t11opt = new HashMap<String, String>();
            t11opt.put("url", DB_URL);
            t11opt.put("dbtable", "t11");
            DataFrame t11 = sqlContext.load("jdbc", t11opt);
            t11.registerTempTable("t11");

            .......the same for t12, t21, t22



            DataFrame t1 = t11.unionAll(t12);
            t1.registerTempTable("t1");
            DataFrame t2 = t21.unionAll(t22);
            t2.registerTempTable("t2");
            for (int i = 0; i < 10; i++) {
                System.out.println(new Date(System.currentTimeMillis()));
                DataFrame crossjoin = sqlContext.sql(
                        "select txt from t1 join t2 on t1.id = t2.id");
                crossjoin.show();
                System.out.println(new Date(System.currentTimeMillis()));
            }

t11, t12, t21, t22 are all DataFrames loaded via JDBC from a MySQL
database running on the same machine as the Spark job.

But each loop iteration takes about 3 seconds. I do not know why it costs so much time.
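One thing worth checking: in the loop above, each `show()` is a fresh action, and without caching Spark may re-read t11/t12/t21/t22 from MySQL over JDBC on every iteration. A minimal sketch of the same loop with `cache()` added (assuming the same `sqlContext` and t11/t12/t21/t22 setup as above; the variable names and the up-front `count()` calls to warm the cache are my additions, not from the original code):

```java
// Sketch only: cache the unioned inputs so repeated actions reuse
// in-memory data instead of re-fetching from MySQL via JDBC each time.
DataFrame t1 = t11.unionAll(t12).cache();
t1.registerTempTable("t1");
DataFrame t2 = t21.unionAll(t22).cache();
t2.registerTempTable("t2");

// Materialize the caches once, up front, so the loop timings below
// measure only the join, not the first JDBC read.
t1.count();
t2.count();

for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    DataFrame joined = sqlContext.sql(
            "select txt from t1 join t2 on t1.id = t2.id");
    joined.show();
    System.out.println("iteration took "
            + (System.currentTimeMillis() - start) + " ms");
}
```

If the first iteration is slow and the rest are fast after caching, the cost was the repeated JDBC reads; if every iteration is still seconds, the overhead is more likely per-job scheduling and planning.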




2015-07-22 19:52 GMT+08:00 Robin East <robin.e...@xense.co.uk>:

> Here’s an example using spark-shell on my laptop:
>
> sc.textFile("LICENSE").filter(_ contains "Spark").count
>
> This takes less than a second the first time I run it and is instantaneous
> on every subsequent run.
>
> What code are you running?
>
>
> On 22 Jul 2015, at 12:34, Louis Hust <louis.h...@gmail.com> wrote:
>
> I did a simple test using Spark in standalone mode (not a cluster),
> and found that a simple action takes a few seconds; the data size is small,
> just a few rows.
> So will each Spark job cost some time for init or preparation work, no matter
> what the job is?
> I mean, will the basic framework of a Spark job cost seconds?
>
> 2015-07-22 19:17 GMT+08:00 Robin East <robin.e...@xense.co.uk>:
>
>> Real-time is, of course, relative but you’ve mentioned microsecond level.
>> Spark is designed to process large amounts of data in a distributed
>> fashion. No distributed system I know of could give any kind of guarantees
>> at the microsecond level.
>>
>> Robin
>>
>> > On 22 Jul 2015, at 11:14, Louis Hust <louis.h...@gmail.com> wrote:
>> >
>> > Hi, all
>> >
>> > I am using the Spark jar in standalone mode, fetching data from different
>> > MySQL instances and doing some actions, but I found the time is at the
>> > second level.
>> >
>> > So I want to know whether Spark jobs are suitable for real-time queries
>> > at the microsecond level?
>>
>>
>
>
