For your second question: hql() (as well as sql()) does not launch a
Spark job immediately; it first runs the query through the Spark SQL
parser/optimizer/planner pipeline, and a Spark job is only started
once a physical execution plan has been selected. Therefore, your
hand-rolled end-to-end measurement includes the time spent in the
Spark SQL code path, while the times reported in the UI are the
execution times of the Spark job(s) only.
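
To make the comparison concrete, here is a minimal sketch (not from
your code; it assumes an already-initialized JavaHiveContext named
hiveCtx and a query string, as in your snippet below). The interval
measured this way spans the whole Spark SQL code path plus the job(s)
it launches, whereas the UI shows the job execution only:

    // Minimal sketch: this interval covers parsing, analysis,
    // optimization and physical planning in addition to the Spark
    // job(s), while the Spark UI reports job execution time alone.
    long start = System.nanoTime();
    JavaSchemaRDD hiveResult = hiveCtx.hql(query);
    long end = System.nanoTime();
    long wallClockMs = (end - start) / 1000000L; // ns -> ms
    System.out.println("End-to-end hql() time: " + wallClockMs + " ms");
    // The gap between wallClockMs and the durations shown in the UI is
    // roughly the Spark SQL planning overhead described above.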

On Mon, Sep 1, 2014 at 11:45 PM, Niranda Perera <nira...@wso2.com> wrote:
> Hi,
>
> I have been playing around with Spark for a couple of days. I am
> using spark-1.0.1-bin-hadoop1 and the Java API. The main idea of the
> implementation is to run Hive queries on Spark. I used JavaHiveContext to
> achieve this (as per the examples).
>
> I have 2 questions.
> 1. How can I get the execution times of a Spark job? Does
> Spark provide monitoring facilities in the form of an API?
>
> 2. I used a layman's way to get the execution times by wrapping a
> JavaHiveContext.hql call with System.nanoTime(), as follows:
>
> long start, end;
> JavaHiveContext hiveCtx;
> JavaSchemaRDD hiveResult;
>
> start = System.nanoTime();
> hiveResult = hiveCtx.hql(query);
> end = System.nanoTime();
> System.out.println(end - start); // elapsed time in nanoseconds
>
> But the result I got is drastically different from the execution times
> recorded in the Spark UI. Could you please explain this disparity?
>
> Look forward to hearing from you.
>
> rgds
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
