Re: measure running time

2021-12-24 Thread bitfox
Hi Sean, I have already discussed an issue in my case with Spark 3.1.1 and sparkmeasure with the author Luca Canali on this matter. It has been reproduced. I think we ought to wait for a patch.

Re: measure running time

2021-12-24 Thread Hollis
Hi Sean, I have already discussed an issue in my case with Spark 3.1.1 and sparkmeasure with the author Luca Canali on this matter. It has been reproduced. I think we ought to wait for a patch. HTH, Mich

Re: measure running time

2021-12-24 Thread Mich Talebzadeh
> ... approach that may lead you to miss important details, in particular when running distributed computations. WebUI, REST API, and metrics instrumentation in Spark can be quite useful for further drill down.

Re: measure running time

2021-12-24 Thread Sean Owen
> ... important details, in particular when running distributed computations. WebUI, REST API, and metrics instrumentation in Spark can be quite useful for further drill down. See https://spark.a

Re: measure running time

2021-12-24 Thread Gourav Sengupta
Hi, There are too many blogs out there with absolutely no value. Before writing another blog, which does not make much sense by doing run time comparisons between RDD and dataframes (as stated earlier), it may be useful to first understand what you are trying to achieve by writing this blog.

Re: measure running time

2021-12-24 Thread bitfox
As you see below:

$ pip install sparkmeasure
Collecting sparkmeasure
  Using cached https://files.pythonhosted.org/packages/9f/bf/c9810ff2d88513ffc185e65a3ab9df6121ad5b4c78aa8d134a06177f9021/sparkmeasure-0.14.0-py2.py3-none-any.whl
Installing collected packages: sparkmeasure
Successfully installed sparkmeasure-0.14.0
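For reference, a minimal sketch of how sparkmeasure is typically driven from PySpark, following the begin/end pattern in the sparkMeasure README. It assumes an existing SparkSession `spark` whose driver was launched with the matching JVM package (e.g. --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17); the SQL query is just a placeholder workload.

```python
from sparkmeasure import StageMetrics

# Collect aggregated stage-level task metrics around a Spark action.
# `spark` is assumed to be an already-created SparkSession.
stagemetrics = StageMetrics(spark)
stagemetrics.begin()
spark.sql("select count(*) from range(1000) cross join range(1000)").show()
stagemetrics.end()
# Prints elapsed time plus executor-side metrics (run time, CPU time,
# shuffle and I/O counters) aggregated over the measured stages.
stagemetrics.print_report()
```

Unlike driver-side wall-clock timing, this reports executor task metrics, which is what makes comparisons between jobs meaningful in a distributed setting.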

Re: measure running time

2021-12-24 Thread bitfox
... you can also have a look at this tool that takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: user Subject: Re: measure running time

Re:Re: measure running time

2021-12-24 Thread Hollis
> ... collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca

Re: measure running time

2021-12-23 Thread bitfox
... have a look at this tool that takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca

Re: measure running time

2021-12-23 Thread bitfox
... https://github.com/LucaCanali/sparkMeasure Best, Luca > Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
> bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17 Best, Luca From: Mich Talebzadeh Sent: Thursday, December 23, 2021 19:59 To: Luca Canali Cc: user Subject: Re: measure running time

RE: measure running time

2021-12-23 Thread Luca Canali
... takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
> ... executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca

RE: measure running time

2021-12-23 Thread Luca Canali
> Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation in RDD and Dataframe can be compared based on their start and stop time may not provide any valid information.

Re: measure running time

2021-12-23 Thread Gourav Sengupta
Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation in RDD and Dataframe can be compared based on their start and stop time may not provide any valid information. You will have to look into the details of timing and the

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
Try this simple thing first:

import time

def main():
    start_time = time.time()
    print("\nStarted at"); uf.println(lst)   # uf.println is the author's own print helper
    # your code
    print("\nFinished at"); uf.println(lst)
    end_time = time.time()
    time_elapsed = end_time - start_time
    print(f"""Elapsed time in seconds is {time_elapsed}""")

measure running time

2021-12-23 Thread bitfox
hello community, In pyspark how can I measure the running time of a command? I just want to compare the running time of the RDD API and the dataframe API, in this blog of mine: https://bitfoxtop.wordpress.com/2021/12/23/count-email-addresses-using-sparks-rdd-and-dataframe/ I tried spark.time() it
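Since spark.time() is a Scala-shell convenience and is not exposed in PySpark, one common workaround is driver-side wall-clock timing. A minimal sketch (the `timed` helper name is mine, not a Spark API); note that only an action such as count() actually triggers work, so lazy transformations alone will measure nothing.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print wall-clock elapsed time for the enclosed block.

    Driver-side timing only: it includes scheduling and collection
    overhead, and says nothing about per-executor task metrics.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f} s")

# Usage, assuming an existing SparkSession `spark`:
# with timed("dataframe count"):
#     spark.range(10**6).count()
```

For a like-for-like RDD-vs-DataFrame comparison you would run each job several times inside its own `timed` block and compare the distributions, not a single run.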