Thanks for the update. What about cores per executor?
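For context on that question, the cores-per-executor figure and a few of the ratios in the report quoted below can be reproduced with simple arithmetic. This is only a minimal sketch: the function names are hypothetical and not part of Sparklens, the inputs are the numbers printed in the quoted report (750 cores, 250 executors, 15h 07m used out of 109h 06m available), and it assumes all executors are sized equally.

```python
# Hypothetical helpers (not part of Sparklens) that redo some of the
# arithmetic behind the report quoted below.


def cores_per_executor(total_cores: int, total_executors: int) -> float:
    """Average cores per executor, assuming all executors are the same size."""
    return total_cores / total_executors


def p_ratio(task_count: int, total_cores: int) -> float:
    """PRatio as the report defines it: tasks in the stage / number of cores."""
    return task_count / total_cores


def one_core_compute_hours(cores: int, hours: float) -> float:
    """One core running for one hour counts as one OneCoreComputeHour."""
    return cores * hours


def utilization_pct(used_minutes: int, available_minutes: int) -> float:
    """Share of the available OneCoreComputeHours that was actually used."""
    return 100.0 * used_minutes / available_minutes


TOTAL_CORES = 750       # "Total cores available to the app" in the report
TOTAL_EXECUTORS = 250   # "Total Executors" in the report

print(cores_per_executor(TOTAL_CORES, TOTAL_EXECUTORS))  # 3.0
print(round(p_ratio(200, TOTAL_CORES), 2))               # report lists 0.27
print(round(p_ratio(601, TOTAL_CORES), 2))               # report lists 0.80
# 15h 07m used out of 109h 06m available; the report lists 13.86%
print(round(utilization_pct(15 * 60 + 7, 109 * 60 + 6), 2))
```

With 200 tasks on 750 cores the PRatio stays at 0.27, which is the reasoning behind the earlier suggestion in this thread to raise spark.sql.shuffle.partitions towards the number of cores.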
On Tue, 27 Mar 2018 at 6:45 Rohit Karlupia <roh...@qubole.com> wrote:

> Thanks Fawze!
>
> On the memory front, I am currently working on GC and CPU aware task
> scheduling. I see wonderful results based on my tests so far. Once the
> feature is complete and available, Spark will work with whatever memory is
> provided (at least enough for the largest possible task). It will also
> allow you to run, say, 64 concurrent tasks on an 8 core machine, if the nature
> of the tasks doesn't lead to memory or CPU contention. Essentially, why worry
> about tuning memory when you can let Spark take care of it automatically
> based on memory pressure? Will post details when we are ready. So yes, we
> are working on memory, but it will not be a tool but a transparent feature.
>
> thanks,
> rohitk
>
> On Tue, Mar 27, 2018 at 7:53 AM, Fawze Abujaber <fawz...@gmail.com> wrote:
>
>> Hi Rohit,
>>
>> I would like to thank you for the unlimited patience and support that you
>> are providing here and behind the scenes for all of us.
>>
>> The tool is amazing and easy to use, and most of the metrics are easy to
>> understand ...
>>
>> As for whether we need to run it in cluster mode and all the time, I think
>> we can skip that: one or a few runs can give you the big picture of how the
>> job is running with different configurations, and it's not too
>> complicated to run it using spark-submit.
>>
>> I think it would be very helpful if Sparklens could also show how the
>> job runs with different configurations of cores and memory. A Spark job
>> with 1 executor and 1 core will run differently from a Spark job with 1
>> executor and 3 cores, and the same surely holds when comparing different
>> executor memory sizes.
>>
>> Overall, it is a very good starting point, but it will be a GAME CHANGER
>> getting these metrics in the tool.
>>
>> @Rohit, Huge THANK YOU
>>
>> On Mon, Mar 26, 2018 at 1:35 PM, Rohit Karlupia <roh...@qubole.com>
>> wrote:
>>
>>> Hi Shmuel,
>>>
>>> In general it is hard to pinpoint the exact code that is responsible
>>> for a specific stage. For example, when using Spark SQL, depending upon the
>>> kind of joins and aggregations used in a single line of the query, we will
>>> have multiple stages in the Spark application. I usually try to split the
>>> code into smaller chunks and also use the Spark UI, which has a special
>>> section for SQL. It can also show specific backtraces, but as I explained
>>> earlier they might not be very helpful. Sparklens does help you ask the
>>> right questions, but is not mature enough to answer all of them.
>>>
>>> Understanding the report:
>>>
>>> *1) The first part shows total aggregate metrics for the application.*
>>>
>>> Printing application meterics.....
>>>
>>> AggregateMetrics (Application Metrics) total measurements 1869
>>> NAME                        SUM             MIN       MAX         MEAN
>>> diskBytesSpilled            0.0 KB          0.0 KB    0.0 KB      0.0 KB
>>> executorRuntime             15.1 hh         3.0 ms    4.0 mm      29.1 ss
>>> inputBytesRead              26.1 GB         0.0 KB    43.8 MB     14.3 MB
>>> jvmGCTime                   11.0 mm         0.0 ms    2.1 ss      354.0 ms
>>> memoryBytesSpilled          314.2 GB        0.0 KB    1.1 GB      172.1 MB
>>> outputBytesWritten          0.0 KB          0.0 KB    0.0 KB      0.0 KB
>>> peakExecutionMemory         0.0 KB          0.0 KB    0.0 KB      0.0 KB
>>> resultSize                  12.9 MB         2.0 KB    40.9 KB     7.1 KB
>>> shuffleReadBytesRead        107.7 GB        0.0 KB    276.0 MB    59.0 MB
>>> shuffleReadFetchWaitTime    2.0 ms          0.0 ms    0.0 ms      0.0 ms
>>> shuffleReadLocalBlocks      2,318           0         68          1
>>> shuffleReadRecordsRead      3,413,511,099   0         8,251,926   1,826,383
>>> shuffleReadRemoteBlocks     291,126         0         824         155
>>> shuffleWriteBytesWritten    107.6 GB        0.0 KB    257.6 MB    58.9 MB
>>> shuffleWriteRecordsWritten  3,408,133,175   0         7,959,055   1,823,506
>>> shuffleWriteTime            8.7 mm          0.0 ms    1.8 ss      278.2 ms
>>> taskDuration                15.4 hh         12.0 ms   4.1 mm      29.7 ss
>>>
>>> *2) Here we show the number of hosts used and executors per host. I have seen
>>> users set executor memory to 33GB on a 64GB host. Direct waste of 31GB
>>> of memory.*
>>>
>>> Total Hosts 135
>>>
>>> Host server86.cluster.com startTime 02:26:21:081 executors count 3
>>> Host server164.cluster.com startTime 02:30:12:204 executors count 1
>>> Host server28.cluster.com startTime 02:31:09:023 executors count 1
>>> Host server78.cluster.com startTime 02:26:08:844 executors count 5
>>> Host server124.cluster.com startTime 02:26:10:523 executors count 3
>>> Host server100.cluster.com startTime 02:30:24:073 executors count 1
>>> Done printing host timeline
>>>
>>> *3) Time at which executors were added. Not all executors are available at
>>> the start of the application.*
>>>
>>> Printing executors timeline....
>>> Total Hosts 135
>>> Total Executors 250
>>> At 02:26 executors added 52 & removed 0 currently available 52
>>> At 02:27 executors added 10 & removed 0 currently available 62
>>> At 02:28 executors added 13 & removed 0 currently available 75
>>> At 02:29 executors added 81 & removed 0 currently available 156
>>> At 02:30 executors added 48 & removed 0 currently available 204
>>> At 02:31 executors added 45 & removed 0 currently available 249
>>> At 02:32 executors added 1 & removed 0 currently available 250
>>>
>>> *4) How the stages within the jobs were scheduled. Helps you understand
>>> which stages ran in parallel and which are dependent on others.
>>> *
>>>
>>> Printing Application timeline
>>> 02:26:47:654 Stage 3 ended : maxTaskTime 3117 taskCount 1
>>> 02:26:47:708 Stage 4 started : duration 00m 02s
>>> 02:26:49:898 Stage 4 ended : maxTaskTime 226 taskCount 200
>>> 02:26:49:901 JOB 3 ended
>>> 02:26:56:234 JOB 4 started : duration 08m 28s
>>> [ 5 ||||||| ]
>>> [ 6 ||||||||||||||||||| ]
>>> [ 9 |||||||| ]
>>> [ 10 |||||||||||||| ]
>>> [ 11 ]
>>> [ 12 || ]
>>> [ 13 |||| ]
>>> [ 14 ||||||||||||||| ]
>>> [ 15 |||||||||||||||||||||||||||||||||||||| ]
>>> 02:26:58:095 Stage 5 started : duration 00m 44s
>>> 02:27:42:816 Stage 5 ended : maxTaskTime 37214 taskCount 23
>>> 02:27:03:478 Stage 6 started : duration 02m 04s
>>> 02:29:07:517 Stage 6 ended : maxTaskTime 35578 taskCount 601
>>> 02:28:56:449 Stage 9 started : duration 00m 46s
>>> 02:29:42:625 Stage 9 ended : maxTaskTime 7196 taskCount 200
>>> 02:27:22:343 Stage 10 started : duration 01m 33s
>>> 02:28:56:333 Stage 10 ended : maxTaskTime 49203 taskCount 39
>>> 02:27:23:910 Stage 11 started : duration 00m 00s
>>> 02:27:24:422 Stage 11 ended : maxTaskTime 298 taskCount 2
>>> 02:29:06:902 Stage 12 started : duration 00m 12s
>>> 02:29:19:350 Stage 12 ended : maxTaskTime 11511 taskCount 200
>>> 02:29:19:413 Stage 13 started : duration 00m 25s
>>> 02:29:44:444 Stage 13 ended : maxTaskTime 24924 taskCount 200
>>> 02:29:44:491 Stage 14 started : duration 01m 36s
>>> 02:31:20:873 Stage 14 ended : maxTaskTime 86194 taskCount 200
>>> 02:31:20:973 Stage 15 started : duration 04m 03s
>>> 02:35:24:346 Stage 15 ended : maxTaskTime 238747 taskCount 200
>>> 02:35:24:347 JOB 4 ended
>>> 02:35:28:841 app ended
>>>
>>> *5) I guess these metrics are well explained.*
>>>
>>> Time spent in Driver vs Executors
>>> Driver WallClock Time      01m 02s   10.66%
>>> Executor WallClock Time    08m 43s   89.34%
>>> Total WallClock Time       09m 46s
>>>
>>> Minimum possible time for the app based on the critical path (with infinite resources)   07m 59s
>>> Minimum possible time for the app with same executors, perfect parallelism and zero skew 02m 15s
>>> If we were to run this app with single executor and single core                          15h 08m
>>>
>>> Total cores available to the app 750
>>>
>>> OneCoreComputeHours: Measure of total compute power available from cluster. One core in the executor, running
>>>                      for one hour, counts as one OneCoreComputeHour. Executors with 4 cores, will have 4 times
>>>                      the OneCoreComputeHours compared to one with just one core. Similarly, one core executor
>>>                      running for 4 hours will OnCoreComputeHours equal to 4 core executor running for 1 hour.
>>>
>>> Driver Utilization (Cluster idle because of driver)
>>>
>>> Total OneCoreComputeHours available                    122h 07m
>>> Total OneCoreComputeHours available (AutoScale Aware)  77h 25m
>>> OneCoreComputeHours wasted by driver                   13h 01m
>>>
>>> AutoScale Aware: Most of the calculations by this tool will assume that all executors are available throughout
>>>                  the runtime of the application. The number above is printed to show possible caution to be
>>>                  taken in interpreting the efficiency metrics.
>>>
>>> Cluster Utilization (Executors idle because of lack of tasks or skew)
>>>
>>> Executor OneCoreComputeHours available  109h 06m
>>> Executor OneCoreComputeHours used       15h 07m   13.86%
>>> OneCoreComputeHours wasted              93h 59m   86.14%
>>>
>>> App Level Wastage Metrics (Driver + Executor)
>>>
>>> OneCoreComputeHours wasted Driver    10.66%
>>> OneCoreComputeHours wasted Executor  76.96%
>>> OneCoreComputeHours wasted Total     87.62%
>>>
>>> 6) *Here we use the simulation to provide answers to how the application
>>> wall clock time will vary as we change the number of executors. Goal is to
>>> run the application at 100% cluster utilization and minimum time. Look for
>>> ROI in terms of wall clock time due to additional executors.
>>> Also if the application is not scaling, this is a good time to revisit the
>>> application and look for why it is not scaling.*
>>>
>>> App completion time and cluster utilization estimates with different executor counts
>>>
>>> Real App Duration 09m 46s
>>> Model Estimation  08m 01s
>>> Model Error       17%
>>>
>>> NOTE: 1) Model error could be large when auto-scaling is enabled.
>>>       2) Model doesn't handles multiple jobs run via thread-pool. For better insights into
>>>          application scalability, please try such jobs one by one without thread-pool.
>>>
>>> Executor count   25 ( 10%) estimated time 17m 07s and estimated cluster utilization 70.61%
>>> Executor count   50 ( 20%) estimated time 12m 15s and estimated cluster utilization 49.34%
>>> Executor count  125 ( 50%) estimated time 08m 25s and estimated cluster utilization 28.72%
>>> Executor count  200 ( 80%) estimated time 08m 15s and estimated cluster utilization 18.29%
>>> Executor count  250 (100%) estimated time 08m 01s and estimated cluster utilization 15.06%
>>> Executor count  275 (110%) estimated time 08m 00s and estimated cluster utilization 13.72%
>>> Executor count  300 (120%) estimated time 07m 59s and estimated cluster utilization 12.61%
>>> Executor count  375 (150%) estimated time 07m 59s and estimated cluster utilization 10.09%
>>> Executor count  500 (200%) estimated time 07m 59s and estimated cluster utilization 7.57%
>>> Executor count  750 (300%) estimated time 07m 59s and estimated cluster utilization 5.04%
>>> Executor count 1000 (400%) estimated time 07m 59s and estimated cluster utilization 3.78%
>>> Executor count 1250 (500%) estimated time 07m 59s and estimated cluster utilization 3.03%
>>>
>>> *7) These two sections are for finding out which stages are taking most of
>>> the wall-clock time and why. It is either not enough parallelism or skew.
>>> Parallelism is easier to fix.
>>> Fixing skew will require changing the application in a way that creates
>>> more uniform tasks.
>>> *
>>> Total tasks in all stages 1869
>>> Per Stage Utilization
>>> Stage-ID Wall   Task     Task  IO%  Input     Output  ----Shuffle-----    -WallClockTime-   --OneCoreComputeHours--- MaxTaskMem
>>>          Clock% Runtime% Count                        Input     Output    Measured Ideal    Available Used% Wasted%
>>> 0        0.00   0.00     1     0.0  64.0 KB   0.0 KB  0.0 KB    0.0 KB    00m 02s  00m 00s  00h 27m   0.0   100.0    0.0 KB
>>> 1        0.00   0.00     1     0.0  64.0 KB   0.0 KB  0.0 KB    0.0 KB    00m 02s  00m 00s  00h 30m   0.1   99.9     0.0 KB
>>> 2        0.00   0.00     1     0.0  90.0 KB   0.0 KB  0.0 KB    0.0 KB    00m 03s  00m 00s  00h 37m   0.1   99.9     0.0 KB
>>> 3        0.00   0.01     1     0.0  867.1 KB  0.0 KB  0.0 KB    148.4 KB  00m 04s  00m 00s  01h 01m   0.1   99.9     0.0 KB
>>> 4        0.00   0.00     200   0.0  0.0 KB    0.0 KB  148.4 KB  0.0 KB    00m 02s  00m 00s  00h 27m   0.1   99.9     0.0 KB
>>> 5        6.00   1.15     23    0.2  402.1 MB  0.0 KB  0.0 KB    1.3 GB    00m 44s  00m 00s  09h 19m   1.9   98.1     0.0 KB
>>> 6        17.00  19.92    601   7.1  17.2 GB   0.0 KB  0.0 KB    1.8 GB    02m 04s  00m 14s  25h 50m   11.7  88.3     0.0 KB
>>> 9        6.00   0.73     200   2.9  6.9 GB    0.0 KB  409.5 MB  2.8 GB    00m 46s  00m 00s  09h 37m   1.2   98.8     0.0 KB
>>> 10       13.00  2.27     39    0.3  807.8 MB  0.0 KB  0.0 KB    2.5 GB    01m 33s  00m 01s  19h 34m   1.7   98.3     0.0 KB
>>> 11       0.00   0.00     2     0.0  31.5 KB   0.0 KB  0.0 KB    60.0 KB   00m 00s  00m 00s  00h 06m   0.1   99.9     0.0 KB
>>> 12       1.00   2.15     200   0.3  758.7 MB  0.0 KB  2.3 GB    1.5 GB    00m 12s  00m 01s  02h 35m   12.6  87.4     0.0 KB
>>> 13       3.00   5.91     200   0.0  0.0 KB    0.0 KB  1.5 GB    47.5 GB   00m 25s  00m 04s  05h 12m   17.1  82.9     0.0 KB
>>> 14       13.00  19.83    200   0.0  0.0 KB    0.0 KB  50.3 GB   50.3 GB   01m 36s  00m 14s  20h 04m   14.9  85.1     0.0 KB
>>> 15       34.00  48.02    200   0.0  0.0 KB    0.0 KB  53.2 GB   0.0 KB    04m 03s  00m 34s  50h 42m   14.3  85.7     0.0 KB
>>>
>>> Stage-ID WallClock OneCore      Task  PRatio -----Task------ OIRatio |* ShuffleWrite% ReadFetch% GC%   *|
>>>          Stage%    ComputeHours Count        Skew  StageSkew
>>> 0        0.32      00h 00m      1     0.00   1.00  0.37      0.00    |* 0.00          0.00       15.10 *|
>>> 1        0.35      00h 00m      1     0.00   1.00  0.38      0.00    |* 0.00          0.00       15.56 *|
>>> 2        0.43      00h 00m      1     0.00   1.00  0.45      0.00    |* 0.00          0.00       8.88  *|
>>> 3        0.70      00h 00m      1     0.00   1.00  0.63      0.17    |* 4.51          0.00       6.74  *|
>>> 4        0.31      00h 00m      200   0.27   37.67 0.10      0.00    |* 0.00          0.04       23.79 *|
>>> 5        6.38      00h 10m      23    0.03   1.42  0.83      3.18    |* 1.08          0.00       2.72  *|
>>> 6        17.68     03h 00m      601   0.80   2.07  0.29      0.10    |* 0.60          0.00       1.90  *|
>>> 9        6.58      00h 06m      200   0.27   5.20  0.16      0.38    |* 4.74          13.24      4.04  *|
>>> 10       13.40     00h 20m      39    0.05   1.67  0.52      3.17    |* 1.10          0.00       1.96  *|
>>> 11       0.07      00h 00m      2     0.00   1.00  0.58      1.91    |* 13.59         0.00       0.00  *|
>>> 12       1.77      00h 19m      200   0.27   1.99  0.92      0.50    |* 1.85          19.63      3.09  *|
>>> 13       3.57      00h 53m      200   0.27   1.59  1.00      31.42   |* 6.06          12.25      1.33  *|
>>> 14       13.74     02h 59m      200   0.27   1.65  0.89      1.00    |* 1.84          2.38       0.83  *|
>>> 15       34.69     07h 15m      200   0.27   1.88  0.98      0.00    |* 0.00          4.21       0.88  *|
>>>
>>> PRatio:        Number of tasks in stage divided by number of cores. Represents degree of
>>>                parallelism in the stage
>>> TaskSkew:      Duration of largest task in stage divided by duration of median task.
>>>                Represents degree of skew in the stage
>>> TaskStageSkew: Duration of largest task in stage divided by total duration of the stage.
>>>                Represents the impact of the largest task on stage time.
>>> OIRatio:       Output to input ration. Total output of the stage (results + shuffle write)
>>>                divided by total input (input data + shuffle read)
>>>
>>> These metrics below represent distribution of time within the stage
>>>
>>> ShuffleWrite:  Amount of time spent in shuffle writes across all tasks in the given
>>>                stage as a percentage
>>> ReadFetch:     Amount of time spent in shuffle read across all tasks in the given
>>>                stage as a percentage
>>> GC:            Amount of time spent in GC across all tasks in the given stage as a
>>>                percentage
>>>
>>> If the stage contributes large percentage to overall application time, we could look into
>>> these metrics to check which part (Shuffle write, read fetch or GC is responsible)
>>>
>>> thanks,
>>> rohitk
>>>
>>> On Mon, Mar 26, 2018 at 1:38 AM, Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>
>>>> Hi Rohit,
>>>>
>>>> Thanks for the analysis.
>>>>
>>>> I can use repartition on the slow task. But how can I tell what part of
>>>> the code is in charge of the slow tasks?
>>>>
>>>> It would be great if you could further explain the rest of the output.
>>>>
>>>> Thanks in advance,
>>>> Shmuel
>>>>
>>>> On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia <roh...@qubole.com>
>>>> wrote:
>>>>
>>>>> Thanks Shmuel for trying out Sparklens!
>>>>>
>>>>> Couple of things that I noticed:
>>>>> 1) 250 executors is probably overkill for this job. It would run in
>>>>> the same time with around 100.
>>>>> 2) Many of the stages that take a long time have only 200 tasks, whereas we
>>>>> have 750 cores available for the job. 200 is the default value for
>>>>> spark.sql.shuffle.partitions. Alternatively you could try increasing
>>>>> the value of spark.sql.shuffle.partitions to at least 750.
>>>>>
>>>>> thanks,
>>>>> rohitk
>>>>>
>>>>> On Sun, Mar 25, 2018 at 1:25 PM, Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>>>
>>>>>> I ran it on a single job.
>>>>>> SparkLens has an overhead on the job duration.
>>>>>> I'm not ready to enable it by default on all our jobs.
>>>>>>
>>>>>> Attached is the output.
>>>>>>
>>>>>> Still trying to understand what exactly it means.
>>>>>>
>>>>>> On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber <fawz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Nice!
>>>>>>>
>>>>>>> Shmuel, were you able to run it at the cluster level or for a specific
>>>>>>> job?
>>>>>>>
>>>>>>> Did you configure it in spark-defaults.conf?
>>>>>>>
>>>>>>> On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>>>>>
>>>>>>>> Just to let you know, I have managed to run SparkLens on our
>>>>>>>> cluster.
>>>>>>>>
>>>>>>>> I switched to the spark_1.6 branch, and also compiled against the
>>>>>>>> specific image of Spark we are using (cdh5.7.6).
>>>>>>>>
>>>>>>>> Now I need to figure out what the output means... :P
>>>>>>>>
>>>>>>>> Shmuel
>>>>>>>>
>>>>>>>> On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber <fawz...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Quick question:
>>>>>>>>>
>>>>>>>>> How do I add --jars /path/to/sparklens_2.11-0.1.0.jar to
>>>>>>>>> spark-defaults.conf? Should I use
>>>>>>>>>
>>>>>>>>> spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar
>>>>>>>>>
>>>>>>>>> or should I use the spark.jars option? Could anyone give an example of
>>>>>>>>> how it should look, and say whether the path for the jar should be an
>>>>>>>>> HDFS path, as I'm using it in cluster mode?
>>>>>>>>>
>>>>>>>>> On Fri, Mar 23, 2018 at 6:33 AM, Fawze Abujaber <fawz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Shmuel,
>>>>>>>>>>
>>>>>>>>>> Did you compile the code against the right branch for Spark 1.6?
>>>>>>>>>>
>>>>>>>>>> I tested it and it looks like it is working, and now I'm testing the
>>>>>>>>>> branch more widely. Please use the branch for Spark 1.6.
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for sharing this great tool.
>>>>>>>>>>> I tried running a Spark job with the tool, but it failed with an
>>>>>>>>>>> *IncompatibleClassChangeError* exception.
>>>>>>>>>>>
>>>>>>>>>>> I have opened an issue on GitHub
>>>>>>>>>>> (https://github.com/qubole/sparklens/issues/1).
>>>>>>>>>>>
>>>>>>>>>>> Shmuel
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> We will give this a try and report back.
>>>>>>>>>>>>
>>>>>>>>>>>> Shmuel
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia <roh...@qubole.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks everyone!
>>>>>>>>>>>>> Please share how it works and how it doesn't. Both help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fawze, I just made a few changes to make this work with Spark
>>>>>>>>>>>>> 1.6. Can you please try building from branch *spark_1.6*?
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>> rohitk
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber <fawz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's super amazing .... I see it was tested on Spark 2.0.0
>>>>>>>>>>>>>> and above. What about Spark 1.6, which is still part of
>>>>>>>>>>>>>> Cloudera's main versions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have a large number of Spark applications on version 1.6.0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Super exciting! I look forward to digging through it this
>>>>>>>>>>>>>>> weekend.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Excellent. You filled a missing link.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Passion
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia <roh...@qubole.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Happy to announce the availability of Sparklens as an open
>>>>>>>>>>>>>>>>> source project. It helps in understanding the scalability
>>>>>>>>>>>>>>>>> limits of Spark applications and can be a useful guide on the
>>>>>>>>>>>>>>>>> path towards tuning applications for lower runtime or cost.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please clone from here:
>>>>>>>>>>>>>>>>> https://github.com/qubole/sparklens
>>>>>>>>>>>>>>>>> Old blogpost:
>>>>>>>>>>>>>>>>> https://www.qubole.com/blog/introducing-quboles-spark-tuning-tool/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>> rohitk
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PS: Thanks for the patience. It took a couple of months to
>>>>>>>>>>>>>>>>> get back on this.
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>> --
>>>>>>>>>>>> Shmuel Blitz
>>>>>>>>>>>> Big Data Developer
>>>>>>>>>>>> Email: shmuel.bl...@similarweb.com
>>>>>>>>>>>> www.similarweb.com
>> --
>> Take Care
>> Fawze Abujaber

--
Take Care
Fawze Abujaber