Re: IDE suitable for Spark : Monitoring & Debugging Spark Jobs

2020-04-07 Thread Som Lima
The Definitive Guide, Chapter 18: Monitoring and Debugging: "This chapter covers the key details you need to monitor and debug your Spark Applications. To do this, we will walk through the Spark UI with an example query designed to help you understand how to trace your own jobs through the

Re: Scala version compatibility

2020-04-07 Thread Koert Kuipers
I think it will work then, assuming the callsite hasn't changed between Scala versions. On Mon, Apr 6, 2020 at 5:09 PM Andrew Melo wrote: > Hello, > > On Mon, Apr 6, 2020 at 3:31 PM Koert Kuipers wrote: > >> Actually, I might be wrong about this. Did you declare scala to be a >> provided
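For reference, a minimal sketch of the sbt setup under discussion (version numbers are illustrative, not from this thread): Spark is marked "provided" so the compile-time Scala and Spark versions match whatever the cluster supplies at runtime.

    // build.sbt -- illustrative versions, assuming an sbt build
    scalaVersion := "2.12.10"

    libraryDependencies ++= Seq(
      // "provided": on the compile classpath, supplied by the cluster at runtime
      "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
    )

The %% operator appends the Scala binary version (e.g. _2.12) to the artifact name, which is what surfaces Scala-version mismatches at dependency-resolution time.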

Re: Serialization or internal functions?

2020-04-07 Thread Som Lima
Go to localhost:4040 while the SparkSession is running. Select Stages from the menu options, then select the job you are interested in. You can select additional metrics, including the DAG visualisation. On Tue, 7 Apr 2020, 17:14 yeikel valdes wrote: > Thanks for your input Soma,
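For reference, a minimal sketch of a session whose UI would be served at that address (the app name is arbitrary):

    import org.apache.spark.sql.SparkSession

    // While this session is alive, the web UI is at http://localhost:4040
    // (Spark falls back to 4041, 4042, ... if 4040 is already in use).
    val spark = SparkSession.builder()
      .appName("ui-demo")
      .master("local[*]")
      .getOrCreate()

    // Any action produces a job that shows up under the Stages tab.
    spark.range(1000000L).selectExpr("sum(id)").show()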

Re: IDE suitable for Spark

2020-04-07 Thread Nikaash Puri
I think as long as you set the master in the code to the correct cluster URL, everything should work as expected, so you should be able to place breakpoints in IntelliJ as if running in local mode. From: Pat Ferrel
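A minimal sketch of that setup, assuming a hypothetical standalone master URL; the driver still runs inside the IDE's JVM, which is why driver-side breakpoints behave as they do in local mode:

    import org.apache.spark.sql.SparkSession

    // "spark://cluster-host:7077" is a placeholder; substitute your master URL.
    // Driver code runs in the IDE process, so its breakpoints work as usual;
    // code inside closures runs on the remote executors instead.
    val spark = SparkSession.builder()
      .appName("ide-debug")
      .master("spark://cluster-host:7077")
      .getOrCreate()

The application classes still have to reach the executors, e.g. via the spark.jars setting; breakpoints inside closures will not fire locally because that code executes remotely.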

Re: IDE suitable for Spark

2020-04-07 Thread Pat Ferrel
IntelliJ Scala works well when debugging with master=local. Has anyone used it for remote/cluster debugging? I've heard it is possible... From: Luiz Camargo Reply: Luiz Camargo Date: April 7, 2020 at 10:26:35 AM To: Dennis Suhari Cc: yeikel valdes, zahidr1...@gmail.com, user@spark.apache.org
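One approach sometimes used for this (an assumption here, not confirmed in the thread) is plain JVM remote debugging: start the executor JVMs with a JDWP agent and attach IntelliJ's Remote JVM Debug configuration to one of them. A hedged sketch:

    import org.apache.spark.sql.SparkSession

    // Assumption-laden sketch: each executor JVM listens for a debugger on
    // port 5005 (suspend=n so tasks don't block waiting for an attach).
    // Multiple executors on one host would collide on the port, so this is
    // only practical on a small test cluster.
    val spark = SparkSession.builder()
      .appName("remote-debug")
      .config("spark.executor.extraJavaOptions",
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005")
      .getOrCreate()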

Re: IDE suitable for Spark

2020-04-07 Thread Luiz Camargo
I have used IntelliJ for Spark/Scala with the sbt tool. On Tue, Apr 7, 2020 at 1:18 PM Dennis Suhari wrote: > We are using PyCharm and RStudio, respectively, with Spark libraries to submit Spark > jobs. > > Sent from my iPhone > > On 07.04.2020 at 18:10, yeikel valdes wrote: > > Zeppelin is not

Spark Union Breaks Caching Behaviour

2020-04-07 Thread Yi Huang
Dear Community, I am a beginner at using Spark, and I am confused by the comment on the following method: def union(other: Dataset[T]): Dataset[T] = withSetOperator { // This breaks caching, but it's usually ok because it addresses a very specific use case: // using union to union many files or
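For context, the quoted method is short; as it appears in Spark's Dataset.scala (Spark 2.x line), it eagerly flattens nested Union nodes:

    // org.apache.spark.sql.Dataset (abbreviated, Spark 2.x source)
    def union(other: Dataset[T]): Dataset[T] = withSetOperator {
      // This breaks caching, but it's usually ok because it addresses a very
      // specific use case: using union to union many files or partitions.
      CombineUnions(Union(logicalPlan, other.logicalPlan))
    }

One plausible reading of the comment: because CombineUnions rewrites the logical plan at construction time, a cached intermediate union may no longer match the flattened plan, so its cached data is not reused.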

Re: IDE suitable for Spark

2020-04-07 Thread Dennis Suhari
We are using PyCharm and RStudio, respectively, with Spark libraries to submit Spark jobs. Sent from my iPhone > On 07.04.2020 at 18:10, yeikel valdes wrote: > > Zeppelin is not an IDE but a notebook. It is helpful to experiment but it is > missing a lot of the features that we expect

Re: IDE suitable for Spark

2020-04-07 Thread Stephen Boesch
I have been using IDEA for both Scala/Spark and PySpark projects since 2013. It required a fair amount of fiddling that first year but has been stable since early 2015. For PySpark-only projects, PyCharm naturally also works very well. On Tue., Apr 7, 2020 at 09:10, yeikel valdes wrote: > >

Re: Serialization or internal functions?

2020-04-07 Thread yeikel valdes
Thanks for your input Soma, but I am actually looking to understand the differences, not just the performance. On Sun, 05 Apr 2020 02:21:07 -0400 somplastic...@gmail.com wrote: If you want to measure optimisation in terms of time taken, then here is an idea :)

Re: IDE suitable for Spark

2020-04-07 Thread yeikel valdes
Zeppelin is not an IDE but a notebook. It is helpful for experimenting, but it is missing a lot of the features that we expect from an IDE. Thanks for sharing though. On Tue, 07 Apr 2020 04:45:33 -0400 zahidr1...@gmail.com wrote: When I first logged on I asked if there was a suitable

IDE suitable for Spark

2020-04-07 Thread Zahid Rahman
When I first logged on, I asked if there was a suitable IDE for Spark, and I did get a couple of responses. *Thanks.* I did actually find an IDE which is suitable for Spark: *Apache Zeppelin.* One of the many reasons it is suitable for Apache Spark is the *up and running stage*, which involves

Lifecycle of a map function

2020-04-07 Thread Vadim Vararu
Hi all, I'm trying to understand the lifecycle of a map function in a Spark/YARN context. My understanding is that the function is instantiated on the master and then passed to each executor (serialized/deserialized). What I'd like to confirm is that the function is
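A small sketch of the mechanics being asked about (names are illustrative): the closure below is created once on the driver, serialized with each task, and deserialized on the executors that run it.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("closure-demo")
      .master("local[*]")
      .getOrCreate()

    val factor = 3 // captured on the driver and shipped inside the closure

    // The lambda is instantiated on the driver; Spark cleans and serializes
    // it, sends a copy with every task, and each executor deserializes and
    // applies it to its partition's elements.
    val scaled = spark.sparkContext
      .parallelize(1 to 10)
      .map(_ * factor)
      .collect()

    println(scaled.mkString(","))

This is also why capturing a non-serializable object in such a closure fails on the driver with "Task not serializable" before any task runs.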