Cluster-mode job compute-time/cost metrics

2023-12-11 Thread Jack Wells
Hello Spark experts - I’m running Spark jobs in cluster mode using a dedicated cluster for each job. Is there a way to see how much compute time each job takes via Spark APIs, metrics, etc.? In case it makes a difference, I’m using AWS EMR - I’d ultimately like to be able to say this job costs $X
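
One way to get at this is Spark's monitoring REST API: each stage reports executorRunTime in milliseconds, which can be summed into core-hours and multiplied by a rate. A rough sketch, assuming the history server URL, application id, and per-core-hour price are placeholders you substitute yourself (EMR actually bills per instance-second, so treat this as a proxy, not a bill):

    import requests  # assumes the 'requests' package is installed

    HISTORY_SERVER = "http://localhost:18080"    # placeholder
    APP_ID = "application_1702300000000_0001"    # placeholder
    DOLLARS_PER_CORE_HOUR = 0.05                 # placeholder rate

    stages = requests.get(
        f"{HISTORY_SERVER}/api/v1/applications/{APP_ID}/stages"
    ).json()

    # executorRunTime is reported per stage, in milliseconds
    total_ms = sum(s.get("executorRunTime", 0) for s in stages)
    core_hours = total_ms / 1000 / 3600
    print(f"~{core_hours:.2f} core-hours, ~${core_hours * DOLLARS_PER_CORE_HOUR:.2f}")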

Spark 3.1.3 with Hive dynamic partitions fails while driver moves the staged files

2023-12-11 Thread Shay Elbaz
Hi all, Running on Dataproc 2.0/1.3/1.4, we use the INSERT OVERWRITE command to insert new (time) partitions into existing Hive tables, but we see frequent failures coming from org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles. This is where the driver moves the successful files from
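
For reference, a minimal sketch of the pattern under discussion (table and column names invented; the failing Hive.replaceFiles step happens after the query itself succeeds, when the driver commits the staged files):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .enableHiveSupport()
        # overwrite only the partitions produced by the query, not the whole table
        .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .getOrCreate()
    )

    spark.sql("""
        INSERT OVERWRITE TABLE events PARTITION (dt)
        SELECT payload, dt FROM staging_events
    """)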

Re: [EXTERNAL] Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-11 Thread Eugene Miretsky
Hey Mich, Thanks for the detailed response. I get most of these options. However, what we are trying to do is avoid having to upload the source configs and pyspark.zip files to the cluster every time we execute the job using spark-submit. Here is the code that does it:
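
One documented way to cut the per-submit uploads is to pre-stage Spark's jars once on shared storage and point spark.yarn.archive at the staged archive; the same key can be passed to spark-submit with --conf. A sketch, with the bucket path as a placeholder (the pyspark.zip/py4j uploads have an analogous escape hatch via the PYSPARK_ARCHIVES_PATH environment variable, though that one is less documented):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("yarn")
        # archive of Spark jars staged once, instead of re-uploaded per job
        .config("spark.yarn.archive", "s3a://my-bucket/spark/spark-libs.zip")
        .getOrCreate()
    )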

Re: [PySpark][Spark Dataframe][Observation] Why empty dataframe join doesn't let you get metrics from observation?

2023-12-11 Thread Михаил Кулаков
Hey Enrico, it does help to understand it, thanks for explaining. Regarding this comment: > PySpark and Scala should behave identically here Is it OK that Scala and PySpark optimization work differently in this case? On Tue, 5 Dec 2023 at 20:08, Enrico Minack wrote: > Hi Michail, > > with
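
For readers following along, a minimal sketch of the Observation API in question (names invented). Metrics only materialize if the observed node actually executes, so if the optimizer collapses the plan (for example, a join against an empty dataframe reduced to an empty result), the observation can come back empty, which is the behavior being discussed:

    from pyspark.sql import SparkSession, Observation
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    obs = Observation("stats")
    df = spark.range(100).observe(obs, F.count(F.lit(1)).alias("rows"))
    df.collect()     # an action must run before metrics are available
    print(obs.get)   # e.g. {'rows': 100}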

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-12-11 Thread Mich Talebzadeh
Hi Eugene, With regard to your point: What are the PYTHONPATH and SPARK_HOME env variables in your script? OK, let us look at a typical Spark project structure of mine:

project_root
|-- README.md
|-- __init__.py
|-- conf
|   |-- (configuration files for Spark)
|-- deployment
|   |--
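
As a quick way to answer the two variables in question, a sketch that prints what spark-submit will see (the py4j zip name varies by Spark version, hence the glob; the "expected" list reflects the standard Spark layout):

    import glob
    import os

    spark_home = os.environ.get("SPARK_HOME", "")
    print("SPARK_HOME =", spark_home)
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", ""))

    # driver-side imports typically need these two entries on PYTHONPATH
    expected = [os.path.join(spark_home, "python")] + glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*.zip"))
    print("expected entries:", expected)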