Cluster-mode job compute-time/cost metrics

2023-12-11 Thread Jack Wells
Hello Spark experts - I’m running Spark jobs in cluster mode using a dedicated cluster for each job. Is there a way to see how much compute time each job takes via Spark APIs, metrics, etc.? In case it makes a difference, I’m using AWS EMR. I’d ultimately like to be able to say "this job costs $X".
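
One way to approximate this is to read cumulative task time from the Spark driver's (or history server's) REST API and multiply by an hourly rate. A minimal sketch in Python; the base URL and the hourly rate are assumptions to replace with your own values:

    import requests

    # Assumed endpoint - replace with your driver UI (port 4040) or EMR's
    # Spark history server (port 18080). The rate is a hypothetical blended
    # $/executor-hour for your fleet, not a real EMR price.
    BASE_URL = "http://localhost:18080"
    HOURLY_RATE_USD = 0.50

    apps = requests.get(f"{BASE_URL}/api/v1/applications").json()
    app_id = apps[0]["id"]  # most recent application

    executors = requests.get(
        f"{BASE_URL}/api/v1/applications/{app_id}/executors"
    ).json()

    # totalDuration is cumulative task time per executor, in milliseconds.
    executor_hours = sum(e["totalDuration"] for e in executors) / 3_600_000

    print(f"{executor_hours:.2f} executor-hours, "
          f"~${executor_hours * HOURLY_RATE_USD:.2f}")

Note that task time under-counts idle executors. Since each job gets a dedicated cluster, wall-clock cluster runtime times instance count (from the EMR console, or Cost Explorer with cluster tags) may give a more honest dollar figure.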

Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Jack Wells
> .repartition(6)
> .cache()
> )
>
> On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote:
>
>> Hi Nebi, can you share the code you’re using to read and write from S3?
>>
>> On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
>>
>>> Hi all, …

Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Jack Wells
Hi Nebi, can you share the code you’re using to read and write from S3?

On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote:
> Hi all,
> I am using Spark on EMR to process data. Basically I read data from AWS S3,
> do the transformation, and after the transformation I load/write the data to S3.
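
For reference while waiting on the actual code, here is a generic sketch of the read-transform-write pattern being described, in PySpark. The bucket paths and filter column are placeholders; the repartition(6)/cache() calls echo the snippet quoted elsewhere in the thread:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("emr-s3-etl").getOrCreate()

    # Read the source data from S3 (placeholder path).
    df = spark.read.parquet("s3://source-bucket/input/")

    # Transform; cache only pays off if the result is reused more than once.
    transformed = (
        df.filter(F.col("event_date") >= "2023-01-01")
          .repartition(6)
          .cache()
    )

    # Write the result back to S3 (placeholder path).
    transformed.write.mode("overwrite").parquet("s3://dest-bucket/output/")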

Re: [Spark SQL] Data objects from query history

2023-07-03 Thread Jack Wells
Hi Ruben, I’m not sure if this answers your question, but if you’re interested in exploring the underlying tables, you could always try something like the below in a Databricks notebook:

    display(spark.read.table('samples.nyctaxi.trips'))

(For vanilla Spark users, it would be spark.read.table('samples.nyctaxi.trips').show(), since display() is a Databricks notebook helper.)
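
Spelled out as a runnable sketch for the vanilla-Spark case; the table name assumes the Databricks sample dataset (or an equivalent table) is registered in your metastore:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("explore-tables").getOrCreate()

    # display() is Databricks-only; plain Spark inspects and prints instead.
    df = spark.read.table("samples.nyctaxi.trips")
    df.printSchema()               # columns behind the query history
    df.show(10, truncate=False)    # first rows, untruncated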