Executor tab missing information

2023-02-13 Thread Prem Sahoo
Hello All, I am executing Spark jobs, but the Executors tab is missing information; I can't see any data coming up. Please let me know what I am missing.

Re: Executor metrics are missing on Prometheus sink

2023-02-13 Thread Qian Sun
Hi Luca, Thanks for your reply, which is very helpful for me :) I am trying other metrics sinks with cAdvisor to see the effect. If it works well, I will share it with the community. On Fri, Feb 10, 2023 at 4:26 PM Luca Canali wrote: > Hi Qian, > Indeed the metrics available with the

Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-13 Thread karan alang
Hello All, I'm trying to run a simple application on GKE (Kubernetes), and it is failing. Note: I have Spark (Bitnami Spark chart) installed on GKE using helm install. Here is what I have done: 1. Created a Docker image using a Dockerfile. Dockerfile: ``` FROM python:3.7-slim RUN apt-get update &&

[Spark Core] Spark data loss/data duplication when executors die

2023-02-13 Thread Erik Eklund
Hi, We are facing this issue when we convert RDD -> Dataset followed by repartition + write. We are using spot instances on k8s which means they can die at any moment. And when they do during this phase, we very often see data duplication happening. Pseudo job code: val rdd = data.map(…) val

Re: How to improve efficiency of this piece of code (returning distinct column values)

2023-02-13 Thread sam smith
Alright, this is the working Java version of it: List<Column> listCols = new ArrayList<>(); Arrays.asList(dataset.columns()).forEach(column -> { listCols.add(org.apache.spark.sql.functions.collect_set(column)); }); Column[] arrCols = listCols.toArray(new Column[listCols.size()]); dataset =