Re: Spark-Connect: Param `--packages` does not take effect for executors.

2023-12-04 Thread Holden Karau
I think this sounds like a bug. In the help options for both regular spark-submit and ./sbin/start-connect-server.sh we say: "--packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths.

ML advice

2023-12-04 Thread Zahid Rahman
Hi, I heard some big things about machine learning and data science. To upgrade my skill set I took a Udemy course, Python and Spark for Big Data with Spark. It took about a week to learn the concepts and the workflow to follow when using each of the Spark APIs. To complete a Machine Learning

Re: Do we have any mechanism to control requests per second for a Kafka connect sink?

2023-12-04 Thread Yeikel Santana
Apologies to everyone. I sent this to the wrong email list. Please discard. On Mon, 04 Dec 2023 10:48:11 -0500, Yeikel Santana wrote: Hello everyone, Is there any mechanism to force Kafka Connect to ingest at a given rate per second as opposed to tasks? I am operating in a

Do we have any mechanism to control requests per second for a Kafka connect sink?

2023-12-04 Thread Yeikel Santana
Hello everyone, Is there any mechanism to force Kafka Connect to ingest at a given rate per second as opposed to tasks? I am operating in a shared environment where the ingestion rate needs to be as low as possible (for example, 5 requests/second as an upper limit), and as far as I can

Re: Spark-Connect: Param `--packages` does not take effect for executors.

2023-12-04 Thread Aironman DirtDiver
The issue you're encountering with the iceberg-spark-runtime dependency not being properly passed to the executors in your Spark Connect server deployment could be due to a couple of factors: 1. *Spark Submit Packaging:* When you use the --packages parameter in spark-submit, it only

Spark-Connect: Param `--packages` does not take effect for executors.

2023-12-04 Thread Xiaolong Wang
Hi, Spark community, I encountered a weird bug when using Spark Connect server to integrate with Iceberg. I added the iceberg-spark-runtime dependency with the `--packages` param, and the driver/connect-server pod did get the correct dependencies. But when looking at the executor's library, the jar was

Re: [PySpark][Spark Dataframe][Observation] Why empty dataframe join doesn't let you get metrics from observation?

2023-12-04 Thread Enrico Minack
Hi Michail, observations as well as ordinary accumulators only observe / process rows that are iterated / consumed by downstream stages. If the query plan decides to skip one side of the join, that one will be removed from the final plan completely. Then, the Observation will not retrieve any
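
The point about rows only being observed when a downstream stage consumes them can be illustrated with a plain-Python analogy. This is a rough sketch, not Spark code: `RowCounter` and `inner_join` are hypothetical names invented here, and the early return for an empty side only stands in for the kind of plan pruning described in the reply.

```python
from typing import Dict, Iterable, Iterator, List


class RowCounter:
    """Counts rows lazily, only as a downstream consumer pulls them."""

    def __init__(self) -> None:
        self.count = 0

    def observe(self, rows: Iterable[dict]) -> Iterator[dict]:
        # Generator: nothing is counted until something iterates over it.
        for row in rows:
            self.count += 1
            yield row


def inner_join(left: Iterable[dict], right: List[dict], key: str) -> List[dict]:
    # Assumed shortcut standing in for a planner optimisation:
    # if one side is empty, the other side is never iterated at all.
    if not right:
        return []
    index: Dict[object, List[dict]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


rows = [{"id": i} for i in range(3)]

# Joined against an empty side: the observed input is skipped entirely.
skipped = RowCounter()
empty_result = inner_join(skipped.observe(rows), [], "id")

# Joined against a non-empty side: every observed row is consumed.
consumed = RowCounter()
full_result = inner_join(consumed.observe(rows), [{"id": 1, "v": "x"}], "id")
```

In this sketch the counter attached to the skipped side never advances, because no consumer ever pulls a row through it, while the counter on the consumed side sees every input row.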