Re: JavaSerializerInstance is slow

2021-09-02 Thread Antonin Delpeuch (lists)
Hi Kohki, Serialization of tasks happens in local mode too, and as far as I am aware there is no way to disable it (although that would definitely be useful, in my opinion). You can see local mode as a testing mode, in which you would want to catch any serialization errors before they appear
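The serialization discussed in this thread is plain Java object serialization of the task closure. A minimal illustration (a sketch, not Spark's actual code path): round-tripping a lambda through `ObjectOutputStream`, the same mechanism `JavaSerializerInstance` ultimately relies on, which is why it runs even in local mode and why it shows up as slow.

```java
import java.io.*;
import java.util.function.Function;

public class ClosureRoundTrip {
    // Serialize any object with plain Java serialization, the mechanism
    // Spark's JavaSerializerInstance uses for task closures.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A lambda is only serializable if its target type is Serializable;
        // closures shipped with tasks must satisfy the same constraint.
        Function<Integer, Integer> f =
            (Function<Integer, Integer> & Serializable) x -> x + 1;
        Function<Integer, Integer> g = deserialize(serialize(f));
        System.out.println(g.apply(41)); // prints 42
    }
}
```

Kryo can replace this for shuffled data (`spark.serializer`), but as the thread notes, closure serialization still goes through the Java path.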

type mismatch

2021-09-02 Thread igyu
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())
lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = x.getString(schemas.get(i).name)
  }
}
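A hedged reading of the truncated snippet: two likely sources of the type mismatch are that `getString` is called on the raw string `x` rather than the parsed `obj`, and that the `map` body ends with a `for` loop, which evaluates to `Unit`, so nothing useful is returned. A minimal Java sketch of the intended shape, where the mapping function returns the extracted values (the `extract` helper is hypothetical, standing in for `JSON.parseObject(line).getString(name)`):

```java
import java.util.*;
import java.util.stream.*;

public class ExtractFields {
    // Hypothetical stand-in for JSON.parseObject(line).getString(name):
    // here each "line" is a simple key=value;key=value record.
    static String extract(String line, String name) {
        for (String pair : line.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals(name)) return kv[1];
        }
        return null;
    }

    // For each line, return the array of values for the schema fields.
    // The crucial point: the mapping function must *return* the array,
    // mirroring the Scala fix of making `vs` the last expression in the map body.
    static List<String[]> mapLines(List<String> lines, List<String> schemaNames) {
        return lines.stream()
            .map(line -> schemaNames.stream()
                .map(name -> extract(line, name))
                .toArray(String[]::new))
            .collect(Collectors.toList());
    }
}
```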

JavaSerializerInstance is slow

2021-09-02 Thread Kohki Nishio
I'm seeing many threads doing deserialization of a task. I understand that since lambdas are involved, we can't use Kryo for those purposes. However, I'm running it in local mode, so this serialization is not really necessary, no? Is there any trick I can apply to get rid of this thread contention? I'm

Re: Appending a static dataframe to a stream-created Parquet file fails

2021-09-02 Thread Jungtaek Lim
Hi, The file stream sink maintains the metadata in the output directory. The metadata retains the list of files written by the streaming query, and Spark reads the metadata on listing the files to read. This is to guarantee end-to-end exactly once on writing files in the streaming query. There
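As the reply explains, the file stream sink keeps its log in a `_spark_metadata` subdirectory of the output path, and files appended by a plain batch write bypass that log, so metadata-aware readers will not see them. A minimal sketch (plain Java, no Spark) of detecting whether a directory is governed by such a log:

```java
import java.nio.file.*;

public class MetadataCheck {
    // The file stream sink writes its log under <outputDir>/_spark_metadata.
    // If that directory exists, readers of the query output list files from
    // the log, not from the filesystem, so rows written by a plain batch
    // append are invisible to them.
    static boolean hasSinkMetadata(Path outputDir) {
        return Files.isDirectory(outputDir.resolve("_spark_metadata"));
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("out");
        System.out.println(hasSinkMetadata(tmp)); // prints false
        Files.createDirectory(tmp.resolve("_spark_metadata"));
        System.out.println(hasSinkMetadata(tmp)); // prints true
    }
}
```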

Unsubscribe

2021-09-02 Thread 周翔
Unsubscribe

Re: Can’t write to PVC in K8S

2021-09-02 Thread Bjørn Jørgensen
Well, I have tried almost everything over the last 2 days now. There is no user spark, and whatever I do with the executor image, it only runs for 2 minutes in k8s and then restarts. The problem seems to be that the executors write files as group nogroup. drwxr-xr-x 2 185 nogroup 4096 Sep

Re: Get application metric from Spark job

2021-09-02 Thread Haryani, Akshay
Hi Aurélien, Spark has endpoints to expose the spark application metrics. These endpoints can be used as a rest API. You can read more about these here: https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api Additionally, If you want to build your own custom metrics, you can explore
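The REST API mentioned in the reply is served by the driver UI (typically port 4040 while the application runs) and can be polled during execution. A small sketch of building the documented endpoint paths (the base URL is an assumption about a default local setup):

```java
public class SparkRestEndpoints {
    // Paths from the Spark monitoring REST API
    // (https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api).
    // While a job runs, baseUrl is usually the driver UI, e.g. http://localhost:4040.

    // Lists known applications (for a running driver, usually just one).
    static String applications(String baseUrl) {
        return baseUrl + "/api/v1/applications";
    }

    // Per-stage metrics for one application, pollable while the job runs.
    static String stages(String baseUrl, String appId) {
        return baseUrl + "/api/v1/applications/" + appId + "/stages";
    }
}
```

These paths can then be fetched with any HTTP client on a schedule to sample metrics during the run.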

Reading CSV and Transforming to Parquet Issue

2021-09-02 Thread ☼ R Nair
All, This is very surprising and I am sure I might be doing something wrong. The issue is, the following code is taking 8 hours. It reads a CSV file, takes the phone number column, extracts the first four digits and then partitions based on the four digits (phoneseries) and writes to Parquet. Any
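The preview cuts off before the code, but a pipeline like the one described (extract the first four digits, partition by them, write Parquet) can spend most of its time on the partitioned write: up to ~10,000 distinct four-digit prefixes means that many output directories, and each task may emit a small file per prefix it sees. A hedged sketch of the prefix extraction itself, stripping formatting characters first (an assumption, since the original code is not shown):

```java
public class PhonePrefix {
    // Extract the first four digits of a phone number, ignoring
    // formatting characters like spaces, dashes, and parentheses.
    static String prefix(String phone) {
        String digits = phone.replaceAll("\\D", "");
        return digits.length() >= 4 ? digits.substring(0, 4) : digits;
    }
}
```

A common remedy for slow partitioned writes is to repartition the dataframe on the partition column before `partitionBy`, so each prefix is written by few tasks instead of every task writing every prefix.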

Get application metric from Spark job

2021-09-02 Thread Aurélien Mazoyer
Hi community, I would like to collect information about the execution of a Spark job while it is running. Could I define some kind of application metrics (such as a counter that would be incremented in my code) that I could retrieve regularly while the job is running? Thank you for your help,
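Spark's built-in mechanism for exactly this pattern is an accumulator (e.g. `sc.longAccumulator("myCounter")`), incremented inside tasks and readable on the driver, with its value also exposed in the UI and REST API while the job runs. A Spark-free sketch of the shape of that pattern, using a thread-safe counter in place of the accumulator:

```java
import java.util.concurrent.atomic.LongAdder;

public class AppCounter {
    // Local stand-in for a Spark LongAccumulator: worker code calls add(),
    // and the driver/monitoring side can call sum() at any point mid-run.
    private final LongAdder adder = new LongAdder();

    public void add(long n) { adder.add(n); }
    public long sum() { return adder.sum(); }
}
```

With a real accumulator, the driver can log or publish `acc.value` on a timer while the job executes, which matches the "retrieve regularly" requirement.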

Appending a static dataframe to a stream-created Parquet file fails

2021-09-02 Thread eugen . wintersberger
Hi all, I recently stumbled upon a rather strange problem with streaming sources in one of my tests. I am writing a Parquet file from a streaming source and subsequently try to append the same data, but this time from a static dataframe. Surprisingly, the number of rows in the Parquet file

Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma
On 2021/09/02 06:00:26, Harsh Sharma wrote:
> Please find replies:
> Do you know when in your application lifecycle it happens? Spark SQL or Structured Streaming?
> Ans: It's Spark SQL.
> Do you use broadcast variables?
> Ans: Yes, we are using broadcast variables.
> Or are the

Unsubscribe

2021-09-02 Thread 孙乾(亨贞)
Unsubscribe

Spark Phoenix Connection Exception while loading from Phoenix tables

2021-09-02 Thread Harsh Sharma
[01/09/21 11:55:51,861 WARN pool-1-thread-1](Client) Exception encountered while connecting to the server : java.lang.NullPointerException [01/09/21 11:55:51,862 WARN pool-1-thread-1](Client) Exception encountered while connecting to the server : java.lang.NullPointerException [01/09/21

Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma
Please find replies:
Do you know when in your application lifecycle it happens? Spark SQL or Structured Streaming?
Ans: It's Spark SQL.
Do you use broadcast variables?
Ans: Yes, we are using broadcast variables.
Or are the errors coming from broadcast joins perhaps?
Ans: We are not using