Re: Spark Structured Streaming - stderr getting filled up

2022-09-19 Thread karan alang
Here is the Stack Overflow link: https://stackoverflow.com/questions/73780259/spark-structured-streaming-stderr-getting-filled-up On Mon, Sep 19, 2022 at 4:41 PM karan alang wrote: > I've created a Stack Overflow ticket for this as well > > On Mon, Sep 19, 2022 at 4:37 PM karan alang wrote: > >>

Re: Spark Structured Streaming - stderr getting filled up

2022-09-19 Thread karan alang
I've created a Stack Overflow ticket for this as well. On Mon, Sep 19, 2022 at 4:37 PM karan alang wrote: > Hello All, > I have a Spark Structured Streaming job on GCP Dataproc which picks up > data from Kafka, does processing, and pushes data back into Kafka topics. > > A couple of questions: > 1.

Spark Structured Streaming - stderr getting filled up

2022-09-19 Thread karan alang
Hello All, I have a Spark Structured Streaming job on GCP Dataproc which picks up data from Kafka, does processing, and pushes data back into Kafka topics. A couple of questions: 1. Does Spark put all the logs (incl. INFO, WARN, etc.) into stderr? What I notice is that stdout is empty, while all the
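
A minimal sketch of one way to cut down what ends up in stderr, assuming the volume comes from Spark's own INFO logging (in Spark's default log4j configuration the console appender targets stderr, which would explain why stdout stays empty). This only raises the log level; Dataproc- or cluster-specific log rotation settings are not shown and would be a separate step.

# sketch only: application name is a placeholder
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quiet-logs-sketch").getOrCreate()

# Raise the log level so INFO messages no longer flood stderr;
# WARN and above are still written.
spark.sparkContext.setLogLevel("WARN")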

Re: Splittable or not?

2022-09-19 Thread Jack Goodson
When reading in gzip files, I've always read them into a DataFrame and then written them out to Parquet/Delta more or less in their raw form, and then used those files for my transformations, since the workloads are now parallelisable from the split files. When reading in gzips, these will be read by
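
A hedged sketch of the workflow described above, assuming gzipped CSV input; the paths and options are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gzip-to-parquet-sketch").getOrCreate()

# A .csv.gz file is not splittable, so this read is limited to one task per file.
raw = spark.read.option("header", "true").csv("/data/landing/events.csv.gz")

# Write it back out more or less raw as Parquet; the resulting files are
# splittable, so downstream transformations can parallelise across them.
raw.write.mode("overwrite").parquet("/data/bronze/events/")

# Later jobs read the Parquet copy instead of the gzip original.
events = spark.read.parquet("/data/bronze/events/")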

Re: Re: [how to]RDD using JDBC data source in PySpark

2022-09-19 Thread javaca...@163.com
Thank you Bjørn Jørgensen, and thanks also to Sean Owen. DataFrame and .format("jdbc") is a good way to resolve it, but for some reasons I can't use the DataFrame API and can only use the RDD API in PySpark. ...T_T... Thanks for all your help, but I still need a new idea to resolve it. XD
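
One pure-RDD sketch under stated assumptions (not something the thread itself spells out): it assumes the pymysql driver is installed on every executor and that the table has a numeric primary key column `id`; the host, credentials, table and column names are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-jdbc-sketch").getOrCreate()
sc = spark.sparkContext

# Split an assumed id range into one slice per partition.
num_partitions = 4
lower, upper = 0, 1_000_000
step = (upper - lower) // num_partitions
bounds = [(lower + i * step, lower + (i + 1) * step) for i in range(num_partitions)]

def read_range(bound):
    # Runs on the executor: open a connection and read only this id slice.
    import pymysql
    lo, hi = bound
    conn = pymysql.connect(host="mysql-host", user="user",
                           password="secret", database="db")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM my_table WHERE id >= %s AND id < %s",
                        (lo, hi))
            for row in cur.fetchall():
                yield row
    finally:
        conn.close()

rdd = sc.parallelize(bounds, num_partitions).flatMap(read_range)
print(rdd.take(5))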

Re: Re: [how to]RDD using JDBC data source in PySpark

2022-09-19 Thread Sean Owen
Just use the .format('jdbc') data source? This is built in, for all languages. You can get an RDD out if you must. On Mon, Sep 19, 2022, 5:28 AM javaca...@163.com wrote: > Thank you for the answer, Alton. > > But I see that it uses Scala to implement it. > I know Java/Scala can get data from MySQL using
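
A minimal sketch of the approach Sean describes, assuming the MySQL JDBC connector jar is available on the classpath; the connector version, host, table, column names and credentials below are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("jdbc-to-rdd-sketch")
         # assumption: pull the connector via packages; version is an example only
         .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.33")
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://mysql-host:3306/db")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "secret")
      .load())

# If an RDD is really needed, take it off the DataFrame.
rdd = df.rdd.map(lambda row: (row["id"], row["name"]))
print(rdd.take(5))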

Re: Re: [how to]RDD using JDBC data source in PySpark

2022-09-19 Thread Bjørn Jørgensen
https://www.projectpro.io/recipes/save-dataframe-mysql-pyspark and https://towardsdatascience.com/pyspark-mysql-tutorial-fa3f7c26dc7 On Mon, 19 Sep 2022 at 12:29, javaca...@163.com wrote: > Thank you for the answer, Alton. > > But I see that it uses Scala to implement it. > I know Java/Scala can get data
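
A short sketch in the spirit of the linked tutorials: writing a DataFrame back to MySQL over JDBC. The connection details and table name are hypothetical placeholders, and the MySQL connector jar is assumed to be available.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-to-mysql-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

(df.write.format("jdbc")
   .option("url", "jdbc:mysql://mysql-host:3306/db")
   .option("dbtable", "my_table")
   .option("user", "user")
   .option("password", "secret")
   .mode("append")
   .save())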

Re: Re: [how to]RDD using JDBC data source in PySpark

2022-09-19 Thread javaca...@163.com
Thank you for the answer, Alton. But I see that it uses Scala to implement it. I know Java/Scala can get data from MySQL using JdbcRDD fairly well, but I want to do the same in PySpark. Would you give me more advice? Many thanks. javaca...@163.com From: Xiao, Alton Sent: 2022-09-19

Re: [how to]RDD using JDBC data source in PySpark

2022-09-19 Thread Xiao, Alton
Hi javacaoyu: https://hevodata.com/learn/spark-mysql/#Spark-MySQL-Integration I think Spark has already integrated MySQL. From: javaca...@163.com Date: Monday, September 19, 2022 17:53 To: user@spark.apache.org Subject: [how to]RDD using JDBC data source in PySpark

[how to]RDD using JDBC data source in PySpark

2022-09-19 Thread javaca...@163.com
Hi guys: Is there some way to let an RDD use a JDBC data source in PySpark? I want to get data from MySQL, but PySpark does not support JdbcRDD like Java/Scala does. I searched the docs on the web site and found no answer, so I need your help. Thank you very much.

Re: Splittable or not?

2022-09-19 Thread Sid
Cool. Thanks, everyone, for the replies. On Sat, Sep 17, 2022 at 9:50 PM Enrico Minack wrote: > If with "won't affect the performance" you mean "parquet is splittable > though it uses snappy", then yes. Splittable files allow for optimal > parallelization, which "won't affect performance". > >
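
A small sketch illustrating Enrico's point: Parquet stays splittable even with snappy compression, because the compression is applied inside the file rather than to the file as a whole. The path and the demo DataFrame are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("splittable-parquet-sketch").getOrCreate()

df = spark.range(0, 10_000_000).withColumnRenamed("id", "value")

# snappy is already the default Parquet codec in Spark; set explicitly for clarity.
(df.write.mode("overwrite")
   .option("compression", "snappy")
   .parquet("/tmp/splittable_demo"))

# Reading back: Spark can still split the snappy-compressed Parquet across tasks.
back = spark.read.parquet("/tmp/splittable_demo")
print(back.rdd.getNumPartitions())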

Driver throws exception every few hours

2022-09-19 Thread Kiran Biswal
Hello Experts, Seeing the below exceptions thrown by the Spark driver every few hours. Using Spark 3.3.0. com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392 Caused by: com.fasterxml.jackson.databind.JsonMappingException: timeout (through reference chain: