Creating Custom Broadcast Join

2022-09-01 Thread Murali S
Hi, I want to broadcast a DataFrame to all executors and perform an operation similar to a join, except that it may return a different number of rows than each partition contains and may use multiple rows to produce one row. I am trying to create a custom join operator for this use case. It would be great
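
One possible way to approach the idea in this message, without writing a custom physical join operator, is to broadcast a lookup table and process each partition with a function that may emit any number of rows. The sketch below is only an illustration under assumptions, not the poster's method: the two-column row shapes, the summing rule, and the names `build_lookup` and `join_partition` are hypothetical. The partition logic is plain Python so it can be tried without a cluster; the commented lines show how it might be wired up with `SparkContext.broadcast` and `RDD.mapPartitions`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def build_lookup(small_rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index the small (to-be-broadcast) side by key; a key may have many labels."""
    lookup = defaultdict(list)
    for key, label in small_rows:
        lookup[key].append(label)
    return dict(lookup)


def join_partition(
    rows: Iterator[Tuple[str, int]],
    lookup: Dict[str, List[str]],
) -> Iterator[Tuple[str, str, int]]:
    """Process one partition of the large side.

    Several input rows with the same key are combined into one total, and
    each total is emitted once per matching label, so the output row count
    can be smaller or larger than the input row count.
    """
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount  # many input rows -> one intermediate value
    for key, total in totals.items():
        for label in lookup.get(key, []):  # zero, one, or many output rows
            yield (key, label, total)


# Hypothetical Spark wiring (assumes `spark`, `small_df`, and `large_df` exist
# and that both DataFrames have exactly two columns):
#
#   bc = spark.sparkContext.broadcast(build_lookup(small_df.rdd.map(tuple).collect()))
#   result = large_df.rdd.map(tuple).mapPartitions(
#       lambda it: join_partition(it, bc.value)
#   )
```

Note that this combines rows only within a single partition; rows sharing a key in different partitions would be handled separately unless the data is repartitioned by key first.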

Re: Spark 3.3.0/3.2.2: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15

2022-09-01 Thread FengYu Cao
I will open a JIRA, but since it's our production event log, I can't attach it. I'll try to set up a debugger to provide more information. On Thu, Sep 1, 2022 at 23:06, Chao Sun wrote: > Hi Fengyu, > > Do you still have the Parquet file that caused the error? Could you > open a JIRA and attach the file to it? I

Re: Spark 3.3.0/3.2.2: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15

2022-09-01 Thread Chao Sun
Hi Fengyu, Do you still have the Parquet file that caused the error? Could you open a JIRA and attach the file to it? I can take a look. Chao On Thu, Sep 1, 2022 at 4:03 AM FengYu Cao wrote: > > I'm trying to upgrade our Spark (3.2.1 now) > > but with Spark 3.3.0 and Spark 3.2.2, we had error

Re: [Structured Streaming + Kafka] Reduced support for alternative offset management

2022-09-01 Thread Jungtaek Lim
Please consider DStream legacy technology and migrate to Structured Streaming. Little development effort goes into DStream; the main focus is Spark SQL and, for streaming workloads, Structured Streaming. For Kafka integration, the guide doc is here,

Spark 3.3.0/3.2.2: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15

2022-09-01 Thread FengYu Cao
I'm trying to upgrade our Spark (3.2.1 now), but with Spark 3.3.0 and Spark 3.2.2 we get an error with a specific Parquet file. Is anyone else having the same problem? Or do I need to provide any information to the devs? ``` org.apache.spark.SparkException: Job aborted due to stage failure: