Hi,
I want to broadcast a DataFrame to all executors and perform an operation
similar to a join, except that it may return a different number of rows than
each partition contains, and may combine multiple rows to produce a single output row.
I am trying to create a custom join operator for this use case. It would be
great
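The semantics described above (a join-like operator that can emit fewer or more rows than it consumes, and can fold several input rows into one) can be sketched without a Spark runtime. In this illustrative sketch the broadcast side is just an in-memory dict, and `custom_join_partition` plays the role of a per-partition function; all names and the combine rule are hypothetical, not the poster's actual operator.

```python
from typing import Dict, Iterator, Tuple

# Illustrative only: the "broadcast" side is a plain dict keyed by join key.
broadcast_side: Dict[str, int] = {"a": 10, "b": 20}

def custom_join_partition(rows: Iterator[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Join-like transform over one partition.

    Unlike a plain join, it may emit a different number of rows than it
    consumes: rows sharing a key are combined into a single output row,
    and rows whose key is absent from the broadcast side are dropped.
    """
    buffer: Dict[str, int] = {}
    for key, value in rows:
        if key in broadcast_side:
            buffer[key] = buffer.get(key, 0) + value  # combine multiple rows
    for key, total in buffer.items():
        yield key, total + broadcast_side[key]

partition = [("a", 1), ("a", 2), ("b", 3), ("c", 4)]
print(list(custom_join_partition(iter(partition))))  # [('a', 13), ('b', 23)]
```

In Spark terms, this would roughly correspond to wrapping the small side in `sc.broadcast(...)` and applying the function via `rdd.mapPartitions(custom_join_partition)`, since `mapPartitions` already allows a partition to produce any number of output rows.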
I will open a JIRA, but since the file is from our production event log, I
can't attach it.
I'll try to set up a debugger to provide more information.
Chao Sun wrote on Thu, Sep 1, 2022 at 23:06:
Hi Fengyu,
Do you still have the Parquet file that caused the error? Could you
open a JIRA and attach the file to it? I can take a look.
Chao
On Thu, Sep 1, 2022 at 4:03 AM FengYu Cao wrote:
>
> I'm trying to upgrade our spark (3.2.1 now)
>
> but with spark 3.3.0 and spark 3.2.2, we had error
Please consider DStream old-school technology and migrate to Structured
Streaming. Little development effort goes into DStream; the main focus is
Spark SQL and, for streaming workloads, Structured Streaming.
For Kafka integration, the guide doc is here,
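As a minimal sketch of what the migration target looks like: the Structured Streaming Kafka source is configured through a handful of options. The broker addresses and topic name below are placeholders; the commented line shows where a live `SparkSession` would consume them.

```python
# Placeholder option set for the Structured Streaming Kafka source.
# Broker list and topic name are invented for illustration.
kafka_options = {
    "kafka.bootstrap.servers": "host1:9092,host2:9092",  # placeholder brokers
    "subscribe": "events",                               # placeholder topic
    "startingOffsets": "latest",                         # read only new data
}

# With a live SparkSession this would become:
# df = spark.readStream.format("kafka").options(**kafka_options).load()
print(kafka_options["subscribe"])
```

Each Kafka record then arrives as a row with `key`, `value`, `topic`, `partition`, and `offset` columns, which replaces the DStream-era `KafkaUtils.createDirectStream` pattern.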
I'm trying to upgrade our Spark deployment (currently 3.2.1),
but with Spark 3.3.0 and Spark 3.2.2 we hit an error with a specific Parquet
file.
Is anyone else having the same problem? Or do I need to provide any
information to the devs?
```
org.apache.spark.SparkException: Job aborted due to stage failure: