> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Oct 27, 2023 at 5:22 AM Andrzej Zera wrote:
>
Hey All,
I'm trying to reproduce the following streaming operation: "Time window
aggregation in separate streams followed by stream-stream join". According
to the documentation, this should be possible in Spark 3.5.0, but I had no
success despite several attempts.
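To make the operation concrete, here is a toy pure-Python sketch of its semantics (this is NOT Spark code; tumbling windows over integer seconds, a count per window in each stream, then an inner join on the window key; all helper names are made up for illustration):

```python
from collections import Counter

def tumbling_window(event_time, window_size):
    # Assign an event time (in seconds) to the start of its tumbling window.
    return (event_time // window_size) * window_size

def windowed_counts(event_times, window_size):
    # Per-stream time-window aggregation: count events in each window.
    return Counter(tumbling_window(t, window_size) for t in event_times)

def join_on_window(left_counts, right_counts):
    # Stream-stream inner join on the shared window key.
    return {w: (left_counts[w], right_counts[w])
            for w in left_counts if w in right_counts}
```

For example, two streams with events at seconds [3, 7, 12] and [1, 2, 15] and a 10-second window each aggregate to counts per window, and the join pairs up the counts for matching windows.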
Here is a documentation snippet I'm [...]
[...] helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jan 11, 2024 at 6:13 AM Andrzej Zera wrote:
>
I'm struggling with the following issue in Spark >=3.4, related to multiple
stateful operations.
When spark.sql.streaming.statefulOperator.allowMultiple is enabled, Spark
keeps track of two types of watermarks: eventTimeWatermarkForEviction and
eventTimeWatermarkForLateEvents. Introducing them [...]
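Roughly speaking, with multiple stateful operators Spark filters late input against the watermark of the previous micro-batch, while state eviction uses the watermark advanced by the current micro-batch. A minimal pure-Python sketch of that arithmetic (an illustration only, not Spark internals; the function and its signature are made up):

```python
from datetime import datetime, timedelta

def advance_watermarks(prev_watermark, batch_max_event_time, delay_threshold):
    # Sketch: late events are judged against the previous batch's watermark,
    # while eviction uses the watermark updated after the current batch.
    watermark_for_late_events = prev_watermark
    watermark_for_eviction = max(prev_watermark,
                                 batch_max_event_time - delay_threshold)
    return watermark_for_late_events, watermark_for_eviction
```

For example, with a 5-minute delay threshold, a batch whose max event time is 00:10 advances the eviction watermark to 00:05 while late-event filtering still uses the previous watermark.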
https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
>>> intermediate_df = df.groupBy(...).count()
>>> intermediate_df.cache()
>>> # Use cached intermediate_df for further transformations or actions
>>>
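The effect Mich is describing (caching the intermediate result so later actions don't recompute it) can be illustrated with a pure-Python analogy; this uses functools.lru_cache, not Spark's cache(), and the names are invented for the sketch:

```python
from functools import lru_cache

call_count = {"n": 0}

@lru_cache(maxsize=None)
def expensive_count(key):
    # Stands in for a costly aggregation that we only want to compute once.
    call_count["n"] += 1
    return key * 2

first = expensive_count(21)
second = expensive_count(21)  # served from the cache; the body does not re-run
```

The second call returns the memoized result, just as downstream actions on a cached DataFrame reuse the materialized data instead of re-running the lineage.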
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>
>
>
>
>
> On Sat, 6 Jan 2024 at 08:19, Andrzej Zera wrote:
>
Hey,
I'm running a few Structured Streaming jobs (with Spark 3.5.0) that require
near-real-time accuracy, with trigger intervals on the order of 5-10
seconds. I usually run 3-6 streaming queries as part of each job, and every
query includes at least one stateful operation (and usually two or more).
Hi,
Do you think there is any chance for this issue to get resolved? Should I
create another bug report? As mentioned in my message, there is one open
already: https://issues.apache.org/jira/browse/SPARK-45637 but it covers
only one of the problems.
Andrzej
Tue, 27 Feb 2024 at 09:58 Andrzej Zera [...]
> [...] It is essential to note
> that, as with any advice, "one test result is worth one-thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
Hey all,
I've been using Structured Streaming in production for almost a year
already, and I want to share the bugs I found during this time. I created a
test for each of the issues and put them all here:
https://github.com/andrzejzera/spark-bugs/tree/main/spark-3.5/src/test/scala
I split the issues [...]
Hey, do you perform stateful operations? Maybe your state is growing
indefinitely; a screenshot with state metrics would help (you can find it
in the Spark UI -> Structured Streaming -> your query). Do you have a
driver-only cluster or do you have workers too? What's the memory usage
profile at [...]