+1 overall and a big +1 to keeping offline state-rebalancing as a primary use case.
Raghu.

On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <bartkoniec...@gmail.com> wrote:

> Thank you, Jungtaek, for your answers! It's clear now.
>
> +1 for me. It seems like a prerequisite for further ops-related
> improvements for the state store management. I mean especially here the
> state rebalancing that could rely on this read+write state store API. I
> don't mean here the dynamic state rebalancing that could probably be
> implemented with a lower latency directly in the stateful API. Instead I'm
> thinking more of an offline job to rebalance the state and later restart
> the stateful pipeline with the changed number of shuffle partitions.
>
> Best,
> Bartosz.
>
> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> bump for better reach
>>
>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Sorry, please use this link instead for SPIP doc:
>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>
>>>
>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi dev,
>>>>
>>>> I'd like to start a discussion on "State Data Source - Reader".
>>>>
>>>> This proposal aims to introduce a new data source "statestore" which
>>>> enables reading the state rows from existing checkpoint via offline (batch)
>>>> query. This will enable users to 1) create unit tests against stateful
>>>> query verifying the state value (especially flatMapGroupsWithState), 2)
>>>> gather more context on the status when an incident occurs, especially for
>>>> incorrect output.
>>>>
>>>> *SPIP*:
>>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>>
>>>> Looking forward to your feedback!
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>>> since the writer requires us to consider more cases. We are planning on it.
>>>>
>>>
>
> --
> Bartosz Konieczny
> freelance data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode