Re: queryable state & streaming

2019-04-24 Thread Stavros Kontopoulos
Michael,
I have listed use cases above; should we proceed with a design doc?

Best,
Stavros

On Mon, Mar 18, 2019 at 12:21 PM, Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> Not really, if we agree that we want this, I can put together a design
> document and take it from there. There was also a discussion in another
> thread about adding RocksDB as a memory storage, which is related to this task.
>
> Best,
> Stavros
>
> On Sun, Mar 17, 2019 at 4:42 AM kant kodali  wrote:
>
>> Any update on this?
>>
>> On Wed, Oct 24, 2018 at 4:26 PM Arun Mahadevan  wrote:
>>
>>> I don't think separate API or RPCs etc might be necessary for queryable
>>> state if the state can be exposed as just another datasource. Then the sql
>>> queries can be issued against it just like executing sql queries against
>>> any other data source.
>>>
>>> For now I think the "memory" sink could be used  as a sink and run
>>> queries against it but I agree it does not scale for large states.
>>>
>>> On Sun, 21 Oct 2018 at 21:24, Jungtaek Lim  wrote:
>>>
 It doesn't seem Spark has workarounds other than storing output into
 external storages, so +1 on having this.

 My major concern on implementing queryable state in structured
 streaming is "Are all states available on executors at any time while query
 is running?" Querying state shouldn't affect the running query. Given that
 state is huge and default state provider is loading state in memory, we may
 not want to load one more redundant snapshot of state: we want to always
 load "current state" which query is also using. (For sure, Queryable state
 should be read-only.)

 Regarding improvement of local state, I guess it is ideal to leverage
 embedded db, like Kafka and Flink are doing. The difference will not be
 only reading state from non-heap, but also how to take a snapshot and store
 delta. We may want to check snapshotting works well with small batch
 interval, and find alternative approach when it doesn't. Sounds like it is
 a huge item and can be handled individually.

 - Jungtaek Lim (HeartSaVioR)

 On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos <
 st.kontopou...@gmail.com> wrote:

> Nice I was looking for a jira. So I agree we should justify why we are
> building something. Now to that direction here is what I have seen from my
> experience.
> People quite often use state within their streaming app and may have
> large states (TBs). Shortening the pipeline by not having to copy data (to
> Cassandra for example for serving) is an advantage, in terms of at least
> latency and complexity.
> This can be true if we advantage of state checkpointing (locally could
> be RocksDB or in general HDFS the latter is currently supported)  along
> with an API to efficiently query data.
> Some use cases I see:
>
> - real-time dashboards and real-time reporting, the faster the better
> - monitoring of state for operational reasons, app health etc...
> - integrating with external services via an API eg. making accessible
>  aggregations over time windows to some third party service within your
> system
>
> Regarding requirements here are some of them:
> - support of an API to expose state (could be done at the spark
> driver), like rest.
> - supporting dynamic allocation (not sure how it affects state
> management)
> - an efficient way to talk to executors to get the state (rpc?)
> - making local state more efficient and easier accessible with an
> embedded db (I dont think this is supported from what I see, maybe wrong)?
> Some people are already working with such techs and some stuff could
> be re-used: https://issues.apache.org/jira/browse/SPARK-20641
>
> Best,
> Stavros
>
>
> On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust <
> mich...@databricks.com> wrote:
>
>> https://issues.apache.org/jira/browse/SPARK-16738
>>
>> I don't believe anyone is working on it yet.  I think the most useful
>> thing is to start enumerating requirements and use cases and then we can
>> talk about how to build it.
>>
>> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Cool Burak do you have a pointer, should I take the initiative for a
>>> first design document or Databricks is working on it?
>>>
>>> Best,
>>> Stavros
>>>
>>> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz 
>>> wrote:
>>>
 Hi Stavros,

 Queryable state is definitely on the roadmap! We will revamp the
 StateStore API a bit, and a queryable StateStore is definitely one of 
 the
 things we are thinking about during that revamp.

 Best,
 Burak

 On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" <
>

Re: queryable state & streaming

2019-03-18 Thread Stavros Kontopoulos
Not really. If we agree that we want this, I can put together a design
document and take it from there. There was also a discussion in another
thread about adding RocksDB as a memory storage, which is related to this task.

Best,
Stavros

On Sun, Mar 17, 2019 at 4:42 AM kant kodali  wrote:

> Any update on this?
>
> On Wed, Oct 24, 2018 at 4:26 PM Arun Mahadevan  wrote:
>
>> I don't think separate API or RPCs etc might be necessary for queryable
>> state if the state can be exposed as just another datasource. Then the sql
>> queries can be issued against it just like executing sql queries against
>> any other data source.
>>
>> For now I think the "memory" sink could be used  as a sink and run
>> queries against it but I agree it does not scale for large states.
>>
>> On Sun, 21 Oct 2018 at 21:24, Jungtaek Lim  wrote:
>>
>>> It doesn't seem Spark has workarounds other than storing output into
>>> external storages, so +1 on having this.
>>>
>>> My major concern on implementing queryable state in structured streaming
>>> is "Are all states available on executors at any time while query is
>>> running?" Querying state shouldn't affect the running query. Given that
>>> state is huge and default state provider is loading state in memory, we may
>>> not want to load one more redundant snapshot of state: we want to always
>>> load "current state" which query is also using. (For sure, Queryable state
>>> should be read-only.)
>>>
>>> Regarding improvement of local state, I guess it is ideal to leverage
>>> embedded db, like Kafka and Flink are doing. The difference will not be
>>> only reading state from non-heap, but also how to take a snapshot and store
>>> delta. We may want to check snapshotting works well with small batch
>>> interval, and find alternative approach when it doesn't. Sounds like it is
>>> a huge item and can be handled individually.
>>>
>>> - Jungtaek Lim (HeartSaVioR)
>>>
>>> On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:
>>>
 Nice I was looking for a jira. So I agree we should justify why we are
 building something. Now to that direction here is what I have seen from my
 experience.
 People quite often use state within their streaming app and may have
 large states (TBs). Shortening the pipeline by not having to copy data (to
 Cassandra for example for serving) is an advantage, in terms of at least
 latency and complexity.
 This can be true if we advantage of state checkpointing (locally could
 be RocksDB or in general HDFS the latter is currently supported)  along
 with an API to efficiently query data.
 Some use cases I see:

 - real-time dashboards and real-time reporting, the faster the better
 - monitoring of state for operational reasons, app health etc...
 - integrating with external services via an API eg. making accessible
  aggregations over time windows to some third party service within your
 system

 Regarding requirements here are some of them:
 - support of an API to expose state (could be done at the spark
 driver), like rest.
 - supporting dynamic allocation (not sure how it affects state
 management)
 - an efficient way to talk to executors to get the state (rpc?)
 - making local state more efficient and easier accessible with an
 embedded db (I dont think this is supported from what I see, maybe wrong)?
 Some people are already working with such techs and some stuff could be
 re-used: https://issues.apache.org/jira/browse/SPARK-20641

 Best,
 Stavros


 On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust <
 mich...@databricks.com> wrote:

> https://issues.apache.org/jira/browse/SPARK-16738
>
> I don't believe anyone is working on it yet.  I think the most useful
> thing is to start enumerating requirements and use cases and then we can
> talk about how to build it.
>
> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
> st.kontopou...@gmail.com> wrote:
>
>> Cool Burak do you have a pointer, should I take the initiative for a
>> first design document or Databricks is working on it?
>>
>> Best,
>> Stavros
>>
>> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>>
>>> Hi Stavros,
>>>
>>> Queryable state is definitely on the roadmap! We will revamp the
>>> StateStore API a bit, and a queryable StateStore is definitely one of 
>>> the
>>> things we are thinking about during that revamp.
>>>
>>> Best,
>>> Burak
>>>
>>> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" <
>>> st.kontopou...@gmail.com> wrote:
>>>
 Just to re-phrase my question: Would query-able state make a
 viable SPIP?

 Regards,
 Stavros

 On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
 st.kontopou...@gmail.com> wrote:

> Hi,
>
> Maybe this ha

Re: queryable state & streaming

2019-03-16 Thread kant kodali
Any update on this?

On Wed, Oct 24, 2018 at 4:26 PM Arun Mahadevan  wrote:

> I don't think separate API or RPCs etc might be necessary for queryable
> state if the state can be exposed as just another datasource. Then the sql
> queries can be issued against it just like executing sql queries against
> any other data source.
>
> For now I think the "memory" sink could be used  as a sink and run queries
> against it but I agree it does not scale for large states.
>
> On Sun, 21 Oct 2018 at 21:24, Jungtaek Lim  wrote:
>
>> It doesn't seem Spark has workarounds other than storing output into
>> external storages, so +1 on having this.
>>
>> My major concern on implementing queryable state in structured streaming
>> is "Are all states available on executors at any time while query is
>> running?" Querying state shouldn't affect the running query. Given that
>> state is huge and default state provider is loading state in memory, we may
>> not want to load one more redundant snapshot of state: we want to always
>> load "current state" which query is also using. (For sure, Queryable state
>> should be read-only.)
>>
>> Regarding improvement of local state, I guess it is ideal to leverage
>> embedded db, like Kafka and Flink are doing. The difference will not be
>> only reading state from non-heap, but also how to take a snapshot and store
>> delta. We may want to check snapshotting works well with small batch
>> interval, and find alternative approach when it doesn't. Sounds like it is
>> a huge item and can be handled individually.
>>
>> - Jungtaek Lim (HeartSaVioR)
>>
>> On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:
>>
>>> Nice I was looking for a jira. So I agree we should justify why we are
>>> building something. Now to that direction here is what I have seen from my
>>> experience.
>>> People quite often use state within their streaming app and may have
>>> large states (TBs). Shortening the pipeline by not having to copy data (to
>>> Cassandra for example for serving) is an advantage, in terms of at least
>>> latency and complexity.
>>> This can be true if we advantage of state checkpointing (locally could
>>> be RocksDB or in general HDFS the latter is currently supported)  along
>>> with an API to efficiently query data.
>>> Some use cases I see:
>>>
>>> - real-time dashboards and real-time reporting, the faster the better
>>> - monitoring of state for operational reasons, app health etc...
>>> - integrating with external services via an API eg. making accessible
>>>  aggregations over time windows to some third party service within your
>>> system
>>>
>>> Regarding requirements here are some of them:
>>> - support of an API to expose state (could be done at the spark driver),
>>> like rest.
>>> - supporting dynamic allocation (not sure how it affects state
>>> management)
>>> - an efficient way to talk to executors to get the state (rpc?)
>>> - making local state more efficient and easier accessible with an
>>> embedded db (I dont think this is supported from what I see, maybe wrong)?
>>> Some people are already working with such techs and some stuff could be
>>> re-used: https://issues.apache.org/jira/browse/SPARK-20641
>>>
>>> Best,
>>> Stavros
>>>
>>>
>>> On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 https://issues.apache.org/jira/browse/SPARK-16738

 I don't believe anyone is working on it yet.  I think the most useful
 thing is to start enumerating requirements and use cases and then we can
 talk about how to build it.

 On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
 st.kontopou...@gmail.com> wrote:

> Cool Burak do you have a pointer, should I take the initiative for a
> first design document or Databricks is working on it?
>
> Best,
> Stavros
>
> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>
>> Hi Stavros,
>>
>> Queryable state is definitely on the roadmap! We will revamp the
>> StateStore API a bit, and a queryable StateStore is definitely one of the
>> things we are thinking about during that revamp.
>>
>> Best,
>> Burak
>>
>> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Just to re-phrase my question: Would query-able state make a viable
>>> SPIP?
>>>
>>> Regards,
>>> Stavros
>>>
>>> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
>>> st.kontopou...@gmail.com> wrote:
>>>
 Hi,

 Maybe this has been discussed before. Given the fact that many
 streaming apps out there use state extensively, could be a good idea to
 make Spark expose streaming state with an external API like other
 systems do (Kafka streams, Flink etc), in order to facilitate
 interactive queries?

 Regards,
 Stavros



Re: queryable state & streaming

2018-10-24 Thread Arun Mahadevan
I don't think a separate API or RPCs would be necessary for queryable
state if the state can be exposed as just another data source. SQL
queries could then be issued against it just like against any other
data source.

For now I think the "memory" sink could be used to run queries against,
but I agree it does not scale for large states.
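
A rough illustration of the "state as just another data source" idea, for
readers of the archive. This is not Spark code: it uses Python's built-in
sqlite3 module as a stand-in for a queryable sink, and the table name
word_counts and the helpers apply_micro_batch and top_words are hypothetical
names chosen for the sketch.

```python
import sqlite3

# Stand-in for a queryable sink: an in-memory SQL table holding the
# latest aggregate per key. Table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_counts (word TEXT PRIMARY KEY, cnt INTEGER NOT NULL)"
)


def apply_micro_batch(conn, words):
    """Fold one micro-batch of words into the running counts."""
    with conn:  # commit the whole batch as one transaction
        for w in words:
            conn.execute(
                "INSERT OR IGNORE INTO word_counts (word, cnt) VALUES (?, 0)",
                (w,),
            )
            conn.execute(
                "UPDATE word_counts SET cnt = cnt + 1 WHERE word = ?", (w,)
            )


def top_words(conn, n):
    """An ad hoc SQL query issued against the state."""
    return conn.execute(
        "SELECT word, cnt FROM word_counts "
        "ORDER BY cnt DESC, word ASC LIMIT ?",
        (n,),
    ).fetchall()


apply_micro_batch(conn, ["spark", "flink", "spark"])
apply_micro_batch(conn, ["kafka", "spark"])
print(top_words(conn, 2))  # [('spark', 3), ('flink', 1)]
```

The whole table lives in one process here, which mirrors the scaling concern
raised above for large states.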

On Sun, 21 Oct 2018 at 21:24, Jungtaek Lim  wrote:

> It doesn't seem Spark has workarounds other than storing output into
> external storages, so +1 on having this.
>
> My major concern on implementing queryable state in structured streaming
> is "Are all states available on executors at any time while query is
> running?" Querying state shouldn't affect the running query. Given that
> state is huge and default state provider is loading state in memory, we may
> not want to load one more redundant snapshot of state: we want to always
> load "current state" which query is also using. (For sure, Queryable state
> should be read-only.)
>
> Regarding improvement of local state, I guess it is ideal to leverage
> embedded db, like Kafka and Flink are doing. The difference will not be
> only reading state from non-heap, but also how to take a snapshot and store
> delta. We may want to check snapshotting works well with small batch
> interval, and find alternative approach when it doesn't. Sounds like it is
> a huge item and can be handled individually.
>
> - Jungtaek Lim (HeartSaVioR)
>
> On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:
>
>> Nice I was looking for a jira. So I agree we should justify why we are
>> building something. Now to that direction here is what I have seen from my
>> experience.
>> People quite often use state within their streaming app and may have
>> large states (TBs). Shortening the pipeline by not having to copy data (to
>> Cassandra for example for serving) is an advantage, in terms of at least
>> latency and complexity.
>> This can be true if we advantage of state checkpointing (locally could be
>> RocksDB or in general HDFS the latter is currently supported)  along with
>> an API to efficiently query data.
>> Some use cases I see:
>>
>> - real-time dashboards and real-time reporting, the faster the better
>> - monitoring of state for operational reasons, app health etc...
>> - integrating with external services via an API eg. making accessible
>>  aggregations over time windows to some third party service within your
>> system
>>
>> Regarding requirements here are some of them:
>> - support of an API to expose state (could be done at the spark driver),
>> like rest.
>> - supporting dynamic allocation (not sure how it affects state
>> management)
>> - an efficient way to talk to executors to get the state (rpc?)
>> - making local state more efficient and easier accessible with an
>> embedded db (I dont think this is supported from what I see, maybe wrong)?
>> Some people are already working with such techs and some stuff could be
>> re-used: https://issues.apache.org/jira/browse/SPARK-20641
>>
>> Best,
>> Stavros
>>
>>
>> On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust > > wrote:
>>
>>> https://issues.apache.org/jira/browse/SPARK-16738
>>>
>>> I don't believe anyone is working on it yet.  I think the most useful
>>> thing is to start enumerating requirements and use cases and then we can
>>> talk about how to build it.
>>>
>>> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
>>> st.kontopou...@gmail.com> wrote:
>>>
 Cool Burak do you have a pointer, should I take the initiative for a
 first design document or Databricks is working on it?

 Best,
 Stavros

 On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:

> Hi Stavros,
>
> Queryable state is definitely on the roadmap! We will revamp the
> StateStore API a bit, and a queryable StateStore is definitely one of the
> things we are thinking about during that revamp.
>
> Best,
> Burak
>
> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" <
> st.kontopou...@gmail.com> wrote:
>
>> Just to re-phrase my question: Would query-able state make a viable
>> SPIP?
>>
>> Regards,
>> Stavros
>>
>> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Maybe this has been discussed before. Given the fact that many
>>> streaming apps out there use state extensively, could be a good idea to
>>> make Spark expose streaming state with an external API like other
>>> systems do (Kafka streams, Flink etc), in order to facilitate
>>> interactive queries?
>>>
>>> Regards,
>>> Stavros
>>>


Re: queryable state & streaming

2018-10-21 Thread Jungtaek Lim
Spark doesn't seem to have a workaround other than storing output in
external storage, so +1 on having this.

My major concern about implementing queryable state in Structured Streaming
is: "Are all states available on executors at any time while the query is
running?" Querying state shouldn't affect the running query. Given that
state can be huge and the default state provider loads state into memory,
we may not want to load one more redundant snapshot of the state: we want
to always read the "current state" that the query is also using. (For
sure, queryable state should be read-only.)
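
A minimal sketch of the "read the current state, read-only, without a second
copy" point above, for illustration only. ToyStateStore is a hypothetical
class, not a Spark API; it relies on Python's types.MappingProxyType, which
gives a live read-only view over a dict. Concurrency and consistency (for
example, a reader observing a half-applied batch) are deliberately ignored.

```python
from types import MappingProxyType


class ToyStateStore:
    """Hypothetical in-memory state store owned by the streaming query."""

    def __init__(self):
        self._state = {}

    def update(self, key, value):
        # Only the running query is expected to call this.
        self._state[key] = value

    def read_only_view(self):
        # A live view over the same dict: no second copy of the state,
        # and item assignment through the view raises TypeError.
        return MappingProxyType(self._state)


store = ToyStateStore()
view = store.read_only_view()
store.update("user-1", 3)
print(view["user-1"])  # the reader sees the query's current state

try:
    view["user-1"] = 99
except TypeError:
    print("view is read-only")
```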

Regarding improvement of local state, I guess it is ideal to leverage an
embedded DB, like Kafka and Flink are doing. The difference will be not
only reading state from off-heap storage, but also how to take a snapshot
and store deltas. We may want to check that snapshotting works well with a
small batch interval, and find an alternative approach when it doesn't.
It sounds like a huge item that can be handled separately.
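
To make the "snapshot plus delta" idea above concrete, here is a toy model.
It is a sketch under stated assumptions, not how any particular state store
is implemented: ToyCheckpointedState, its snapshot_every policy, and the
in-memory dicts standing in for checkpoint files are all hypothetical, and
key deletion is not handled.

```python
class ToyCheckpointedState:
    """Toy snapshot-plus-delta store; names and policy are hypothetical."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every
        self.state = {}
        self.snapshot = (0, {})  # (version, full copy of the state)
        self.deltas = {}         # version -> {key: value} changes
        self.version = 0

    def commit(self, changes):
        """Apply one batch of changes and persist it as a delta."""
        self.version += 1
        self.state.update(changes)
        self.deltas[self.version] = dict(changes)
        if self.version % self.snapshot_every == 0:
            # Periodic full snapshot; earlier deltas are no longer needed.
            self.snapshot = (self.version, dict(self.state))
            self.deltas.clear()

    def restore(self):
        """Rebuild the state from the last snapshot plus later deltas."""
        base_version, base = self.snapshot
        rebuilt = dict(base)
        for v in sorted(self.deltas):
            if v > base_version:
                rebuilt.update(self.deltas[v])
        return rebuilt


s = ToyCheckpointedState(snapshot_every=3)
for i in range(4):
    s.commit({"counter": i, f"batch-{i}": i})
print(s.restore() == s.state)
```

With a small batch interval, commit runs very often, so the cost of the
periodic full copy is the part one would want to measure.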

- Jungtaek Lim (HeartSaVioR)

On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:

> Nice I was looking for a jira. So I agree we should justify why we are
> building something. Now to that direction here is what I have seen from my
> experience.
> People quite often use state within their streaming app and may have large
> states (TBs). Shortening the pipeline by not having to copy data (to
> Cassandra for example for serving) is an advantage, in terms of at least
> latency and complexity.
> This can be true if we advantage of state checkpointing (locally could be
> RocksDB or in general HDFS the latter is currently supported)  along with
> an API to efficiently query data.
> Some use cases I see:
>
> - real-time dashboards and real-time reporting, the faster the better
> - monitoring of state for operational reasons, app health etc...
> - integrating with external services via an API eg. making accessible
>  aggregations over time windows to some third party service within your
> system
>
> Regarding requirements here are some of them:
> - support of an API to expose state (could be done at the spark driver),
> like rest.
> - supporting dynamic allocation (not sure how it affects state management)
> - an efficient way to talk to executors to get the state (rpc?)
> - making local state more efficient and easier accessible with an embedded
> db (I dont think this is supported from what I see, maybe wrong)?
> Some people are already working with such techs and some stuff could be
> re-used: https://issues.apache.org/jira/browse/SPARK-20641
>
> Best,
> Stavros
>
>
> On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust 
> wrote:
>
>> https://issues.apache.org/jira/browse/SPARK-16738
>>
>> I don't believe anyone is working on it yet.  I think the most useful
>> thing is to start enumerating requirements and use cases and then we can
>> talk about how to build it.
>>
>> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Cool Burak do you have a pointer, should I take the initiative for a
>>> first design document or Databricks is working on it?
>>>
>>> Best,
>>> Stavros
>>>
>>> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>>>
 Hi Stavros,

 Queryable state is definitely on the roadmap! We will revamp the
 StateStore API a bit, and a queryable StateStore is definitely one of the
 things we are thinking about during that revamp.

 Best,
 Burak

 On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" 
 wrote:

> Just to re-phrase my question: Would query-able state make a viable
> SPIP?
>
> Regards,
> Stavros
>
> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
> st.kontopou...@gmail.com> wrote:
>
>> Hi,
>>
>> Maybe this has been discussed before. Given the fact that many
>> streaming apps out there use state extensively, could be a good idea to
>> make Spark expose streaming state with an external API like other
>> systems do (Kafka streams, Flink etc), in order to facilitate
>> interactive queries?
>>
>> Regards,
>> Stavros
>>


Re: queryable state & streaming

2017-12-09 Thread Stavros Kontopoulos
Nice, I was looking for a JIRA. I agree we should justify why we are
building something. In that direction, here is what I have seen from my
experience.
People quite often use state within their streaming apps and may have large
states (TBs). Shortening the pipeline by not having to copy data (to
Cassandra, for example, for serving) is an advantage, at least in terms of
latency and complexity.
This can hold if we take advantage of state checkpointing (locally this
could be RocksDB, or in general HDFS; the latter is currently supported)
along with an API to query data efficiently.
Some use cases I see:

- real-time dashboards and real-time reporting, the faster the better
- monitoring of state for operational reasons, app health, etc.
- integrating with external services via an API, e.g. making aggregations
over time windows accessible to some third-party service within your
system

Regarding requirements, here are some of them:
- support for an API to expose state (could be done at the Spark driver),
like REST.
- supporting dynamic allocation (not sure how it affects state management)
- an efficient way to talk to executors to get the state (RPC?)
- making local state more efficient and more easily accessible with an
embedded DB (I don't think this is supported from what I see, but I may be
wrong)
Some people are already working with such tech, and some of it could be
re-used: https://issues.apache.org/jira/browse/SPARK-20641
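
One way to picture the driver-side API and the "talk to executors" requirement
listed above is a lookup that maps a key to a partition and then asks the
executor that owns it. The sketch below is an assumption-laden toy:
ToyExecutor and ToyDriverStateApi are hypothetical names, the ownership map
is fixed, and the RPC hop is replaced by a local method call.

```python
import zlib


class ToyExecutor:
    """Stand-in for an executor holding some state partitions locally."""

    def __init__(self):
        self.partitions = {}  # partition id -> {key: value}

    def get(self, partition_id, key):
        return self.partitions.get(partition_id, {}).get(key)


class ToyDriverStateApi:
    """Hypothetical driver-side facade: key -> partition -> executor."""

    def __init__(self, executors, num_partitions):
        self.executors = executors
        self.num_partitions = num_partitions
        # Fixed ownership map; under dynamic allocation it would change
        # as executors come and go.
        self.owner = {p: p % len(executors) for p in range(num_partitions)}

    def partition_for(self, key):
        # Stable hash so every caller agrees on the partition of a key.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def put(self, key, value):
        p = self.partition_for(key)
        executor = self.executors[self.owner[p]]
        executor.partitions.setdefault(p, {})[key] = value

    def lookup(self, key):
        p = self.partition_for(key)
        # In a real system this hop would be an RPC to the executor.
        return self.executors[self.owner[p]].get(p, key)


api = ToyDriverStateApi([ToyExecutor(), ToyExecutor()], num_partitions=4)
api.put("page-views:home", 42)
print(api.lookup("page-views:home"))  # 42
```

A REST endpoint on the driver would then only need to wrap lookup; keeping
the ownership map correct as executors change is the harder part.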

Best,
Stavros


On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust 
wrote:

> https://issues.apache.org/jira/browse/SPARK-16738
>
> I don't believe anyone is working on it yet.  I think the most useful
> thing is to start enumerating requirements and use cases and then we can
> talk about how to build it.
>
> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
> st.kontopou...@gmail.com> wrote:
>
>> Cool Burak do you have a pointer, should I take the initiative for a
>> first design document or Databricks is working on it?
>>
>> Best,
>> Stavros
>>
>> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>>
>>> Hi Stavros,
>>>
>>> Queryable state is definitely on the roadmap! We will revamp the
>>> StateStore API a bit, and a queryable StateStore is definitely one of the
>>> things we are thinking about during that revamp.
>>>
>>> Best,
>>> Burak
>>>
>>> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" 
>>> wrote:
>>>
 Just to re-phrase my question: Would query-able state make a viable
 SPIP?

 Regards,
 Stavros

 On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
 st.kontopou...@gmail.com> wrote:

> Hi,
>
> Maybe this has been discussed before. Given the fact that many
> streaming apps out there use state extensively, could be a good idea to
> make Spark expose streaming state with an external API like other
> systems do (Kafka streams, Flink etc), in order to facilitate
> interactive queries?
>
> Regards,
> Stavros
>




Re: queryable state & streaming

2017-12-08 Thread Michael Armbrust
https://issues.apache.org/jira/browse/SPARK-16738

I don't believe anyone is working on it yet.  I think the most useful thing
is to start enumerating requirements and use cases and then we can talk
about how to build it.

On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
st.kontopou...@gmail.com> wrote:

> Cool Burak do you have a pointer, should I take the initiative for a first
> design document or Databricks is working on it?
>
> Best,
> Stavros
>
> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>
>> Hi Stavros,
>>
>> Queryable state is definitely on the roadmap! We will revamp the
>> StateStore API a bit, and a queryable StateStore is definitely one of the
>> things we are thinking about during that revamp.
>>
>> Best,
>> Burak
>>
>> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" 
>> wrote:
>>
>>> Just to re-phrase my question: Would query-able state make a viable
>>> SPIP?
>>>
>>> Regards,
>>> Stavros
>>>
>>> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
>>> st.kontopou...@gmail.com> wrote:
>>>
 Hi,

 Maybe this has been discussed before. Given the fact that many
 streaming apps out there use state extensively, could be a good idea to
 make Spark expose streaming state with an external API like other
 systems do (Kafka streams, Flink etc), in order to facilitate
 interactive queries?

 Regards,
 Stavros



Re: queryable state & streaming

2017-12-08 Thread Stavros Kontopoulos
Cool, Burak. Do you have a pointer? Should I take the initiative for a
first design document, or is Databricks working on it?

Best,
Stavros

On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:

> Hi Stavros,
>
> Queryable state is definitely on the roadmap! We will revamp the
> StateStore API a bit, and a queryable StateStore is definitely one of the
> things we are thinking about during that revamp.
>
> Best,
> Burak
>
> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" 
> wrote:
>
>> Just to re-phrase my question: Would query-able state make a viable SPIP?
>>
>> Regards,
>> Stavros
>>
>> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Maybe this has been discussed before. Given the fact that many streaming
>>> apps out there use state extensively, could be a good idea to make Spark
>>> expose streaming state with an external API like other systems do (Kafka
>>> streams, Flink etc), in order to facilitate interactive queries?
>>>
>>> Regards,
>>> Stavros
>>>
>>
>>


Re: queryable state & streaming

2017-12-08 Thread Burak Yavuz
Hi Stavros,

Queryable state is definitely on the roadmap! We will revamp the StateStore
API a bit, and a queryable StateStore is definitely one of the things we
are thinking about during that revamp.

Best,
Burak

On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" 
wrote:

> Just to re-phrase my question: Would query-able state make a viable SPIP?
>
> Regards,
> Stavros
>
> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
> st.kontopou...@gmail.com> wrote:
>
>> Hi,
>>
>> Maybe this has been discussed before. Given the fact that many streaming
>> apps out there use state extensively, could be a good idea to make Spark
>> expose streaming state with an external API like other systems do (Kafka
>> streams, Flink etc), in order to facilitate interactive queries?
>>
>> Regards,
>> Stavros
>>
>
>


Re: queryable state & streaming

2017-12-08 Thread Stavros Kontopoulos
Just to re-phrase my question: Would query-able state make a viable SPIP?

Regards,
Stavros

On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
st.kontopou...@gmail.com> wrote:

> Hi,
>
> Maybe this has been discussed before. Given that many streaming apps
> out there use state extensively, would it be a good idea to make Spark
> expose streaming state through an external API, like other systems do
> (Kafka Streams, Flink, etc.), in order to facilitate interactive queries?
>
> Regards,
> Stavros
>