Re: Redis as a State Backend

2024-02-14 Thread David Morávek
Here is my somewhat harsh, straightforward feedback; it is based on facts
and on real-world experience with using Redis since ~2012.

Redis is not a database, period. The best description of Redis is
something along the lines of "in-memory, text-only (base64 ftw) data
structures on top of a TCP socket". The sweet spot for Redis is as an
in-memory caching layer (Memcached is the closest equivalent). The
misconceptions around Redis have haunted me for the past decade.

Redis does not provide any good primitives for participating in the
checkpointing / fault-tolerance mechanism beyond what can be implemented
natively in the heap state backend. It can take a full snapshot of the
database (costly; we need incremental snapshots) or write a text-based
append log (comparable to the changelog state backend).
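
To make the trade-off between the two options concrete, here is a toy
sketch in Python. It is neither Redis nor Flink code; ToyStore, log_since
and replay are hypothetical names invented for this illustration. The full
snapshot costs time proportional to the total state size, while the append
log costs time proportional to the number of writes since the last
checkpoint.

```python
import json


class ToyStore:
    """Hypothetical in-memory key-value store (not Redis, not Flink)."""

    def __init__(self):
        self.data = {}
        self.log = []  # text-based append log, one JSON line per write

    def put(self, key, value):
        self.data[key] = value
        self.log.append(json.dumps({"key": key, "value": value}))

    def full_snapshot(self):
        # Serializes everything, even if only one key changed.
        return json.dumps(self.data, sort_keys=True)

    def log_since(self, offset):
        # Only the writes made after the given log offset.
        return self.log[offset:]


def replay(snapshot, entries):
    """Rebuild state from a full snapshot plus later log entries."""
    state = json.loads(snapshot)
    for line in entries:
        entry = json.loads(line)
        state[entry["key"]] = entry["value"]
    return state


store = ToyStore()
store.put("a", 1)
store.put("b", 2)
snapshot = store.full_snapshot()   # checkpoint 1: full copy of the state
offset = len(store.log)
store.put("a", 3)                  # one write after the checkpoint
restored = replay(snapshot, store.log_since(offset))
```

Recovery here needs the last full snapshot plus every log entry written
after it, so the log has to be truncated or compacted at some point.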

All data needs to fit in memory, in text form, with no compression (you can
of course compress before doing base64).
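
A minimal sketch of the "compress before base64" remark, taking the
text-only premise above as given. It uses only Python's standard zlib and
base64 modules; encode_value and decode_value are hypothetical helper names,
and the payload is made-up, highly repetitive sample data.

```python
import base64
import zlib


def encode_value(raw: bytes) -> str:
    """Compress first, then base64-encode into ASCII text."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_value(text: str) -> bytes:
    """Reverse of encode_value: base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


payload = b"flink-state-" * 1000            # repetitive sample data
plain_text = base64.b64encode(payload).decode("ascii")
compressed_text = encode_value(payload)
```

Base64 alone inflates the data by roughly a third, so compression has to
win back more than that to pay off; how much it saves depends entirely on
how compressible the state is.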

We should not even bother comparing it to RocksDB; its Flink equivalent is
the HeapStateBackend, which can be made far more performant because it does
not need to go through the network stack.

Best,
D.



Re: Redis as a State Backend

2024-01-31 Thread David Anderson
When it comes to decoupling the state store from Flink, I suggest taking a
look at FlinkNDB, which is an experimental state backend for Flink that
puts the state into an external distributed database. There's a Flink
Forward talk [1] and a master's thesis [2] available.

[1] https://www.youtube.com/watch?v=ZWq_TzsXssM
[2] http://www.diva-portal.org/smash/get/diva2:1536373/FULLTEXT01.pdf


Re: Redis as a State Backend

2024-01-31 Thread Chirag Dewan via user
Thanks Zakelly and Junrui.

I was actually exploring RocksDB as a state backend, and I thought Redis
might offer more features as a state backend, e.g. state sharing between
operators, geo-redundancy of state, partitioning, etc. I understand these
are not native use cases for Flink, but maybe they are something that can
be considered in the future, perhaps even as an off-the-shelf state backend
framework that allows embedding any other cache as a state backend.

The links you shared are useful and will really help me. I really
appreciate it.

Thanks

  

Re: Redis as a State Backend

2024-01-30 Thread Zakelly Lan
And I found some previous discussion, FYI:
1. https://issues.apache.org/jira/browse/FLINK-3035
2. https://www.mail-archive.com/dev@flink.apache.org/msg10666.html

Hope this helps.

Best,
Zakelly



Re: Redis as a State Backend

2024-01-30 Thread Zakelly Lan
Hi Chirag

That's an interesting idea. IIUC, storing key-value pairs in Redis is
simple to implement, but supporting checkpointing and recovery is relatively
challenging. Flink's checkpoints must be consistent across all stateful
operators at the same point in time. For an *embedded* and *file-based*
key-value store like RocksDB, this is easier to implement: the files as of
a specific point in time can be uploaded asynchronously.
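
As a rough illustration of that point, here is a toy sketch in Python. It
is not RocksDB or Flink code: ToyFileStore, flush and checkpoint are
hypothetical names, and the sketch assumes that files are never modified
after they are written. Under that assumption, a checkpoint only has to
freeze the list of files that exist at that moment; the upload can then run
in the background while new writes create new files.

```python
import threading


class ToyFileStore:
    """Hypothetical store that only ever adds immutable files."""

    def __init__(self):
        self.files = {}       # file name -> immutable contents
        self._next_id = 0

    def flush(self, contents: bytes) -> str:
        name = f"file-{self._next_id:04d}"
        self._next_id += 1
        self.files[name] = contents
        return name


def checkpoint(store, remote, uploaded):
    # Synchronous part: freeze the set of files that exist right now.
    frozen = dict(store.files)

    def upload():
        # Asynchronous part: copy only files not uploaded before.
        for name, contents in frozen.items():
            if name not in uploaded:
                remote[name] = contents
                uploaded.add(name)

    worker = threading.Thread(target=upload)
    worker.start()
    return sorted(frozen), worker


store, remote, uploaded = ToyFileStore(), {}, set()
store.flush(b"a")
store.flush(b"b")
manifest, worker = checkpoint(store, remote, uploaded)
store.flush(b"c")     # written after the checkpoint was taken
worker.join()
```

The file written after the checkpoint is not part of its manifest, and a
later checkpoint would upload only that new file. An external server such
as Redis does not hand the client a comparable set of frozen files.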

Moreover, if you want to keep your state basically in memory, why not use
the HashMapStateBackend? It saves the overhead of serialization and
deserialization and may, I guess, achieve better performance than Redis.
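
The serialization overhead can be sketched with a toy comparison in Python.
HeapState and SerializedState are hypothetical classes, not Flink APIs, and
pickle merely stands in for whatever serializer a real backend would use.
The heap variant hands back the live object, while the serialized variant
pays a conversion on every access, as any store outside the process would.

```python
import pickle


class HeapState:
    """Keeps live objects on the heap; no conversion on access."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


class SerializedState:
    """Keeps bytes only, like a store outside the process would."""

    def __init__(self):
        self._bytes = {}
        self.conversions = 0   # counts serialize/deserialize calls

    def put(self, key, value):
        self.conversions += 1
        self._bytes[key] = pickle.dumps(value)

    def get(self, key):
        self.conversions += 1
        raw = self._bytes.get(key)
        return None if raw is None else pickle.loads(raw)


heap = HeapState()
counts = [1]
heap.put("k", counts)
heap.get("k").append(2)        # mutates the stored object in place

ser = SerializedState()
ser.put("k", [1])
copy = ser.get("k")            # a deserialized copy, not the stored value
copy.append(2)
stale = ser.get("k")           # the store has not seen the change yet
ser.put("k", copy)             # the change must be written back explicitly
fresh = ser.get("k")
```

A networked store would add a round trip on top of each of these
conversions, which this sketch does not model.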


Best,
Zakelly



Re: Redis as a State Backend

2024-01-29 Thread Junrui Lee
Hi Chirag,

Indeed, the possibility of using Redis as a state backend for Flink has
been considered in the past. You can find a detailed discussion about this
topic in the JIRA issue FLINK-3035[1] as well as in the comments section of
this PR[2].

The outcome of these discussions was that Redis is not well suited to serve
as a state backend for Flink.

Best regards,
Junrui

[1] https://issues.apache.org/jira/browse/FLINK-3035
[2] https://github.com/apache/flink/pull/1617

On Tue, Jan 30, 2024 at 14:15, Chirag Dewan via user wrote:

> Hi,
>
> I was looking at FLIP-254: Redis Streams Connector, and I was wondering
> whether Flink ever considered Redis as a state backend and, if so, why it
> was discarded in favor of RocksDB.
>
> If someone can point me towards any deep dives on why RocksDB is a better
> fit as a state backend, it would be helpful.
>
> Thanks,
> Chirag
>