On Sun, Jan 29, 2017 at 3:04 AM, Vadim Punski <vpun...@gmail.com> wrote:

> Hi,
>
> I'm new to Akka and CRDTs, so take the below with a grain of salt.
> I'm trying to implement application requirements with thousands of
> LWW-Maps that are not related to each other.
> The implementation itself is not my main issue; what I found interesting
> is the internal gossip protocol of the "distributed data" module...
>
> I suppose the problem below is related to my understanding (or lack of
> understanding) of CRDTs, or to the current implementation of the
> distributed-data module...
>
> Let's say I have 10K CRDT data type instances that are not related to each
> other, for example 10K LWW-Maps. From the application point of view, they
> may be out of sync in relation to each other, but each one should be
> eventually consistent and synced across the cluster, using Local
> Consistency. Reading the distributed-data source code, I found that each
> LWW-Map key created (like any other CRDT data type instance) "lives" as an
> entry in the Replicator.dataEntries map.
>
> Then the gossip protocol, on each tick interval, replicates the data in
> chunks of the configured size, but no more than 10 chunks at once
> (Replicator.gossipTo()?). That means the whole dataEntries map is
> replicated, no matter how many LWW-Maps were changed, if any. The
> replication works as follows: the node initiating the gossip sends the
> digests of all (!) LWW-Maps (entries in dataEntries) in a Status message;
> the differences are then requested and sent back, if any were found
> (Gossip, one more round trip).
> That means that even with zero changes to my 10K LWW-Maps, each holding a
> few entries, 10K digests will be sent to all nodes on every gossip tick.
> I see this behavior in the log...
>

It's sent to one random node per tick, not all nodes. The reason for
sending all digests is to be able to find entries that are not the same.
Remember that message delivery is at-most-once and there is no redelivery
and acking here to make it anything else. Also new nodes joining the
cluster need to sync up.
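
To make the digest exchange concrete, here is a minimal model of it in
plain Python. It is a sketch only and not the Replicator's code: the names
Node, status, differing_keys and gossip_tick are made up for illustration,
the SHA-1 digest over a sorted repr is an assumption, and the "merge" is a
placeholder where the sender's value simply wins.

```python
import hashlib
import random


def digest(value):
    """Fixed-size fingerprint of one entry (hash choice is an assumption)."""
    canonical = repr(sorted(value.items()))
    return hashlib.sha1(canonical.encode("utf-8")).digest()


class Node:
    def __init__(self, name):
        self.name = name
        self.entries = {}  # key -> dict value, a stand-in for dataEntries

    def status(self):
        """Digests of every entry, whether it changed or not."""
        return {key: digest(value) for key, value in self.entries.items()}

    def differing_keys(self, remote_status):
        """Keys whose digest differs, or that exist on one side only."""
        local = self.status()
        return {key for key in local.keys() | remote_status.keys()
                if local.get(key) != remote_status.get(key)}


def gossip_tick(sender, peers, rng=random):
    """One tick: contact ONE random peer, exchange only mismatching entries."""
    peer = rng.choice(peers)
    diff = peer.differing_keys(sender.status())
    for key in diff:
        # Placeholder for a real CRDT merge: the sender's value wins if it
        # has one, otherwise the sender learns the entry from the peer.
        if key in sender.entries:
            peer.entries[key] = dict(sender.entries[key])
        else:
            sender.entries[key] = dict(peer.entries[key])
    return peer, diff


a, b, c = Node("a"), Node("b"), Node("c")
for node in (a, b, c):
    node.entries = {f"map-{i}": {"x": i} for i in range(1000)}
a.entries["map-7"] = {"x": 7, "y": 1}  # a single local change on node a

peer, diff = gossip_tick(a, [b, c])
print(len(a.status()), "digests sent,", len(diff), "entry transferred")
```

All 1000 digests travel in the status, but only the one mismatching entry
is transferred. Because nothing is acked, comparing all digests is what
lets a node notice an entry it missed earlier.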


> This is the flow seen in 2.4.16 release version.
>
> In 2.5.0-SNAPSHOT, a delta-CRDT approach is introduced. Although the
> documentation and the article mentioned talk about delta changes within
> individual CRDT data types (e.g. elements c, d in a set), the replication
> mechanism works at the key->dataEnvelope level, i.e. on the whole changed
> entry of the dataEntries map (deltaPropagationSelector.update(key.id, d)
> ...). Please notice that replication does not send the delta changes of a
> single data type instance (the changed entries of a particular Set, for
> example), but the whole Set. In the current implementation, "delta" means
> that only changed instances will be replicated by the gossip protocol,
> not the actual delta of a CRDT data type instance.
>

Not sure I follow. Note that delta-CRDT support is only implemented for
GCounter and PNCounter so far. It's not implemented for LWW-Map yet, but
that should be possible so that only the changed entries in the LWW-Map are
sent as the delta.
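
As an illustration of what "only the changed entries are sent as the
delta" could look like, here is a toy last-writer-wins map in plain
Python. It is a sketch under assumptions, not Akka's LWWMap: the class,
take_delta and the (timestamp, node_id) tie-break are hypothetical names
and choices made for this example.

```python
class LWWMap:
    """Toy last-writer-wins map; each key stores (timestamp, node_id, value)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.entries = {}    # key -> (timestamp, node_id, value)
        self._dirty = set()  # keys changed locally since the last delta

    def put(self, key, value, timestamp):
        if self._merge_entry(key, (timestamp, self.node_id, value)):
            self._dirty.add(key)

    def get(self, key):
        entry = self.entries.get(key)
        return None if entry is None else entry[2]

    def _merge_entry(self, key, entry):
        """Keep the entry with the highest (timestamp, node_id) pair."""
        current = self.entries.get(key)
        if current is None or entry[:2] > current[:2]:
            self.entries[key] = entry
            return True
        return False

    def merge(self, other_entries):
        """Merge a full state or a delta; both are key -> entry dicts."""
        for key, entry in other_entries.items():
            self._merge_entry(key, entry)

    def take_delta(self):
        """Return only the entries changed locally since the last call."""
        delta = {key: self.entries[key] for key in self._dirty}
        self._dirty.clear()
        return delta


cart_a, cart_b = LWWMap("a"), LWWMap("b")
for i in range(10):
    cart_a.put(f"item-{i}", 1, timestamp=i)
cart_b.merge(cart_a.take_delta())        # initial sync carries 10 entries

cart_a.put("item-3", 5, timestamp=100)   # one entry changes
delta = cart_a.take_delta()
cart_b.merge(delta)
print(len(delta), cart_b.get("item-3"))
```

The second delta carries one entry instead of ten. Since merge is
idempotent, a delta that is delivered twice does no harm, and a lost delta
can still be repaired by the full-state gossip.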


>
> I spotted this behavior when having 1K unrelated LWW-Maps with different
> key names: the system sent 1K digests to all nodes without a single change
> (in 2.4.16), and changing a single map entry triggers the same flow
> anyway.
>
> The 2.5.0-SNAPSHOT version should improve this, but still, the delta
> message is not built per CRDT type instance; it contains all entries of
> the dataEntries map that were changed, even if a single entry was changed
> in a 10-entry LWW-Map.
>
> So, based on my understanding, my questions are:
> - In 2.4.16, what is the rationale for treating unrelated CRDT instances
> in the dataEntries map as one composite CRDT and replicating them as a
> single structure, rather than using non-delta replication for each
> instance separately? If my approach is wrong, what is the right way to do
> it?
> - In 2.5.0, why does the delta mechanism work at the dataEntries map
> level, while the documentation and the article talk about the delta of a
> single CRDT data type instance (members of a CRDT Set)? Is my
> understanding correct? If so, what is the reason for the current
> implementation?
> - What is the best way to handle a large number of unrelated CRDT data
> type instances with Local Consistency while reducing the replication
> overhead?
>
>
> The current implementation creates a huge amount of inter-node traffic and
> CPU overhead, and is pretty heavy even for fewer than 100K entries. I don't
> really understand how a "shopping cart"-like example (with Local
> Consistency!) with 10K simultaneously logged-in users and 10K shopping
> carts replicated every gossip interval (the digest round trip saves a lot,
> but still) should work in real life. In my opinion, working with Local
> Consistency should improve performance compared with any other
> consistency strategy, but with a large number of CRDT data type
> instances that is not the case in my performance tests...
>

http://doc.akka.io/docs/akka/2.5-M1/scala/distributed-data.html#Limitations

How large are your Status messages (with the digests)? You can reduce the
chunk size or increase the tick interval, at the cost of longer
dissemination times.
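
A rough back-of-the-envelope model of that trade-off, in plain Python. It
is a sketch under assumptions: status_cost is a hypothetical helper, the
limit of 10 chunks per tick is taken from the description earlier in this
thread, the 20-byte digest is an assumed size, key names and message
overhead are not counted, and the full-pass time is a best case that
assumes no chunk is repeated.

```python
import math


def status_cost(total_keys, chunk_size, tick_seconds,
                max_chunks_per_tick=10, digest_bytes=20):
    """Estimate digest traffic per node and best-case time to cover all keys."""
    if total_keys <= chunk_size:
        digests_per_tick = total_keys
        ticks_for_full_pass = 1
    else:
        total_chunks = math.ceil(total_keys / chunk_size)
        chunks_per_tick = min(total_chunks, max_chunks_per_tick)
        digests_per_tick = min(total_keys, chunks_per_tick * chunk_size)
        ticks_for_full_pass = math.ceil(total_chunks / chunks_per_tick)
    return {
        "digests_per_tick": digests_per_tick,
        "bytes_per_second": digests_per_tick * digest_bytes / tick_seconds,
        "seconds_for_full_pass": ticks_for_full_pass * tick_seconds,
    }


# Illustrative numbers only: 10K keys with two different settings.
baseline = status_cost(10_000, chunk_size=1000, tick_seconds=2.0)
relaxed = status_cost(10_000, chunk_size=500, tick_seconds=4.0)
print(baseline)
print(relaxed)
```

With these made-up numbers, halving the chunk size and doubling the tick
interval cuts the digest bytes per second to a quarter, while one full
pass over the keys takes four times as long. And since each tick goes to
one random node, reaching every node takes longer still.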


>
>
> P.S. I also found that the "Subscribe" mechanism uses a kind of "delta"
> approach even in 2.4.16, as it saves all changed entries in the
> "Replicator.changed" set for later notifications. So a delta mechanism at
> the level of dataEntries map entries exists, but only for notifications...
>
>
> Vadim Punski
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akka-user+unsubscr...@googlegroups.com.
> To post to this group, send email to akka-user@googlegroups.com.
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

Patrik Nordwall
Akka Tech Lead
Lightbend <http://www.lightbend.com/> -  Reactive apps on the JVM
Twitter: @patriknw
