Hi all,

I have had a good exchange on how someone would use clustered / remote 
listeners to implement custom continuous query features.

I have a few questions and requests to make this fully and easily doable.

## Value as bytes or as objects

Assuming Hot Rod based usage with protobuf as the serialization layer, what do 
KeyValueFilter and Converter see?
I assume that today the bytes are unmarshalled and the Java object is provided to 
these interfaces.
With protobuf-based storage, does that mean the user must generate the Java 
classes with the protobuf compiler and deploy them on the classpath of 
each server node?
Alternatively, could we pass the raw protobuf data to the KeyValueFilter and 
Converter? They could read the relevant properties at no deserialization cost 
and with fewer classloader-related problems.
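
To make that alternative concrete, here is a rough sketch of a filter reading a 
single field straight out of the raw protobuf payload. The class name, the field 
number and the assumption that the server would hand the filter the undeserialized 
byte[] are all hypothetical; only the protobuf CodedInputStream API is real:

```java
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.WireFormat;
import java.io.IOException;

// Hypothetical raw-bytes filter: no generated Java classes deployed on the server,
// the filter only walks the protobuf wire format looking for one field.
public class CustomerTypeRawFilter {

    private static final int CUSTOMER_TYPE_FIELD = 3; // assumed field number in the .proto schema

    /** Returns true if the serialized customer is of the wanted type, without building a Java object. */
    public boolean accept(byte[] rawValue, String wantedType) throws IOException {
        CodedInputStream in = CodedInputStream.newInstance(rawValue);
        int tag;
        while ((tag = in.readTag()) != 0) {
            if (WireFormat.getTagFieldNumber(tag) == CUSTOMER_TYPE_FIELD) {
                return wantedType.equals(in.readString());
            }
            in.skipField(tag); // cheap skip for every field we do not care about
        }
        return false;
    }
}
```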

Thoughts?

## Synced listeners

For a transactional cache with a clustered listener marked as sync: does the 
transaction commit and then wait for the relevant clustered listeners to complete 
before returning control to the transaction client? Or is there something else going on?

## oldValue and newValue

I understand why the oldValue was not provided in the initial work: it requires 
sending more data across the network and at least doubles the number of values 
unmarshalled.
But for continuous queries, being able to compare the old and the new value is 
critical to reduce the number of events sent to the listener.

Imagine the following use case. A listener exposes the average age for a 
certain type of customer. You would implement it the following way.

1. Add a KeyValueFilter that
- upon creation, filters out the customers of the wrong type
- upon update, keeps customers that
    - *were* of the right type but no longer are
    - were not of the right type but now *are*
    - remain of the right type and whose age has changed
- upon deletion, keeps customers that *were* of the right type

2. Converter
In the converter, one could send the whole customer, but it would be more 
efficient to only send the age of the customer as well as whether it is added to 
or removed from the set of matching customers:
- upon creation, send the customer age and mark it as an addition
- upon deletion, send the customer age and mark it as a deletion
- upon update:
    - if the customer was of the right type but no longer is, send the age as 
well as a deletion flag
    - if the customer was not of the right type but now is, send the age as 
well as an addition flag
    - if the customer age has changed, send the difference with a modification 
flag

3. The listener then needs to keep the total sum of all ages as well as the 
total number of customers of the right type. Based on the sent events, it can 
adjust these two counters.
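
A minimal sketch of the bookkeeping in step 3, assuming the hypothetical event 
payload described in step 2 (the age, or the age difference, plus an addition / 
deletion / modification flag); none of these classes exist in Infinispan:

```java
// Illustrative only: the listener keeps the running sum of ages and the number of
// matching customers, and adjusts them from the compact events sent by the converter.
public class AverageAgeListener {

    public enum Kind { ADDITION, DELETION, MODIFICATION }

    public static final class AgeEvent {
        final Kind kind;
        final int ageOrDelta; // absolute age for ADDITION/DELETION, age difference for MODIFICATION
        public AgeEvent(Kind kind, int ageOrDelta) { this.kind = kind; this.ageOrDelta = ageOrDelta; }
    }

    private long totalAge;
    private long customerCount;

    public synchronized void onEvent(AgeEvent event) {
        switch (event.kind) {
            case ADDITION:     totalAge += event.ageOrDelta; customerCount++; break;
            case DELETION:     totalAge -= event.ageOrDelta; customerCount--; break;
            case MODIFICATION: totalAge += event.ageOrDelta; break; // delta between old and new age
        }
    }

    public synchronized double averageAge() {
        return customerCount == 0 ? 0d : (double) totalAge / customerCount;
    }
}
```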

That requires being able to provide the old and new values to the 
KeyValueFilter and Converter interfaces, as well as the type of event 
(creation, update, deletion).

If you keep the existing interfaces and their data, the data sent and the 
memory consumed become much, much bigger. I leave it as an exercise, but I think 
you need to:
- send *all* remove and update events regardless of the value (essentially no 
KeyValueFilter)
- in the listener, keep a list of *all* matching keys so that you know whether a new 
event concerns an entry that was already matching your criteria, and act 
accordingly (see the sketch below).
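
For comparison, a sketch of that fallback: the event shape and the type check 
below are made up for the example, and the parameters stand in for the full 
value that would have to be shipped with every update and remove event.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fallback with today's interfaces: without the old value the listener
// must remember, per key, whether the entry currently matches and what its age was.
public class AverageAgeWithoutOldValue {

    private final Map<Object, Integer> matchingAges = new HashMap<>(); // key -> last known age
    private long totalAge;

    /** Called for every create/update event, matching or not. */
    public synchronized void onCreateOrModify(Object key, String customerType, int age) {
        Integer previousAge = "RIGHT_TYPE".equals(customerType) // illustrative type check
                ? matchingAges.put(key, age)
                : matchingAges.remove(key);
        if (previousAge != null) {
            totalAge -= previousAge;           // the entry was matching before
        }
        if (matchingAges.containsKey(key)) {
            totalAge += age;                   // the entry is matching now
        }
    }

    /** Called for every remove event, matching or not. */
    public synchronized void onRemove(Object key) {
        Integer previousAge = matchingAges.remove(key);
        if (previousAge != null) {
            totalAge -= previousAge;
        }
    }

    public synchronized double averageAge() {
        int count = matchingAges.size();
        return count == 0 ? 0d : (double) totalAge / count;
    }
}
```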

BTW, you need the old and new values even if your listener returns actual 
matching results instead of an aggregation, for more or less the same reasons.

Continuous query is arguably the most important use case for remote and clustered 
listeners and I think we should address it properly and as efficiently as 
possible. Adding continuous query to Infinispan will then “simply” be a matter 
of agreeing on the query syntax and implementing the predicates as smartly as 
possible.

With the use case I describe, I think the best approach is to merge the KVF and 
Converter into a single Listener-like interface that is able to send or silence 
an event payload. But that’s a guesstimate.
Because oldValue / newValue implies an unmarshalling overhead, we might want to 
make it an annotation-based flag on the class that is executed on each node 
(somewhat similar to the settings hosted on @Listener).
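
To make the proposal concrete, a purely illustrative shape for such a merged 
interface; none of these types or names exist today:

```java
// A single filter+converter hook that sees the event type, the old value and the new
// value, and either produces the payload to ship or silences the event entirely.
public interface FilteringConverter<K, V, C> {

    enum EventType { CREATED, MODIFIED, REMOVED }

    /**
     * Returns the payload to send to the listener, or null to silence the event.
     * oldValue is null on CREATED, newValue is null on REMOVED.
     */
    C filterAndConvert(K key, V oldValue, V newValue, EventType eventType);
}
```

Whether oldValue gets populated at all could then be driven by the 
annotation-based flag mentioned above, so deployments that do not need it avoid 
the extra unmarshalling.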

## includeCurrentState and very narrow filtering

The existing approach is fine (send a create event notification for all existing keys 
and queue changes in the meantime) as long as the listener plans to consume 
most of these events.
But in the case of a big data grid with a lot of passivated entries, the cost 
would become non-negligible.

An alternative approach is to first run a query matching the elements the 
listener is interested in, and queue up the events until the query is fully 
processed. Can a listener access a cache and run a query? Should we offer such an 
option in a more packaged way (see the sketch below)?

For a listener that is only interested in keys whose value's city field contains 
Springfield, Virginia, the gain would be massive.
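
In pseudo-Java, the packaged variant could order things roughly like this; the 
classes and the queue-then-replay behaviour are hypothetical and only make the 
suggested flow explicit:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical helper showing the suggested ordering: serve the initial query results
// first, park live events in the meantime, then replay the parked events.
public class InitialStateFromQuery<E> {

    private final Consumer<E> listener;
    private final Queue<E> parked = new ArrayDeque<>();
    private boolean initialStateDelivered;

    public InitialStateFromQuery(Consumer<E> listener) {
        this.listener = listener;
    }

    /** Live events arriving while the query runs are parked instead of racing the initial state. */
    public synchronized void onLiveEvent(E event) {
        if (initialStateDelivered) {
            listener.accept(event);
        } else {
            parked.add(event);
        }
    }

    /** Push the (narrow) query results as the initial state, then drain the parked events. */
    public synchronized void deliverInitialState(List<E> queryResults) {
        queryResults.forEach(listener);
        parked.forEach(listener);
        parked.clear();
        initialStateDelivered = true;
    }
}
```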

## Remote listener and non Java HR clients

Do the APIs of non-Java Hot Rod clients support registering listeners and 
attaching a KeyValueFilter / Converter? Or is that planned? Just curious.

Emmanuel