I can outline a use-case I have which may help define requirements for this
task. For context, I was originally going to try to address the use-case
below by disabling automatic rebalancing on a per-cache basis and using a
cluster-wide task to orchestrate manual rebalancing; however, this issue
sounds like it may provide a better approach.

I have caches set up for the sole purpose of routing data to nodes via a Data
Streamer. The logic in the streamer is simply to access a plugin on the data
node which exposes a processing pipeline and runs the received cache entries
through it. The data in this case is monitoring-related and there is one
cache (or logical stream) per data type (e.g. logs, events, metrics).

The pipeline is composed of N services which are deployed as node singletons
and have a service filter which targets a particular cache. These services
can be deployed and un-deployed as processing requirements change or bugs
are fixed without requiring clients to know or care about it.

The catch here is that when nodes are added I don't want map partitions to
rebalance to a new node until I know all of the necessary services are
running, otherwise we may have a small window where data is processed
through a pipeline that isn't completely initialized yet which would result
in a data quality issue. Alternatively, I could have the pipeline raise an
error which would cause the streamer to retry, but I'd like this to be
handled more gracefully, if possible.
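
To make the readiness check concrete, here is a minimal sketch in plain
Java. It is not Ignite API: the class, the method and the service names
("parser", "indexer") are all made up for illustration, and how the set of
running services would actually be obtained is left open.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Ignite: decides whether a newly joined
// node is ready to receive partitions for a given stream cache.
class PipelineReadiness {
    // requiredByCache: cache name -> names of the services its pipeline needs.
    // runningOnNode: names of the services currently running on the new node.
    static boolean readyForRebalance(String cacheName,
                                     Map<String, Set<String>> requiredByCache,
                                     Set<String> runningOnNode) {
        Set<String> required = requiredByCache.getOrDefault(cacheName, Set.of());
        // Ready only when every required service is already running.
        return runningOnNode.containsAll(required);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> required =
            Map.of("logs", Set.of("parser", "indexer"));

        // Only one of the two services is up: not ready yet.
        System.out.println(readyForRebalance("logs", required, Set.of("parser")));
        // Both services are up: partitions may move to the node.
        System.out.println(
            readyForRebalance("logs", required, Set.of("parser", "indexer")));
    }
}
```

Rebalancing to the new node would stay deferred for as long as this check
returns false.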

In addition, it will probably be the case that these caches eventually have
node filters so that we can isolate resources for these streams across
different computes. This means that, for example, if we add a node only for
metrics then deferring rebalancing should ideally only impact caches that
would get assigned to that node.
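
As a rough illustration of that scoping, the sketch below models each
cache's node filter as a predicate over node attributes and returns only
the caches that would land on the new node. Everything here is
hypothetical (the "stream" attribute, the cache names, the class itself);
it is plain Java rather than Ignite API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, not part of Ignite: works out which caches are
// affected when a node with the given attributes joins.
class AffectedCaches {
    // nodeFilters: cache name -> filter over a node's attribute map.
    // Returns the caches whose filter accepts the new node, sorted by name.
    static List<String> cachesToDefer(
            Map<String, Predicate<Map<String, String>>> nodeFilters,
            Map<String, String> newNodeAttrs) {
        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> e
                : nodeFilters.entrySet()) {
            if (e.getValue().test(newNodeAttrs)) {
                affected.add(e.getKey());
            }
        }
        Collections.sort(affected);
        return affected;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Map<String, String>>> filters = new HashMap<>();
        filters.put("logs", attrs -> "logs".equals(attrs.get("stream")));
        filters.put("metrics", attrs -> "metrics".equals(attrs.get("stream")));

        // A node dedicated to metrics joins: only "metrics" is affected.
        System.out.println(cachesToDefer(filters, Map.of("stream", "metrics")));
    }
}
```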

Going even further... so far we've talked about one cache which is used just
for streaming, but at least one of the services would create its own set of
caches as an in-memory storage layer which maintains an inverted index and
time series data for elements coming through the stream. The storage caches
in this case would only exist on nodes where the stream cache is and most of
the write activity to these caches would be local since they would use the
same affinity as the stream cache (if most writes were remote this wouldn't
scale well). So... these caches would need to rebalance at the same time in
order to minimize the possibility of additional network calls.

The main concern I have is how to avoid the race condition of another node
joining the topology _after_ it has been determined rebalancing should
happen, but _before_ rebalancing is triggered. If this is controlled on a
per-node (+cache) basis - as the ticket describes - it's probably a
non-issue, but it's definitely an issue if it's only on a per-cache basis.
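
The difference can be sketched as follows, again in plain Java with
made-up names rather than anything from Ignite or the ticket. Approval is
recorded per (node, cache) pair, so a node that joins late is simply not
covered by a decision that was made for an earlier node.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gate, not part of Ignite: rebalancing is approved per
// (node, cache) pair instead of per cache.
class PerNodeRebalanceGate {
    // Holds "nodeId/cacheName" keys for every approved pair.
    private final Set<String> approved = ConcurrentHashMap.newKeySet();

    void approve(String nodeId, String cacheName) {
        approved.add(nodeId + "/" + cacheName);
    }

    boolean mayReceivePartitions(String nodeId, String cacheName) {
        return approved.contains(nodeId + "/" + cacheName);
    }

    public static void main(String[] args) {
        PerNodeRebalanceGate gate = new PerNodeRebalanceGate();

        // nodeA has all of its services running, so it is approved.
        gate.approve("nodeA", "metrics");

        // nodeB joins after that decision but before rebalancing starts.
        System.out.println(gate.mayReceivePartitions("nodeA", "metrics")); // true
        System.out.println(gate.mayReceivePartitions("nodeB", "metrics")); // false
    }
}
```

With a single per-cache flag there is no way to express "nodeA yes, nodeB
not yet", which is where the race comes from.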

-Nick



--
View this message in context: 
http://apache-ignite-developers.2346864.n4.nabble.com/Add-ability-to-enable-and-disable-rebalancing-per-node-tp17494p17529.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.