Re: Tuning number of partitions per cache

2016-03-22 Thread Denis Magda

Dmitriy,

I agree that the 2nd point is less relevant if both the partitions and
the data inside the partitions are equally distributed.
However, let's suppose that everything is distributed uniformly and a
new node joins the cluster.
The affinity function may assign it (?) primary and backup partitions in
such a way that the new node will preload 30% of its partitions from
nodeA, 30% from nodeB and 40% from nodeC.
In this scenario it seems that the size of a partition matters (10 GB vs
100 GB vs 500 GB), because an extra 10% of the partitions will be
preloaded from a single point, nodeC, thus affecting the total
preloading time.
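
To put rough numbers on the 30/30/40 scenario, here is a minimal sketch
under loudly stated assumptions: the new node receives 10 partitions
(3 from nodeA, 3 from nodeB, 4 from nodeC), each supplier streams at a
hypothetical 0.1 GB/s, suppliers send in parallel, and the receiving
node is not the bottleneck. The class and method names are made up for
illustration and are not part of any Ignite API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PreloadEstimate {
    /**
     * Seconds until the slowest supplier finishes, assuming all suppliers
     * stream in parallel at the same rate (a simplification).
     */
    static double bottleneckSeconds(Map<String, Integer> partitionsPerSupplier,
                                    double partitionSizeGb,
                                    double supplierGbPerSec) {
        double worst = 0.0;
        for (int partitions : partitionsPerSupplier.values()) {
            double seconds = partitions * partitionSizeGb / supplierGbPerSec;
            worst = Math.max(worst, seconds);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical split of 10 incoming partitions: 30% / 30% / 40%.
        Map<String, Integer> suppliers = new LinkedHashMap<>();
        suppliers.put("nodeA", 3);
        suppliers.put("nodeB", 3);
        suppliers.put("nodeC", 4);

        double gbPerSec = 0.1; // assumed per-supplier throughput, not measured

        for (double sizeGb : new double[] {10, 100, 500}) {
            System.out.printf("partition size %.0f GB -> about %.0f s%n",
                sizeGb, bottleneckSeconds(suppliers, sizeGb, gbPerSec));
        }
    }
}
```

In this model nodeC always carries one partition more than the other
suppliers, so the relative penalty is the same at every partition size,
but the absolute extra time grows with the size of that one partition.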


Is my understanding that such a scenario is possible correct?

--
Denis


Re: Tuning number of partitions per cache

2016-03-22 Thread Dmitriy Setrakyan

Denis,

I agree that the number of partitions within a cache must be
*significantly* larger than the number of cluster nodes for that cache.

However, the 2nd point you are making is about controlling the size of
partitions, which in my view is much less relevant, as long as the 1st
requirement is met. Why should we worry if a partition size is 10GB or
100GB, as long as the number of partitions is equally distributed among
cluster nodes and the data is equally distributed among partitions?
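
As a rough illustration of why the partition count should be much larger
than the node count, the sketch below computes the best-case imbalance
between the most and least loaded node when partitions are dealt out as
evenly as possible. It considers primary partitions only and ignores
backups; a real affinity function will not be perfectly even, so treat
the result as a lower bound. The class and method names are hypothetical.

```java
final class PartitionSpread {
    /**
     * Ratio of partitions on the most loaded node to partitions on the
     * least loaded node, assuming an ideal, as-even-as-possible split.
     */
    static double idealImbalance(int partitions, int nodes) {
        int min = partitions / nodes;               // floor(partitions / nodes)
        int max = (partitions + nodes - 1) / nodes; // ceil(partitions / nodes)
        if (min == 0) {
            // Fewer partitions than nodes: some nodes hold nothing at all.
            return Double.POSITIVE_INFINITY;
        }
        return (double) max / min;
    }

    public static void main(String[] args) {
        int nodes = 8; // assumed cluster size
        for (int partitions : new int[] {10, 100, 1024}) {
            System.out.printf("%d partitions on %d nodes -> imbalance %.3f%n",
                partitions, nodes, idealImbalance(partitions, nodes));
        }
    }
}
```

With only slightly more partitions than nodes, one node can hold twice as
many partitions as another even in the ideal case; with many partitions
per node the ratio approaches 1.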

D.


Tuning number of partitions per cache

2016-03-21 Thread Denis Magda

Igniters,

Let's say I know the following parameters of my system and cluster:
- number of nodes and their CPUs;
- per node size and total size;
- number of caches;
- number of entries in the caches;
- network bandwidth.

And I want to tune the number of partitions per cache to get the best
possible performance from my cluster.


The first obvious thing we know is that the number of partitions mustn't 
be less than the number of nodes.


The next possible suggestion is that if the average partition size is
measured in tens/hundreds(?) of gigabytes or more, then we should set
more partitions to reduce this size.
I have the following case in mind for this suggestion. Let's say we have
partition "10" whose size is around 20 GB. If we increase the number of
partitions in such a way that this 20 GB is split among two or three
partitions located on different nodes, then rebalancing should happen
faster because the same amount of data will be preloaded from different
nodes rather than from a single one. Is my understanding correct? Am I
missing something?
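
The suggestion about average partition size can be turned into simple
arithmetic. The sketch below is only an illustration under assumed
inputs: a target average partition size chosen by the operator and a
hypothetical floor of "partitions per node" to keep the count well above
the number of nodes. Neither knob is an Ignite setting, and the names
are invented for this example.

```java
final class PartitionCount {
    /**
     * Smallest partition count that keeps the average partition at or
     * below targetPartitionGb, but never fewer than
     * nodes * minPartitionsPerNode.
     */
    static int partitionsFor(double totalGb,
                             double targetPartitionGb,
                             int nodes,
                             int minPartitionsPerNode) {
        int bySize = (int) Math.ceil(totalGb / targetPartitionGb);
        int byNodes = nodes * minPartitionsPerNode;
        return Math.max(bySize, byNodes);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 4 nodes, at least 32 partitions per node.
        System.out.println(partitionsFor(2000.0, 2.0, 4, 32)); // size-driven
        System.out.println(partitionsFor(100.0, 2.0, 4, 32));  // node-driven
    }
}
```

For a large cache the size target dominates; for a small cache the
per-node floor does.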


Does anyone else have other suggestions in mind, taking into account the
parameters from the list above?


--
Denis