This is a great blog that explains how data is distributed in an Ignite cluster:
https://www.gridgain.com/resources/blog/data-distribution-in-apache-ignite
> On 1 Mar 2023, at 18:40, John Smith wrote:
My key is phone_number and they are all unique... I'll check with the
command...
On Wed., Mar. 1, 2023, 11:20 a.m. Stephen Darlington, <
stephen.darling...@gridgain.com> wrote:
The streamer doesn’t determine where the data goes. It just efficiently sends
it to the correct place.
If your data is skewed in some way so that there is more data in some
partitions than others, then you could find one machine with more work to do
than others. All else being equal, you’ll al
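The point about skew can be illustrated with a small, self-contained simulation. This is only a sketch of the general idea (key -> partition -> owning node), not Ignite's actual affinity code: the hash function, the node names, and the rendezvous-style owner selection below are assumptions chosen for illustration, and 1024 is assumed as the partition count.

```python
import hashlib
from collections import Counter

PARTITIONS = 1024                        # assumed partition count for the cache
NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash of the joined parts (illustrative only)."""
    digest = hashlib.sha256("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(key: str) -> int:
    """Map a key to a partition by hashing it."""
    return stable_hash(key) % PARTITIONS


def owner_of(partition: int) -> str:
    """Pick the node with the highest hash weight for this partition
    (rendezvous-style selection, simplified)."""
    return max(NODES, key=lambda node: stable_hash(node, str(partition)))


def node_load(keys) -> Counter:
    """Count how many entries each node would end up owning."""
    return Counter(owner_of(partition_for(k)) for k in keys)


# Unique keys (e.g. phone numbers) spread across many partitions.
unique_load = node_load(f"555{n:07d}" for n in range(30_000))

# If every row shares one key value, all rows map to a single partition,
# and therefore to a single node.
skewed_load = node_load(["same-affinity-key"] * 30_000)

print("unique keys:", dict(unique_load))
print("skewed keys:", dict(skewed_load))
```

In this model the sender has no say in placement: it only computes where each key belongs. Uneven load therefore has to come from the keys (or from something other than placement, such as which node the client connects through).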
Ok thanks. I just thought the streamer would be more uniform.
On Wed, Mar 1, 2023 at 4:41 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:
You might want to check the data distribution. You can use control.sh --cache
distribution to do that.
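Once per-node entry counts are in hand, a quick way to judge skew is to compare each node with its fair share. The helper below is a hypothetical sketch: the node names and the counts are made-up numbers that add up to the 35 million rows mentioned in this thread, not real output from control.sh.

```python
def imbalance(entries_per_node):
    """Return each node's entry count as a multiple of the fair share.

    1.0 means the node holds exactly total / number_of_nodes entries.
    """
    total = sum(entries_per_node.values())
    fair_share = total / len(entries_per_node)
    return {node: count / fair_share for node, count in entries_per_node.items()}


# Hypothetical counts, transcribed by hand for illustration only.
example_counts = {
    "node-a": 20_000_000,
    "node-b": 7_500_000,
    "node-c": 7_500_000,
}

ratios = imbalance(example_counts)
for node, ratio in ratios.items():
    print(f"{node}: {ratio:.2f}x fair share")
```

Ratios close to 1.0 on every node would suggest the data itself is evenly placed and the CPU difference comes from somewhere else.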
> On 28 Feb 2023, at 20:32, John Smith wrote:
The last thing I can add to clarify is, the 3 node cluster is a centralized
cluster and the CSV loader is a thick client running on its own machine.
On Tue, Feb 28, 2023 at 2:52 PM John Smith wrote:
Btw when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM MY_TABLE
GROUP BY COLUMN_2; The query runs full tilt 100% on all 3 nodes and returns
in a respectable manner.
So not sure what's going on but with the data streamer I guess most of the
writes are pushed to THE ONE node mostly and th
Hi, so I'm using it in a pretty straightforward kind of way, at least I
think...
I'm loading 35 million lines from CSV to an SQL table. Decided to use
streamer as I figured it would still be a lot faster than batching SQL
INSERTS.
I tried with backup=0 and backup=1 (Prefer to have backup on)
1- Wit
Have you tried tracing the workload on the 100% and 40% nodes for
comparison? There just isn't enough detail in your question to help predict
what should be happening with the cluster workload. For a starting point,
please identify your design goals. It's easy to get confused by advice that
seeks t
Hi, I'm using the data streamer to insert into a 3-node cluster. I have
noticed that 1 node is pegging at 100% CPU while the others are at 40ish %.
Is that normal?