Re: Designing Affinity Key for more locality

2021-04-29 Thread William.L
I am using a user centric modeling approach where most of the computations
would be on a per-user basis (joins) before aggregation. The idea is to put
the data (across different tables/caches) for the same user in the same
partition/server. That's the reason why I chose user-id as the affinity key.

Using tenant/group as the affinity key does not scale well. Some
tenant/group datasets might be too large for one partition/server (we are
using persistence mode). Even if the data did fit, it would not benefit from
load balancing of the computation across the servers.
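The per-user colocation described above can be sketched with Ignite's `@AffinityKeyMapped` annotation; a minimal example (class and field names are illustrative, not from the original post):

```java
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

// Composite key for a per-user table. All entries sharing the same
// userId map to the same partition, so per-user joins and computations
// stay local to one server.
public class UserEventKey {
    private final long eventId;

    @AffinityKeyMapped
    private final long userId; // affinity key: colocates a user's rows

    public UserEventKey(long eventId, long userId) {
        this.eventId = eventId;
        this.userId = userId;
    }
}
```

Declaring the same field as the affinity key in every cache keeps one user's rows from all tables on the same node.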

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite 2.10. Performance tests in Azure

2021-04-29 Thread Stephen Darlington
Neither getAll nor putAll is really designed for many thousands of
reads/writes in a single operation. The whole operation (rather than
individual rows) is atomic.

For writes, you can use the Affinity API to work out which node each record
will be stored on and push updates in bulk to that specific node (as the blog
I posted previously suggests).
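A rough sketch of that write path, assuming a cache named "myCache" with Long keys (the names and types are illustrative):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class BatchedWriter {
    // Split a large batch by primary node, then write one chunk per node,
    // so each putAll call only touches a single server.
    static void putBatched(Ignite ignite, Map<Long, String> entries) {
        Affinity<Long> aff = ignite.affinity("myCache");
        Map<ClusterNode, Collection<Long>> byNode =
            aff.mapKeysToNodes(entries.keySet());

        IgniteCache<Long, String> cache = ignite.cache("myCache");
        for (Collection<Long> keys : byNode.values()) {
            Map<Long, String> chunk = new HashMap<>();
            for (Long k : keys)
                chunk.put(k, entries.get(k));
            cache.putAll(chunk); // all keys here share one primary node
        }
    }
}
```

For sustained bulk loading, IgniteDataStreamer performs this per-node batching automatically.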

For reads, I'd expect to see scan and SQL queries. Or, better, using
colocated compute to avoid copying the data over the network.
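For the read side, a colocated-compute sketch (the cache name and key are illustrative):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ColocatedRead {
    // Ship the computation to the node that owns the key instead of
    // pulling the entry across the network.
    static void processLocally(Ignite ignite, long userId) {
        ignite.compute().affinityRun("myCache", userId, () -> {
            IgniteCache<Long, String> cache =
                Ignition.localIgnite().cache("myCache");
            String value = cache.localPeek(userId); // local partition read
            // ... process value on the data node ...
        });
    }
}
```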

Ignite does scale horizontally but you have to use the right approach to get 
the best performance.

> On 29 Apr 2021, at 08:55, barttanghe  wrote:
> 
> Stephen,
> 
> I was under the impression that if the cache is atomic (which is one of the
> cases jjimeno tried), there are no transactions involved and that putAll
> in fact works on a row-by-row basis (some rows can fail).
> 
> So I don't understand what you mean by 'you're effectively creating a huge
> transaction'.
> Is this something internal to Ignite (i.e. not user-space)?
> Could you help me understand this?
> 
> Apart from that, what you explain is about putAll (the writing case), but
> getAll (the reading case) also seems to have reached its limits at only
> 16 nodes. Any ideas on that one?
> 
> Thanks!
> 
> Bart




Re: Ignite 2.10. Performance tests in Azure

2021-04-29 Thread barttanghe
Stephen,

I was under the impression that if the cache is atomic (which is one of the
cases jjimeno tried), there are no transactions involved and that putAll
in fact works on a row-by-row basis (some rows can fail).

So I don't understand what you mean by 'you're effectively creating a huge
transaction'.
Is this something internal to Ignite (i.e. not user-space)?
Could you help me understand this?

Apart from that, what you explain is about putAll (the writing case), but
getAll (the reading case) also seems to have reached its limits at only
16 nodes. Any ideas on that one?

Thanks!

Bart



Re: Understanding SQL join performance

2021-04-29 Thread Taras Ledkov

Hi,

Unfortunately, I don't fully understand the root of the problem.
The performance looks linearly dependent on the row count:
10k ~ 0.1s
65k ~ 0.65s
650k ~ 7s

> Is Ignite doing the join and filtering at each data node and then sending
> the 650K total rows to the reducer before aggregation?

Which aggregation do you mean?
Please provide the query plan and data schema for details.
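For reference, the two-phase plan can be pulled with an EXPLAIN query; a sketch using the table names from the post (the exact join condition is illustrative):

```java
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class PlanDump {
    // EXPLAIN returns one row per map-phase query plus the reduce query,
    // which shows where the join and aggregation actually run.
    static void printPlan(IgniteCache<?, ?> cache) {
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "EXPLAIN SELECT count(*) " +
            "FROM analysis_input i JOIN analysis_output o " +
            "ON i.user_id = o.user_id AND i.cohort_id = o.cohort_id");

        for (List<?> row : cache.query(qry).getAll())
            System.out.println(row.get(0));
    }
}
```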

On 24.04.2021 3:24, William.L wrote:

Hi,

I am trying to understand why my colocated join between two tables/caches
is taking so long compared to the individual table filters.

TABLE1

Returns 1 count -- 0.13s

TABLE2

Returns 65000 count -- 0.643s


JOIN TABLE1 and TABLE2

Returns 650K count -- 7s

Both analysis_input and analysis_output have an index on (cohort_id, user_id,
timestamp). The affinity key is user_id. How do I analyze the performance
further?

Here's the explain output, which does not tell me much:



Is Ignite doing the join and filtering at each data node and then sending
the 650K total rows to the reducer before aggregation? If so, is it possible
for Ignite to do some aggregation at the data node first and then send the
first-level aggregation results to the reducer?
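On that last question: when the GROUP BY column is the affinity key (user_id here), SqlFieldsQuery.setCollocated(true) tells the optimizer it may aggregate fully on each data node and send only small partial results to the reducer. A sketch (the query text is illustrative):

```java
import org.apache.ignite.cache.query.SqlFieldsQuery;

// With rows colocated by user_id, each data node can aggregate its own
// partitions completely; the reducer only merges per-node results.
SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT i.user_id, count(*) " +
    "FROM analysis_input i JOIN analysis_output o " +
    "  ON i.user_id = o.user_id " +
    "GROUP BY i.user_id")
    .setCollocated(true);
```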



--
Taras Ledkov
Mail-To: tled...@gridgain.com