Questions about H2 / Calcite limitations

2024-03-13 Thread BELMONTE Loic via user
Hello,

I encountered some errors while trying to build an SQL query and I would
appreciate your input.

Using the H2 engine, I tried to nest a subquery inside a subquery, and I get an
exception about a unique alias being null.
Something like:
select ... from ( select ... from ( select ... from table) )
Is this a limitation on the depth of subqueries, or a bug?
I'm doing this to be able to reuse formulas; I don't really need subqueries. Is
there a better way to do that?

Using the Calcite engine, this limitation does not seem to exist, but I got
another exception (without much detail beyond the fact that it failed) when
trying to do more than one join with subqueries:
with sq1 as ( select * from ( values ... ) )
with sq2 as ( select * from ( values ... ) )
select ... from table
left join sq1 on ...
left join sq2 on ...
Am I doing something wrong?
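For completeness, here is a minimal sketch of what I am trying to express, with
both CTEs declared in a single WITH clause (comma-separated) and submitted
through SqlFieldsQuery. The table, column and cache names are made up for the
example, and I have not verified this on the Calcite engine:

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Both CTEs in one WITH clause instead of two WITH keywords.
String sql =
    "WITH sq1 AS (SELECT * FROM (VALUES (1), (2)) AS v1(a)), " +
    "     sq2 AS (SELECT * FROM (VALUES (1), (3)) AS v2(b)) " +
    "SELECT t.id, sq1.a, sq2.b " +
    "FROM mytable t " +
    "LEFT JOIN sq1 ON t.id = sq1.a " +
    "LEFT JOIN sq2 ON t.id = sq2.b";

// 'ignite' and the cache name are placeholders for my actual setup.
ignite.cache("mycache").query(new SqlFieldsQuery(sql)).getAll();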

Also, is there a timeline for the official Calcite engine release (i.e., out of
beta)? I cannot find the info.

Thanks in advance,
Loïc


Re: Apache Ignite 3.0 Questions

2023-05-26 Thread Pavel Tupitsyn
1) There is no set date for now
2) Maksim is right, 2.x will live on for a long time
3) Yes, it is in the works
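For reference, in 2.x embedding a node into a Java application is a single
Ignition.start call; a minimal sketch with a default configuration and a
made-up cache name:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Start an embedded server node inside the application JVM (Ignite 2.x).
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    // The application can use caches, compute, SQL, etc. while the node runs.
    ignite.getOrCreateCache("my-cache").put(1, "hello");
}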

On Fri, May 26, 2023 at 3:01 PM Maksim Timonin 
wrote:

> Hi David,
>
>  2) What is the end of life matrix or end of support of 2.0?
>
>
> There is a stable community of great engineers that improve and support
> Ignite 2. Support of Ignite 2 will not be terminated in the foreseeable
> future.
>
> On Fri, May 26, 2023 at 1:04 PM David Bucek  wrote:
>
>> Hello,
>>
>> we are considering to use Apache Ignite for our software and we have
>> couple of questions:
>>
>> 1) When will Apache Ignite 3.0 be the stable version?
>> 2) What is the end of life matrix or end of support of 2.0?
>> 3) Will the 3.0 version still be embeddable to the Java application?
>> (very useful for us).
>>
>> Thank you for answers
>>
>>
>> David Bucek | Software Architect | MANTA
>>


Re: Apache Ignite 3.0 Questions

2023-05-26 Thread Maksim Timonin
Hi David,

 2) What is the end of life matrix or end of support of 2.0?


There is a stable community of great engineers that improve and support
Ignite 2. Support of Ignite 2 will not be terminated in the foreseeable
future.

On Fri, May 26, 2023 at 1:04 PM David Bucek  wrote:

> Hello,
>
> we are considering to use Apache Ignite for our software and we have
> couple of questions:
>
> 1) When will Apache Ignite 3.0 be the stable version?
> 2) What is the end of life matrix or end of support of 2.0?
> 3) Will the 3.0 version still be embeddable to the Java application? (very
> useful for us).
>
> Thank you for answers
>
>
> David Bucek | Software Architect | MANTA
>


Fwd: Apache Ignite 3.0 Questions

2023-05-26 Thread David Bucek
Hello,

we are considering using Apache Ignite for our software and we have a couple
of questions:

1) When will Apache Ignite 3.0 become a stable version?
2) What is the end-of-life matrix or end of support for 2.0?
3) Will the 3.0 version still be embeddable in a Java application? (This is
very useful for us.)

Thank you for your answers


David Bucek | Software Architect | MANTA



Re: Please help: several questions for Ignite

2021-12-01 Thread yonghua
That clarifies some questions for me too. Thanks.
 

De : "Pavel Tupitsyn"
A : "user"
Envoyé: mercredi 1 Décembre 2021 21:05
Objet : Re: Please help: several questions for Ignite
 

Hi Jon,

#1 Probably yes, K/V and SQL are among the most used features.
     Yes, Ignite can be used instead of Redis as a distributed cache in some
use cases. The API is different though.

#2 Those features are production-ready. Ignite is not based on Spark.

#3 Compute API has map/reduce functionality:
https://ignite.apache.org/docs/latest/distributed-computing/map-reduce
     Grouping and filtering can be achieved based on that.
     Alternatively, use the SQL engine which performs map/reduce under the hood.

#4 I'd say the choice between SQL and K/V is about two things - convenience and
performance.
     K/V API maps the data to your classes, and it is generally faster than SQL
for individual key operations (get, put, replace).
     On the other hand, SQL with proper indexes is faster for complex queries.

#5 Please check
https://ignite.apache.org/docs/latest/extensions-and-integrations/ignite-for-spark/ignite-dataframe

Pavel

On Wed, Dec 1, 2021 at 1:48 PM Jon Hua  wrote:



Hi community

Today I spent a whole day reading the docs:
https://ignite.apache.org/docs/latest/

This is a well-written documentation for Ignite, thanks for the work.
I have several questions that:

#1, Is the most used feature of Ignite the distributed K/V storage? Can I treat
it as the distributed Redis?
#2, It says it supports streaming, distributed computing, ML Lib. Are they
affected by Apache Spark? Are these three features production ready?
#3, I saw that distributed computing has very few API methods. Will you expand
them later? For example, map(), reduce(), group(), filter() etc.
#4, The document says SQL and K/V are essentially the same stuff. So when to
use SQL and when to use K/V interface?
#5. Will you support dataframe in future? Yes, both Spark and R have the
dataframe. The structure is quite easy to load outside data such as CSV, JSON
etc.

Thank you in advance for any help.

Regards
Jon Hua



Re: Please help: several questions for Ignite

2021-12-01 Thread Pavel Tupitsyn
Hi Jon,

#1 Probably yes, K/V and SQL are among the most used features.
 Yes, Ignite can be used instead of Redis as a distributed cache in
some use cases. The API is different though.

#2 Those features are production-ready. Ignite is not based on Spark.

#3 Compute API has map/reduce functionality:
https://ignite.apache.org/docs/latest/distributed-computing/map-reduce
 Grouping and filtering can be achieved based on that.
 Alternatively, use the SQL engine which performs map/reduce under the
hood.
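As an illustration, a minimal sketch where the map step is fanned out over the
cluster and the reduce is done on the caller (the word list and variable names
are made up):

import java.util.Arrays;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.lang.IgniteClosure;

// Map: each word is processed by some node in the cluster.
IgniteClosure<String, Integer> mapJob = word -> word.length();

// 'ignite' is assumed to be a started node or client instance.
Collection<Integer> mapped =
    ignite.compute().apply(mapJob, Arrays.asList("how", "many", "chars"));

// Reduce: combine the partial results locally on the caller.
int total = mapped.stream().mapToInt(Integer::intValue).sum();

There is also an apply() overload that takes an IgniteReducer if you prefer the
reduce step to run incrementally as results arrive.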

#4 I'd say the choice between SQL and K/V is about two things - convenience
and performance.
 K/V API maps the data to your classes, and it is generally faster than
SQL for individual key operations (get, put, replace).
 On the other hand, SQL with proper indexes is faster for complex
queries.
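To make the difference concrete, a minimal sketch of the two styles against the
same cache (the Person class, field names and cache name are hypothetical):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// K/V style: fast individual key operations, values map directly to your class.
IgniteCache<Integer, Person> cache = ignite.cache("person");
cache.put(1, new Person("Jon", 30));
Person p = cache.get(1);

// SQL style: better for complex queries, assuming Person is configured as a
// query entity with an index on 'age'.
List<List<?>> rows = cache.query(
    new SqlFieldsQuery("SELECT name FROM Person WHERE age > ?").setArgs(25)
).getAll();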

#5 Please check
https://ignite.apache.org/docs/latest/extensions-and-integrations/ignite-for-spark/ignite-dataframe

Pavel


On Wed, Dec 1, 2021 at 1:48 PM Jon Hua  wrote:

> Hi community
>
> Today I spent a whole day reading the docs:
> https://ignite.apache.org/docs/latest/
>
> This is a well-written documentation for Ignite, thanks for the work.
> I have several questions that:
>
> #1, Is the most used feature of Ignite the distributed K/V storage? Can I
> treat it as the distributed Redis?
> #2, It says it supports streaming, distributed computing, ML Lib. Are they
> affected by Apache Spark? Are these three features production ready?
> #3, I saw that distributed computing has very few API methods. Will you
> expand them later? For example, map(), reduce(), group(), filter() etc.
> #4, The document says SQL and K/V are essentially the same stuff. So when
> to use SQL and when to use K/V interface?
> #5. Will you support dataframe in future? Yes, both Spark and R have the
> dataframe. The structure is quite easy to load outside data such as CSV,
> JSON etc.
>
> Thank you in advance for any help.
>
> Regards
> Jon Hua
>


Please help: several questions for Ignite

2021-12-01 Thread Jon Hua
Hi community

Today I spent a whole day reading the docs:
https://ignite.apache.org/docs/latest/

This is a well-written documentation for Ignite, thanks for the work.
I have several questions:

#1, Is the most used feature of Ignite the distributed K/V storage? Can I
treat it as a distributed Redis?
#2, It says it supports streaming, distributed computing, and an ML library.
Are they based on Apache Spark? Are these three features production ready?
#3, I saw that distributed computing has very few API methods. Will you
expand them later? For example, map(), reduce(), group(), filter(), etc.
#4, The document says SQL and K/V are essentially the same stuff. So when
should I use SQL and when should I use the K/V interface?
#5. Will you support dataframes in the future? Both Spark and R have a
dataframe structure, which makes it quite easy to load external data such as
CSV, JSON, etc.

Thank you in advance for any help.

Regards
Jon Hua


Re: Questions about baseline auto-adjust

2021-09-27 Thread Ilya Korol

Hi,

It looks like baseline topology auto-adjustment is in progress. Did you check
the cluster state later?
Also, to track auto-adjustment events, please check your logs for messages
like:


Baseline auto-adjust will be executed right now ...
Baseline auto-adjust will be executed in ...
Baseline auto adjust data is expired (will not be scheduled) [data= ...
Baseline auto adjust data is targeted to obsolete version (will not be 
scheduled) ...

New baseline timeout object was ( successfully scheduled / rejected ) ...
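You can also check and change the auto-adjust parameters from code; a minimal
sketch (the 10 second soft timeout is only an example value):

import org.apache.ignite.Ignite;

// 'ignite' is a started node; enable auto-adjust and set the soft timeout.
ignite.cluster().baselineAutoAdjustEnabled(true);
ignite.cluster().baselineAutoAdjustTimeout(10_000); // 10 seconds

// Read the values back to verify the settings took effect.
boolean enabled = ignite.cluster().baselineAutoAdjustEnabled();
long timeout = ignite.cluster().baselineAutoAdjustTimeout();

The control.sh script also has a --baseline command if you prefer to inspect
and adjust this from the command line.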

On 2021/09/26 10:17:24, xin  wrote:
> Hi All,
>
> We are using Ignite 2.10.0 and we have a question about baseline auto-adjust.
>
> As the picture shows, the cluster state is active, baseline auto adjustment
> is enabled, softTimeout=1
>
> When a node leaves the cluster for 10 seconds, BaselineTopology didn't
> change; no node joins or leaves during this period.
> Why does the parameter not take effect?
>
> Thanks & Regards,
> Xin Chang
>
> Sent from Mail for Windows


Re[2]: Questions related to check pointing

2021-01-12 Thread Zhenya Stanilovsky
>>>>>After reviewing our logs I found this: (one example)
>>>>> 
>>>>>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint 
>>>>>started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, 
>>>>>startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], 
>>>>>checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, 
>>>>>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, 
>>>>>walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, 
>>>>>splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty 
>>>>>pages ']   
>>>>> 
>>>>>Which suggests we may have the issue where writes are frozen until the 
>>>>>check point is completed.
>>>>> 
>>>>>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears 
>>>>>to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>>>>> 
>>>>>/**
>>>>> * Threshold to calculate limit for pages list on-heap caches.
>>>>> * 
>>>>> * Note: When a checkpoint is triggered, we need some amount of page 
>>>>>memory to store pages list on-heap cache.
>>>>> * If a checkpoint is triggered by "too many dirty pages" reason and 
>>>>>pages list cache is rather big, we can get
>>>>>* {@code IgniteOutOfMemoryException}. To prevent this, we can limit the 
>>>>>total amount of cached page list buckets,
>>>>> * assuming that checkpoint will be triggered if no more then 3/4 of 
>>>>>pages will be marked as dirty (there will be
>>>>> * at least 1/4 of clean pages) and each cached page list bucket can 
>>>>>be stored to up to 2 pages (this value is not
>>>>> * static, but depends on PagesCache.MAX_SIZE, so if 
>>>>>PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>>> * more than 2 pages). Also some amount of page memory needed to store 
>>>>>page list metadata.
>>>>> */
>>>>> private   static   final   double   PAGE_LIST_CACHE_LIMIT_THRESHOLD  
>>>>>=  0.1 ;
>>>>> 
>>>>>This raises two questions: 
>>>>> 
>>>>>1. The data region where most writes are occurring has 4Gb allocated to 
>>>>>it, though it is permitted to start at a much lower level. 4Gb should be 
>>>>>1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>>> 
>>>>>The 'limit holder' is calculated like this:
>>>>> 
>>>>>/**
>>>>> *  @return  Holder for page list cache limit for given data region.
>>>>> */
>>>>> public   AtomicLong   pageListCacheLimitHolder ( DataRegion   
>>>>>dataRegion ) {
>>>>> if  ( dataRegion . config (). isPersistenceEnabled ()) {
>>>>> return   pageListCacheLimits . computeIfAbsent ( dataRegion . 
>>>>>config (). getName (), name  ->   new   AtomicLong (
>>>>>( long )(((PageMemoryEx) dataRegion . pageMemory ()). 
>>>>>totalPages () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>>>}  
>>>>> return   null ;
>>>>>}
>>>>> 
>>>>>... but I am unsure if totalPages() is referring to the current size of 
>>>>>the data region, or the size it is permitted to grow to. ie: Could the 
>>>>>'dirty page limit' be a sliding limit based on the growth of the data 
>>>>>region? Is it better to set the initial and maximum sizes of data regions 
>>>>>to be the same number?
>>>>> 
>>>>>2. We have two data regions, one supporting inbound arrival of data (with 
>>>>>low numbers of writes), and one supporting storage of processed results 
>>>>>from the arriving data (with many more writes). 
>>>>> 
>>>>>The block on writes due to the number of dirty pages appears to affect all 
>>>>>data regions, not just the one which has violated the dirty page limit. Is 
>>>>>that correct? If so, is this something that can be improved?
>>>>> 
>>>>>Thanks,
>>>>>Raymond.
>>>>>   
>>>>>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < 
>>>>>r

Re: Questions related to check pointing

2021-01-12 Thread Raymond Wilson
>  * 
>
>  * Note: When a checkpoint is triggered, we need some amount of page 
> memory to store pages list on-heap cache.
>
>  * If a checkpoint is triggered by "too many dirty pages" reason and 
> pages list cache is rather big, we can get
>
> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total 
> amount of cached page list buckets,
>
>  * assuming that checkpoint will be triggered if no more then 3/4 of 
> pages will be marked as dirty (there will be
>
>  * at least 1/4 of clean pages) and each cached page list bucket can be 
> stored to up to 2 pages (this value is not
>
>  * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE 
> > PagesListNodeIO#getCapacity it can take
>
>  * more than 2 pages). Also some amount of page memory needed to store 
> page list metadata.
>  */
> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
> This raises two questions:
>
> 1. The data region where most writes are occurring has 4Gb allocated to
> it, though it is permitted to start at a much lower level. 4Gb should be
> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>
> The 'limit holder' is calculated like this:
>
> /**
>  * @return Holder for page list cache limit for given data region.
>  */
> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
> if (dataRegion.config().isPersistenceEnabled()) {
> return pageListCacheLimits.computeIfAbsent(dataRegion.config
> ().getName(), name -> new AtomicLong(
> (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages
> () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
> }
>
> return null;
> }
>
> ... but I am unsure if totalPages() is referring to the current size of
> the data region, or the size it is permitted to grow to. ie: Could the
> 'dirty page limit' be a sliding limit based on the growth of the data
> region? Is it better to set the initial and maximum sizes of data regions
> to be the same number?
>
> 2. We have two data regions, one supporting inbound arrival of data (with
> low numbers of writes), and one supporting storage of processed results
> from the arriving data (with many more writes).
>
> The block on writes due to the number of dirty pages appears to affect all
> data regions, not just the one which has violated the dirty page limit. Is
> that correct? If so, is this something that can be improved?
>
> Thanks,
> Raymond.
>
>
> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson wrote:
>
> I'm working on getting automatic JVM thread stack dumping occurring if we
> detect long delays in put (PutIfAbsent) operations. Hopefully this will
> provide more information.
>
> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky wrote:
>
>
> Don`t think so, checkpointing work perfectly well already before this fix.
> Need additional info for start digging your problem, can you share ignite
> logs somewhere?
>
>
>
> I noticed an entry in the Ignite 2.9.1 changelog:
>
>- Improved checkpoint concurrent behaviour
>
> I am having trouble finding the relevant Jira ticket for this in the 2.9.1
> Jira area at
> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>
> Perhaps this change may improve the checkpointing issue we are seeing?
>
> Raymond.
>
>
> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson wrote:
>
> Hi Zhenya,
>
> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
> are not saturating the EFS interface. We use the default page size
> (experiments with larger page sizes showed instability when checkpointing
> due to free page starvation, so we reverted to the default size).
>
> 2. Thanks for the detail, we will look for that in thread dumps when we
> can create them.
>
> 3. We are using the default CP buffer size, which is max(256Mb,
> DataRagionSize / 4) according to the Ignite documentation, so this should
> have more than enough checkpoint buffer space to cope with writes. As
> additional information, the cache which is displaying very slow writes is
> in a data region with relatively slow write traffic. There is a primary
>

Re: Questions related to check pointing

2021-01-10 Thread Zhenya Stanilovsky


fsync=37104ms is too long for such a page count (pages=33421); please check how
you can improve fsync performance on your storage.
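For reference, these are the storage-level knobs I would look at first; a
minimal sketch with placeholder values, not a recommendation for your exact
workload:

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Spread checkpoint page writes over more threads (the default is 4).
storageCfg.setCheckpointThreads(8);

// Checkpoint more often so each checkpoint has fewer pages to fsync.
storageCfg.setCheckpointFrequency(60_000); // placeholder: 60 seconds

// Throttle writers gradually instead of freezing them when dirty pages grow.
storageCfg.setWriteThrottlingEnabled(true);

IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(storageCfg);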

 
>
>
>--- Forwarded message ---
>From: "Raymond Wilson" < raymond_wil...@trimble.com >
>To: user < user@ignite.apache.org >, "Zhenya Stanilovsky" < arzamas...@mail.ru 
>>
>Cc:
>Subject: Re: Re[4]: Questions related to check pointing
>Date: Thu, 31 Dec 2020 01:46:20 +0300
> 
>Hi Zhenya,
> 
>The matching checkpoint finished log is this:
> 
>2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer]  Checkpoint 
>finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, 
>markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], 
>walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms, 
>pagesWrite=1150ms, fsync=37104ms, total=38571ms]  
> 
>Regards your comment that 3/4 of pages in whole data region need to be dirty 
>to trigger this, can you confirm this is 3/4 of the maximum size of the data 
>region, or of the currently used size (eg: if Min is 1Gb, and Max is 4Gb, and 
>used is 2Gb, would 1.5Gb of dirty pages trigger this?)
> 
>Are data regions independently checkpointed, or are they checkpointed as a 
>whole, so that a 'too many dirty pages' condition affects all data regions in 
>terms of write blocking?
> 
>Can you comment on my query regarding should we set Min and Max size of the 
>data region to be the same? Ie: Don't bother with growing the data region 
>memory use on demand, just allocate the maximum?  
> 
>In terms of the checkpoint lock hold time metric, of the checkpoints quoting 
>'too many dirty pages' there is one instance apart from the one I have 
>provided earlier violating this limit, ie:
> 
>2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint 
>started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66, 
>startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573], 
>checkpointBeforeLockTime=276ms, checkpointLockWait=0ms, 
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms, 
>walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms, 
>splitAndSortCpPagesDuration=276ms, pages=4, reason=' too many dirty pages 
>']  
> 
>This is out of a population of 16 instances I can find. The remainder have 
>lock times of 16-17ms.
> 
>Regarding writes of pages to the persistent store, does the check pointing 
>system parallelise writes across partitions ro maximise throughput? 
> 
>Thanks,
>Raymond.
> 
>   
>On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky < arzamas...@mail.ru > 
>wrote:
>>
>>All write operations will be blocked for this timeout :  
>>checkpointLockHoldTime=32ms (Write Lock holding) If you observe huge amount 
>>of such messages :    reason=' too many dirty pages ' may be you need to 
>>store some data in not persisted regions for example or reduce indexes (if 
>>you use them). And please attach other part of cp message starting with : 
>>Checkpoint finished.
>>
>>
>> 
>>>In ( 
>>>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>> ), there is a mention of a dirty pages limit that is a factor that can 
>>>trigger check points.
>>> 
>>>I also found this issue:  
>>>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>> 
>>>After reviewing our logs I found this: (one example)
>>> 
>>>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint 
>>>started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, 
>>>startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], 
>>>checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, 
>>>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, 
>>>walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, 
>>>splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages 
>>>']   
>>> 
>>>Which suggests we may have the issue where writes are frozen until the check 
>>>point is completed.
>>> 
>>>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears 
>>>to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>>> 
>>>/**
>>> * Threshold to calculate limit for pages list on-heap caches.
>>> * 
>>> * Note: When a checkpoint is triggered, we need some amount of page 
>>>memory to store pages list on-heap cache.
>>

Re: Re[4]: Questions related to check pointing

2021-01-07 Thread Ilya Kasnacheev
Hello!

I think it's a sensible explanation.

Regards,
-- 
Ilya Kasnacheev


Wed, 6 Jan 2021 at 14:32, Raymond Wilson :

> I checked our code that creates the primary data region, and it does set
> the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
> that region.
>
> The secondary data region is much smaller, and is set to min/max = 128 Mb
> of memory.
>
> The checkpoints with the "too many dirty pages" reason were quoting less
> than 100,000 dirty pages, so this must have been triggered on the size of
> the smaller data region.
>
> Both these data regions have persistence, and I think this may have been a
> sub-optimal way to set it up. My aim was to provide a dedicated channel for
> inbound data arriving to be queued that was not impacted by updates due to
> processing of that data. I think it may be better to will change this
> arrangement to use a single data region to make the checkpointing process
> simpler and reduce cases where it decides there are too many dirty pages.
>
> On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> I guess it's pool.pages() * 3L / 4
>> Since, counter intuitively, the default ThrottlingPolicy is not
>> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>>
>> Regards,
>>
>> --
>> Ilya Kasnacheev
>>
>>
>> Thu, 31 Dec 2020 at 04:33, Raymond Wilson :
>>
>>> Regards this section of code:
>>>
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>> ? pool.pages() * 3L / 4
>>> : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> I think the correct ratio will be 2/3 of pages as we do not have a
>>> throttling policy defined, correct?.
>>>
>>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky 
>>> wrote:
>>>
>>>> Correct code is running from here:
>>>>
>>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || 
>>>> safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>> break;else {
>>>> CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too 
>>>> many dirty pages");
>>>>
>>>> and near you can see that :
>>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED? 
>>>> pool.pages() * 3L / 4: Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>>
>>>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>>>> cp.
>>>>
>>>>
>>>> In (
>>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>>> there is a mention of a dirty pages limit that is a factor that can trigger
>>>> check points.
>>>>
>>>> I also found this issue:
>>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>>
>>>> After reviewing our logs I found this: (one example)
>>>>
>>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer]
>>>> Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>>> pages']
>>>>
>>>> Which suggests we may have the issue where writes are frozen until the
>>>> check point is completed.
>>>>
>>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>>> appears to be 0.1 (10%), via this entry
>>>> in GridCacheDatabaseSharedManager.java:
>>>>
>>>> /**
>>>>  * Threshold to calculate limit for pages list on-heap caches.
>>>>  * 
>>>>
>>>>  * Note: When a checkpoint is triggered, we need some amount of page 
>>>> memory to store pages list on-heap cache.
>>>>
>>>>  * If a checkpoint is triggered by "too many dirty pages" reason and 
>>>> pages list cache is rather big, we can get
>>>>
>>>> * {@code IgniteOutOfMemoryException}. T

Re: Re[4]: Questions related to check pointing

2021-01-06 Thread Raymond Wilson
I checked our code that creates the primary data region, and it does set
the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
that region.

The secondary data region is much smaller, and is set to min/max = 128 Mb
of memory.

The checkpoints with the "too many dirty pages" reason were quoting less
than 100,000 dirty pages, so this must have been triggered on the size of
the smaller data region.

Both these data regions have persistence, and I think this may have been a
sub-optimal way to set it up. My aim was to provide a dedicated channel for
inbound data arriving to be queued that was not impacted by updates due to
processing of that data. I think it may be better to change this
arrangement to use a single data region to make the checkpointing process
simpler and reduce cases where it decides there are too many dirty pages.
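For reference, the shape of the configuration I mean is roughly this; a sketch
with our region names, where the sizes are illustrative rather than our exact
production values:

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Primary region: initial size == max size, so the dirty-page threshold is
// computed against a fixed number of pages.
DataRegionConfiguration primary = new DataRegionConfiguration()
    .setName("Primary")
    .setInitialSize(4L * 1024 * 1024 * 1024)
    .setMaxSize(4L * 1024 * 1024 * 1024)
    .setPersistenceEnabled(true);

// Much smaller secondary region for inbound data awaiting processing.
DataRegionConfiguration secondary = new DataRegionConfiguration()
    .setName("Secondary")
    .setInitialSize(128L * 1024 * 1024)
    .setMaxSize(128L * 1024 * 1024)
    .setPersistenceEnabled(true);

DataStorageConfiguration storageCfg = new DataStorageConfiguration()
    .setDefaultDataRegionConfiguration(primary)
    .setDataRegionConfigurations(secondary);

IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(storageCfg);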

On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I guess it's pool.pages() * 3L / 4
> Since, counter intuitively, the default ThrottlingPolicy is not
> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
>
> Thu, 31 Dec 2020 at 04:33, Raymond Wilson :
>
>> Regards this section of code:
>>
>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>> ? pool.pages() * 3L / 4
>> : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>
>> I think the correct ratio will be 2/3 of pages as we do not have a
>> throttling policy defined, correct?.
>>
>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky 
>> wrote:
>>
>>> Correct code is running from here:
>>>
>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || 
>>> safeToUpdatePageMemories() || checkpointer.runner() == null)
>>> break;else {
>>> CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many 
>>> dirty pages");
>>>
>>> and near you can see that :
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED? 
>>> pool.pages() * 3L / 4: Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>>> cp.
>>>
>>>
>>> In (
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>> there is a mention of a dirty pages limit that is a factor that can trigger
>>> check points.
>>>
>>> I also found this issue:
>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>
>>> After reviewing our logs I found this: (one example)
>>>
>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>>> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>> pages']
>>>
>>> Which suggests we may have the issue where writes are frozen until the
>>> check point is completed.
>>>
>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>> appears to be 0.1 (10%), via this entry
>>> in GridCacheDatabaseSharedManager.java:
>>>
>>> /**
>>>  * Threshold to calculate limit for pages list on-heap caches.
>>>  * 
>>>
>>>  * Note: When a checkpoint is triggered, we need some amount of page 
>>> memory to store pages list on-heap cache.
>>>
>>>  * If a checkpoint is triggered by "too many dirty pages" reason and 
>>> pages list cache is rather big, we can get
>>>
>>> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the 
>>> total amount of cached page list buckets,
>>>
>>>  * assuming that checkpoint will be triggered if no more then 3/4 of 
>>> pages will be marked as dirty (there will be
>>>
>>>  * at least 1/4 of clean pages) and each cached page list bucket can be 
>>> stored to up to 2 pages (this value is not
>>>
>>>  * static, but depends on PagesCache.MAX_SIZE, so if 
>>> PagesCache.MAX_SIZE > PagesListNodeIO#getC

Re: Re[4]: Questions related to check pointing

2021-01-04 Thread Ilya Kasnacheev
Hello!

I guess it's pool.pages() * 3L / 4
Since, counterintuitively, the default ThrottlingPolicy is not
ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
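A back-of-the-envelope check with the default 4 KB page size, ignoring the
pages Ignite reserves internally (so the real numbers will be slightly lower):

// 4 GB region with default 4 KB pages:
long pages = 4L * 1024 * 1024 * 1024 / 4096;   // = 1,048,576 pages

// Default policy (CHECKPOINT_BUFFER_ONLY, i.e. not DISABLED): 3/4 of the pool.
long maxDirtyDefault = pages * 3L / 4;          // = 786,432 pages

// If throttling really were DISABLED: at most 2/3 of the pool.
long maxDirtyDisabled = pages * 2L / 3;         // = 699,050 pages

// A 128 MB region has only 32,768 pages, so roughly 24,576 dirty pages are
// already enough to trigger the 'too many dirty pages' checkpoint there.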

Regards,

-- 
Ilya Kasnacheev


Thu, 31 Dec 2020 at 04:33, Raymond Wilson :

> Regards this section of code:
>
> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
> ? pool.pages() * 3L / 4
> : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>
> I think the correct ratio will be 2/3 of pages as we do not have a
> throttling policy defined, correct?.
>
> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky 
> wrote:
>
>> Correct code is running from here:
>>
>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || 
>> safeToUpdatePageMemories() || checkpointer.runner() == null)
>> break;else {
>> CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many 
>> dirty pages");
>>
>> and near you can see that :
>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED? pool.pages() 
>> * 3L / 4: Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>
>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>> cp.
>>
>>
>> In (
>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>> there is a mention of a dirty pages limit that is a factor that can trigger
>> check points.
>>
>> I also found this issue:
>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>
>> After reviewing our logs I found this: (one example)
>>
>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>> pages']
>>
>> Which suggests we may have the issue where writes are frozen until the
>> check point is completed.
>>
>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>> appears to be 0.1 (10%), via this entry
>> in GridCacheDatabaseSharedManager.java:
>>
>> /**
>>  * Threshold to calculate limit for pages list on-heap caches.
>>  * 
>>
>>  * Note: When a checkpoint is triggered, we need some amount of page 
>> memory to store pages list on-heap cache.
>>
>>  * If a checkpoint is triggered by "too many dirty pages" reason and 
>> pages list cache is rather big, we can get
>>
>> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the 
>> total amount of cached page list buckets,
>>
>>  * assuming that checkpoint will be triggered if no more then 3/4 of 
>> pages will be marked as dirty (there will be
>>
>>  * at least 1/4 of clean pages) and each cached page list bucket can be 
>> stored to up to 2 pages (this value is not
>>
>>  * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE 
>> > PagesListNodeIO#getCapacity it can take
>>
>>  * more than 2 pages). Also some amount of page memory needed to store 
>> page list metadata.
>>  */
>> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>
>> This raises two questions:
>>
>> 1. The data region where most writes are occurring has 4Gb allocated to
>> it, though it is permitted to start at a much lower level. 4Gb should be
>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>
>> The 'limit holder' is calculated like this:
>>
>> /**
>>  * @return Holder for page list cache limit for given data region.
>>  */
>> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>> if (dataRegion.config().isPersistenceEnabled()) {
>> return pageListCacheLimits.computeIfAbsent(dataRegion.config
>> ().getName(), name -> new AtomicLong(
>> (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages
>> () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>> }
>>
>> return null;
>> }
>>
>> ... but I am unsure if totalPages() is referring to the current size of
>

Re: Re[4]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
Regards this section of code:

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
? pool.pages() * 3L / 4
: Math.min(pool.pages() * 2L / 3, cpPoolPages);

I think the correct ratio will be 2/3 of pages as we do not have a
throttling policy defined, correct?

On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky 
wrote:

> Correct code is running from here:
>
> if (checkpointReadWriteLock.getReadHoldCount() > 1 || 
> safeToUpdatePageMemories() || checkpointer.runner() == null)
> break;else {
> CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many 
> dirty pages");
>
> and near you can see that :
> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED? pool.pages() 
> * 3L / 4: Math.min(pool.pages() * 2L / 3, cpPoolPages);
>
> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this cp.
>
>
> In (
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
> there is a mention of a dirty pages limit that is a factor that can trigger
> check points.
>
> I also found this issue:
> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
>
> After reviewing our logs I found this: (one example)
>
> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
> pages']
>
> Which suggests we may have the issue where writes are frozen until the
> check point is completed.
>
> Looking at the AI 2.8.1 source code, the dirty page limit fraction appears
> to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>
> /**
>  * Threshold to calculate limit for pages list on-heap caches.
>  * 
>
>  * Note: When a checkpoint is triggered, we need some amount of page 
> memory to store pages list on-heap cache.
>
>  * If a checkpoint is triggered by "too many dirty pages" reason and 
> pages list cache is rather big, we can get
>
> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total 
> amount of cached page list buckets,
>
>  * assuming that checkpoint will be triggered if no more then 3/4 of 
> pages will be marked as dirty (there will be
>
>  * at least 1/4 of clean pages) and each cached page list bucket can be 
> stored to up to 2 pages (this value is not
>
>  * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE 
> > PagesListNodeIO#getCapacity it can take
>
>  * more than 2 pages). Also some amount of page memory needed to store 
> page list metadata.
>  */
> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
> This raises two questions:
>
> 1. The data region where most writes are occurring has 4Gb allocated to
> it, though it is permitted to start at a much lower level. 4Gb should be
> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>
> The 'limit holder' is calculated like this:
>
> /**
>  * @return Holder for page list cache limit for given data region.
>  */
> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
> if (dataRegion.config().isPersistenceEnabled()) {
> return pageListCacheLimits.computeIfAbsent(dataRegion.config
> ().getName(), name -> new AtomicLong(
> (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages
> () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
> }
>
> return null;
> }
>
> ... but I am unsure if totalPages() is referring to the current size of
> the data region, or the size it is permitted to grow to. ie: Could the
> 'dirty page limit' be a sliding limit based on the growth of the data
> region? Is it better to set the initial and maximum sizes of data regions
> to be the same number?
>
> 2. We have two data regions, one supporting inbound arrival of data (with
> low numbers of writes), and one supporting storage of processed results
> from the arriving data (with many more writes).
>
> The block on writes due to the number of dirty pages appears to affect all
> data regions, not just the one which has violated the dirty page limit. Is
> that correct? If so, is this something that can be impr

Re: Re[4]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
Hi Zhenya,

The matching checkpoint finished log is this:

2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint
finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421,
markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms,
pagesWrite=1150ms, fsync=37104ms, total=38571ms]

Regards your comment that 3/4 of pages in whole data region need to be
dirty to trigger this, can you confirm this is 3/4 of the maximum size of
the data region, or of the currently used size (eg: if Min is 1Gb, and Max
is 4Gb, and used is 2Gb, would 1.5Gb of dirty pages trigger this?)

Are data regions independently checkpointed, or are they checkpointed as a
whole, so that a 'too many dirty pages' condition affects all data regions
in terms of write blocking?

Can you comment on my question about whether we should set the Min and Max size
of the data region to be the same? I.e., don't bother growing the data region
memory on demand, just allocate the maximum?

In terms of the checkpoint lock hold time metric, of the checkpoints
quoting 'too many dirty pages' there is one instance apart from the one I
have provided earlier violating this limit, ie:

2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint
started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66,
startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573],
checkpointBeforeLockTime=276ms, checkpointLockWait=0ms,
checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms,
walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms,
splitAndSortCpPagesDuration=276ms, pages=4, reason='too many dirty pages
']

This is out of a population of 16 instances I can find. The remainder have
lock times of 16-17ms.

Regarding writes of pages to the persistent store, does the checkpointing
system parallelise writes across partitions to maximise throughput?

Thanks,
Raymond.



On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky 
wrote:

>
> All write operations will be blocked for this timeout : 
> checkpointLockHoldTime=32ms
> (Write Lock holding) If you observe huge amount of such messages :
> reason='too many dirty pages' may be you need to store some data in not
> persisted regions for example or reduce indexes (if you use them). And
> please attach other part of cp message starting with : Checkpoint finished.
>
>
>
>
> In (
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
> there is a mention of a dirty pages limit that is a factor that can trigger
> check points.
>
> I also found this issue:
> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
>
> After reviewing our logs I found this: (one example)
>
> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
> pages']
>
> Which suggests we may have the issue where writes are frozen until the
> check point is completed.
>
> Looking at the AI 2.8.1 source code, the dirty page limit fraction appears
> to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>
> /**
>  * Threshold to calculate limit for pages list on-heap caches.
>  * 
>
>  * Note: When a checkpoint is triggered, we need some amount of page 
> memory to store pages list on-heap cache.
>
>  * If a checkpoint is triggered by "too many dirty pages" reason and 
> pages list cache is rather big, we can get
>
> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total 
> amount of cached page list buckets,
>
>  * assuming that checkpoint will be triggered if no more then 3/4 of 
> pages will be marked as dirty (there will be
>
>  * at least 1/4 of clean pages) and each cached page list bucket can be 
> stored to up to 2 pages (this value is not
>
>  * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE 
> > PagesListNodeIO#getCapacity it can take
>
>  * more than 2 pages). Also some amount of page memory needed to store 
> page list metadata.
>  */
> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
> This raises two questions:
>
> 1. The data region where most writes are occurring has 4Gb allocated to
> it, though it is permitted to sta

Re[4]: Questions related to check pointing

2020-12-30 Thread Zhenya Stanilovsky


All write operations will be blocked for this timeout: checkpointLockHoldTime=32ms
(write lock holding). If you observe a huge amount of such messages with
reason='too many dirty pages', maybe you need to store some data in non-persisted
regions, for example, or reduce indexes (if you use them). And please attach the
other part of the cp message, starting with: Checkpoint finished.


 
>In ( 
>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
> ), there is a mention of a dirty pages limit that is a factor that can 
>trigger check points.
> 
>I also found this issue:  
>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
> 
>After reviewing our logs I found this: (one example)
> 
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint 
>started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, 
>startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], 
>checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, 
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, 
>walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, 
>splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages 
>']   
> 
>Which suggests we may have the issue where writes are frozen until the check 
>point is completed.
> 
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to 
>be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
> 
>/**
> * Threshold to calculate limit for pages list on-heap caches.
> * 
> * Note: When a checkpoint is triggered, we need some amount of page 
>memory to store pages list on-heap cache.
> * If a checkpoint is triggered by "too many dirty pages" reason and pages 
>list cache is rather big, we can get
>* {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total 
>amount of cached page list buckets,
> * assuming that checkpoint will be triggered if no more then 3/4 of pages 
>will be marked as dirty (there will be
> * at least 1/4 of clean pages) and each cached page list bucket can be 
>stored to up to 2 pages (this value is not
> * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > 
>PagesListNodeIO#getCapacity it can take
> * more than 2 pages). Also some amount of page memory needed to store 
>page list metadata.
> */
> private   static   final   double   PAGE_LIST_CACHE_LIMIT_THRESHOLD  =  
>0.1 ;
> 
>This raises two questions: 
> 
>1. The data region where most writes are occurring has 4Gb allocated to it, 
>though it is permitted to start at a much lower level. 4Gb should be 1,000,000 
>pages, 10% of which should be 100,000 dirty pages.
> 
>The 'limit holder' is calculated like this:
> 
>/**
> *  @return  Holder for page list cache limit for given data region.
> */
> public   AtomicLong   pageListCacheLimitHolder ( DataRegion   dataRegion 
>) {
> if  ( dataRegion . config (). isPersistenceEnabled ()) {
> return   pageListCacheLimits . computeIfAbsent ( dataRegion . 
>config (). getName (), name  ->   new   AtomicLong (
>( long )(((PageMemoryEx) dataRegion . pageMemory ()). 
>totalPages () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>}  
> return   null ;
>}
> 
>... but I am unsure if totalPages() is referring to the current size of the 
>data region, or the size it is permitted to grow to. ie: Could the 'dirty page 
>limit' be a sliding limit based on the growth of the data region? Is it better 
>to set the initial and maximum sizes of data regions to be the same number?
> 
>2. We have two data regions, one supporting inbound arrival of data (with low 
>numbers of writes), and one supporting storage of processed results from the 
>arriving data (with many more writes). 
> 
>The block on writes due to the number of dirty pages appears to affect all 
>data regions, not just the one which has violated the dirty page limit. Is 
>that correct? If so, is this something that can be improved?
> 
>Thanks,
>Raymond.
>   
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wil...@trimble.com > 
>wrote:
>>I'm working on getting automatic JVM thread stack dumping occurring if we 
>>detect long delays in put (PutIfAbsent) operations. Hopefully this will 
>>provide more information.  
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>wrote:
>>>
>>>Don`t think so, checkpointing work perfectly well already before this fix.
>>>Need

Re[4]: Questions related to check pointing

2020-12-30 Thread Zhenya Stanilovsky

The relevant code runs from here:

if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories()
        || checkpointer.runner() == null)
    break;
else {
    CheckpointProgress pages = checkpointer.scheduleCheckpoint(0,
        "too many dirty pages");

and nearby you can see that:

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
    ? pool.pages() * 3L / 4
    : Math.min(pool.pages() * 2L / 3, cpPoolPages);

Thus, if 3/4 of the pages in the whole DataRegion are dirty, this checkpoint
will be raised.
 
>In ( 
>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
> ), there is a mention of a dirty pages limit that is a factor that can 
>trigger check points.
> 
>I also found this issue:  
>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
> 
>After reviewing our logs I found this: (one example)
> 
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint 
>started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, 
>startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], 
>checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, 
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, 
>walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, 
>splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages 
>']   
> 
>Which suggests we may have the issue where writes are frozen until the check 
>point is completed.
> 
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to 
>be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
> 
>/**
> * Threshold to calculate limit for pages list on-heap caches.
> * 
> * Note: When a checkpoint is triggered, we need some amount of page 
>memory to store pages list on-heap cache.
> * If a checkpoint is triggered by "too many dirty pages" reason and pages 
>list cache is rather big, we can get
>* {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total 
>amount of cached page list buckets,
> * assuming that checkpoint will be triggered if no more then 3/4 of pages 
>will be marked as dirty (there will be
> * at least 1/4 of clean pages) and each cached page list bucket can be 
>stored to up to 2 pages (this value is not
> * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > 
>PagesListNodeIO#getCapacity it can take
> * more than 2 pages). Also some amount of page memory needed to store 
>page list metadata.
> */
> private   static   final   double   PAGE_LIST_CACHE_LIMIT_THRESHOLD  =  
>0.1 ;
> 
>This raises two questions: 
> 
>1. The data region where most writes are occurring has 4Gb allocated to it, 
>though it is permitted to start at a much lower level. 4Gb should be 1,000,000 
>pages, 10% of which should be 100,000 dirty pages.
> 
>The 'limit holder' is calculated like this:
> 
>/**
> *  @return  Holder for page list cache limit for given data region.
> */
> public   AtomicLong   pageListCacheLimitHolder ( DataRegion   dataRegion 
>) {
> if  ( dataRegion . config (). isPersistenceEnabled ()) {
> return   pageListCacheLimits . computeIfAbsent ( dataRegion . 
>config (). getName (), name  ->   new   AtomicLong (
>( long )(((PageMemoryEx) dataRegion . pageMemory ()). 
>totalPages () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>}  
> return   null ;
>}
> 
>... but I am unsure if totalPages() is referring to the current size of the 
>data region, or the size it is permitted to grow to. ie: Could the 'dirty page 
>limit' be a sliding limit based on the growth of the data region? Is it better 
>to set the initial and maximum sizes of data regions to be the same number?
> 
>2. We have two data regions, one supporting inbound arrival of data (with low 
>numbers of writes), and one supporting storage of processed results from the 
>arriving data (with many more writes). 
> 
>The block on writes due to the number of dirty pages appears to affect all 
>data regions, not just the one which has violated the dirty page limit. Is 
>that correct? If so, is this something that can be improved?
> 
>Thanks,
>Raymond.
>   
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wil...@trimble.com > 
>wrote:
>>I'm working on getting automatic JVM thread stack dumping occurring if we 
>>detect long delays in put (PutIfAbsent) operations. Hopefully this will 
>>provide more information.  
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas...@mail.r

Re: Re[2]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
In (
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
there is a mention of a dirty pages limit that is a factor that can trigger
check points.

I also found this issue:
http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
where "too many dirty pages" is a reason given for initiating a checkpoint.

After reviewing our logs I found this: (one example)

2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty pages']


Which suggests we may have the issue where writes are frozen until the
check point is completed.

Looking at the AI 2.8.1 source code, the dirty page limit fraction appears
to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:

/**
 * Threshold to calculate limit for pages list on-heap caches.
 * 
 * Note: When a checkpoint is triggered, we need some amount of
page memory to store pages list on-heap cache.
 * If a checkpoint is triggered by "too many dirty pages" reason
and pages list cache is rather big, we can get
 * {@code IgniteOutOfMemoryException}. To prevent this, we can
limit the total amount of cached page list buckets,
 * assuming that checkpoint will be triggered if no more then 3/4
of pages will be marked as dirty (there will be
 * at least 1/4 of clean pages) and each cached page list bucket
can be stored to up to 2 pages (this value is not
 * static, but depends on PagesCache.MAX_SIZE, so if
PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
 * more than 2 pages). Also some amount of page memory needed to
store page list metadata.
 */
private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;

This raises two questions:

1. The data region where most writes are occurring has 4Gb allocated to it,
though it is permitted to start at a much lower level. 4Gb should be
1,000,000 pages, 10% of which should be 100,000 dirty pages.
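
(To be precise, assuming 4 GiB and the default 4 KiB page size: 4 GiB / 4 KiB =
1,048,576 pages, so 10% is roughly 104,858 pages; the round numbers above are
close enough for this discussion.)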

The 'limit holder' is calculated like this:

/**
 * @return Holder for page list cache limit for given data region.
 */
public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
if (dataRegion.config().isPersistenceEnabled()) {
return pageListCacheLimits.computeIfAbsent(dataRegion.config().
getName(), name -> new AtomicLong(
(long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages
() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
}

return null;
}

... but I am unsure if totalPages() is referring to the current size of the
data region, or the size it is permitted to grow to. ie: Could the 'dirty
page limit' be a sliding limit based on the growth of the data region? Is
it better to set the initial and maximum sizes of data regions to be the
same number?
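
For illustration, pinning a region to a fixed size so that totalPages() cannot
change over time would look something like the following Java sketch; the region
name and size here are placeholders, not our real configuration:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class FixedSizeRegionSketch {
        public static IgniteConfiguration config() {
            long fourGiB = 4L * 1024 * 1024 * 1024;

            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("processedResults")   // placeholder region name
                .setPersistenceEnabled(true)
                .setInitialSize(fourGiB)       // initial size ...
                .setMaxSize(fourGiB);          // ... equal to max, so the page count is fixed from the start

            return new IgniteConfiguration().setDataStorageConfiguration(
                new DataStorageConfiguration().setDataRegionConfigurations(region));
        }
    }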

2. We have two data regions, one supporting inbound arrival of data (with
low numbers of writes), and one supporting storage of processed results
from the arriving data (with many more writes).

The block on writes due to the number of dirty pages appears to affect all
data regions, not just the one which has violated the dirty page limit. Is
that correct? If so, is this something that can be improved?

Thanks,
Raymond.


On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson 
wrote:

> I'm working on getting automatic JVM thread stack dumping occurring if we
> detect long delays in put (PutIfAbsent) operations. Hopefully this will
> provide more information.
>
> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky 
> wrote:
>
>>
>> Don`t think so, checkpointing work perfectly well already before this fix.
>> Need additional info for start digging your problem, can you share ignite
>> logs somewhere?
>>
>>
>> I noticed an entry in the Ignite 2.9.1 changelog:
>>
>>- Improved checkpoint concurrent behaviour
>>
>> I am having trouble finding the relevant Jira ticket for this in the
>> 2.9.1 Jira area at
>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>
>> Perhaps this change may improve the checkpointing issue we are seeing?
>>
>> Raymond.
>>
>>
>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <
>> raymond_wil...@trimble.com
>> > wrote:
>>
>> Hi Zhenya,
>>
>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
>> provide sufficient IO. Our Ignite cluster currently to

Re: Re[2]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
I'm working on getting automatic JVM thread stack dumping occurring if we
detect long delays in put (PutIfAbsent) operations. Hopefully this will
provide more information.
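
Something along these lines is what I have in mind; a rough Java sketch for
illustration only (the threshold and the logging are placeholders, and we would
do the equivalent from the C# side):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import org.apache.ignite.IgniteCache;

    public class SlowPutDiagnostics {
        private static final long THRESHOLD_MS = 5_000; // placeholder threshold

        public static <K, V> boolean timedPutIfAbsent(IgniteCache<K, V> cache, K key, V val) {
            long start = System.nanoTime();

            boolean res = cache.putIfAbsent(key, val);

            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            if (elapsedMs > THRESHOLD_MS) {
                System.err.println("putIfAbsent took " + elapsedMs + " ms, dumping all thread stacks:");

                // Dump every thread with monitors and synchronizers included.
                for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true))
                    System.err.print(info);
            }

            return res;
        }
    }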

On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky 
wrote:

>
> Don`t think so, checkpointing work perfectly well already before this fix.
> Need additional info for start digging your problem, can you share ignite
> logs somewhere?
>
>
> I noticed an entry in the Ignite 2.9.1 changelog:
>
>- Improved checkpoint concurrent behaviour
>
> I am having trouble finding the relevant Jira ticket for this in the 2.9.1
> Jira area at
> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>
> Perhaps this change may improve the checkpointing issue we are seeing?
>
> Raymond.
>
>
> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson  > wrote:
>
> Hi Zhenya,
>
> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
> are not saturating the EFS interface. We use the default page size
> (experiments with larger page sizes showed instability when checkpointing
> due to free page starvation, so we reverted to the default size).
>
> 2. Thanks for the detail, we will look for that in thread dumps when we
> can create them.
>
> 3. We are using the default CP buffer size, which is max(256Mb,
> DataRegionSize / 4) according to the Ignite documentation, so this should
> have more than enough checkpoint buffer space to cope with writes. As
> additional information, the cache which is displaying very slow writes is
> in a data region with relatively slow write traffic. There is a primary
> (default) data region with large write traffic, and the vast majority of
> pages being written in a checkpoint will be for that default data region.
>
> 4. Yes, this is very surprising. Anecdotally from our logs it appears
> write traffic into the low write traffic cache is blocked during
> checkpoints.
>
> Thanks,
> Raymond.
>
>
>
> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky  > wrote:
>
>
>1. Additionally to Ilya reply you can check vendors page for
>additional info, all in this page are applicable for ignite too [1].
>Increasing threads number leads to concurrent io usage, thus if your have
>something like nvme — it`s up to you but in case of sas possibly better
>would be to reduce this param.
>2. Log will shows you something like :
>
>Parking thread=%Thread name% for timeout(ms)= %time%
>
>and appropriate :
>
>Unparking thread=
>
>3. No additional looging with cp buffer usage are provided. cp buffer
>need to be more than 10% of overall persistent  DataRegions size.
>4. 90 seconds or longer —  Seems like problems in io or system tuning,
>it`s very bad score i hope.
>
> [1]
> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>
>
>
>
>
> Hi,
>
> We have been investigating some issues which appear to be related to
> checkpointing. We currently use the IA 2.8.1 with the C# client.
>
> I have been trying to gain clarity on how certain aspects of the Ignite
> configuration relate to the checkpointing process:
>
> 1. Number of check pointing threads. This defaults to 4, but I don't
> understand how it applies to the checkpointing process. Are more threads
> generally better (eg: because it makes the disk IO parallel across the
> threads), or does it only have a positive effect if you have many data
> storage regions? Or something else? If this could be clarified in the
> documentation (or a pointer to it which Google has not yet found), that
> would be good.
>
> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
> that reducing this time would result in smaller less disruptive check
> points. Setting it to 60 seconds seems pretty safe, but is there a
> practical lower limit that should be used for use cases with new data
> constantly being added, eg: 5 seconds, 10 seconds?
>
> 3. Write exclusivity constraints during checkpointing. I understand that
> while a checkpoint is occurring ongoing writes will be supported into the
> caches being check pointed, and if those are writes to existing pages then
> those will be duplicated into the checkpoint buffer. If this buffer becomes
> full or stressed then Ignite will throttle, and perhaps block, writes until
> the checkpoint is complete. If this is the case then Ignite will emit
> logging (warning or informational?) that writes are being throttled.
>
> We have cases where simple puts to caches (a few requests per second) are
> taking up to 90 seconds to execute when there is an active check point
> occurring, where the check point has been triggered by the checkpoint
> timer. When a checkpoint is not occurring the time to do this is usually in
> the milliseconds. 

Re[2]: Questions related to check pointing

2020-12-29 Thread Zhenya Stanilovsky


Don`t think so, checkpointing work perfectly well already before this fix.
Need additional info for start digging your problem, can you share ignite logs 
somewhere?
 
>I noticed an entry in the Ignite 2.9.1 changelog:
>*  Improved checkpoint concurrent behaviour
>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 
>Jira area at  
>https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
> 
>Perhaps this change may improve the checkpointing issue we are seeing?
> 
>Raymond.
>   
>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < raymond_wil...@trimble.com > 
>wrote:
>>Hi Zhenya,
>> 
>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to 
>>provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage 
>>(with at least 5 nodes writing to it, including WAL and WAL archive), so we 
>>are not saturating the EFS interface. We use the default page size 
>>(experiments with larger page sizes showed instability when checkpointing due 
>>to free page starvation, so we reverted to the default size). 
>> 
>>2. Thanks for the detail, we will look for that in thread dumps when we can 
>>create them.
>> 
>>3. We are using the default CP buffer size, which is max(256Mb, 
>>DataRegionSize / 4) according to the Ignite documentation, so this should 
>>have more than enough checkpoint buffer space to cope with writes. As 
>>additional information, the cache which is displaying very slow writes is in 
>>a data region with relatively slow write traffic. There is a primary 
>>(default) data region with large write traffic, and the vast majority of 
>>pages being written in a checkpoint will be for that default data region.
>> 
>>4. Yes, this is very surprising. Anecdotally from our logs it appears write 
>>traffic into the low write traffic cache is blocked during checkpoints.
>> 
>>Thanks,
>>Raymond.
>>    
>>   
>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>wrote:
>>>*  
>>>Additionally to Ilya reply you can check vendors page for additional info, 
>>>all in this page are applicable for ignite too [1]. Increasing threads 
>>>number leads to concurrent io usage, thus if your have something like nvme — 
>>>it`s up to you but in case of sas possibly better would be to reduce this 
>>>param.
>>>*  Log will shows you something like :
>>>Parking thread=%Thread name% for timeout(ms)= %time% and appropriate :
>>>Unparking thread=
>>>*  No additional looging with cp buffer usage are provided. cp buffer need 
>>>to be more than 10% of overall persistent  DataRegions size.
>>>*  90 seconds or longer  —    Seems like problems in io or system tuning, 
>>>it`s very bad score i hope. 
>>>[1]  
>>>https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>
>>>
>>> 
Hi,
 
We have been investigating some issues which appear to be related to 
checkpointing. We currently use the IA 2.8.1 with the C# client.
 
I have been trying to gain clarity on how certain aspects of the Ignite 
configuration relate to the checkpointing process:
 
1. Number of check pointing threads. This defaults to 4, but I don't 
understand how it applies to the checkpointing process. Are more threads 
generally better (eg: because it makes the disk IO parallel across the 
threads), or does it only have a positive effect if you have many data 
storage regions? Or something else? If this could be clarified in the 
documentation (or a pointer to it which Google has not yet found), that 
would be good.
 
2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking 
that reducing this time would result in smaller less disruptive check 
points. Setting it to 60 seconds seems pretty safe, but is there a 
practical lower limit that should be used for use cases with new data 
constantly being added, eg: 5 seconds, 10 seconds?
 
3. Write exclusivity constraints during checkpointing. I understand that 
while a checkpoint is occurring ongoing writes will be supported into the 
caches being check pointed, and if those are writes to existing pages then 
those will be duplicated into the checkpoint buffer. If this buffer becomes 
full or stressed then Ignite will throttle, and perhaps block, writes until 
the checkpoint is complete. If this is the case then Ignite will emit 
logging (warning or informational?) that writes are being throttled.
 
We have cases where simple puts to caches (a few requests per second) are 
taking up to 90 seconds to execute when there is an active check point 
occurring, where the check point has been triggered by the checkpoint 
timer. When a checkpoint is not occurring the time to do this is usually in 
the milliseconds. The checkpoints themselves can take 90 seconds or longer, 
and are updating up to 

Re: Questions related to check pointing

2020-12-29 Thread Raymond Wilson
I noticed an entry in the Ignite 2.9.1 changelog:

   - Improved checkpoint concurrent behaviour

I am having trouble finding the relevant Jira ticket for this in the 2.9.1
Jira area at
https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved

Perhaps this change may improve the checkpointing issue we are seeing?

Raymond.


On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson 
wrote:

> Hi Zhenya,
>
> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
> are not saturating the EFS interface. We use the default page size
> (experiments with larger page sizes showed instability when checkpointing
> due to free page starvation, so we reverted to the default size).
>
> 2. Thanks for the detail, we will look for that in thread dumps when we
> can create them.
>
> 3. We are using the default CP buffer size, which is max(256Mb,
> DataRegionSize / 4) according to the Ignite documentation, so this should
> have more than enough checkpoint buffer space to cope with writes. As
> additional information, the cache which is displaying very slow writes is
> in a data region with relatively slow write traffic. There is a primary
> (default) data region with large write traffic, and the vast majority of
> pages being written in a checkpoint will be for that default data region.
>
> 4. Yes, this is very surprising. Anecdotally from our logs it appears
> write traffic into the low write traffic cache is blocked during
> checkpoints.
>
> Thanks,
> Raymond.
>
>
>
> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky 
> wrote:
>
>>
>>1. Additionally to Ilya reply you can check vendors page for
>>additional info, all in this page are applicable for ignite too [1].
>>Increasing threads number leads to concurrent io usage, thus if your have
>>something like nvme — it`s up to you but in case of sas possibly better
>>would be to reduce this param.
>>2. Log will shows you something like :
>>
>>Parking thread=%Thread name% for timeout(ms)= %time%
>>
>>and appropriate :
>>
>>Unparking thread=
>>
>>3. No additional looging with cp buffer usage are provided. cp buffer
>>need to be more than 10% of overall persistent  DataRegions size.
>>4. 90 seconds or longer —  Seems like problems in io or system
>>tuning, it`s very bad score i hope.
>>
>> [1]
>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>
>>
>>
>>
>>
>> Hi,
>>
>> We have been investigating some issues which appear to be related to
>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>
>> I have been trying to gain clarity on how certain aspects of the Ignite
>> configuration relate to the checkpointing process:
>>
>> 1. Number of check pointing threads. This defaults to 4, but I don't
>> understand how it applies to the checkpointing process. Are more threads
>> generally better (eg: because it makes the disk IO parallel across the
>> threads), or does it only have a positive effect if you have many data
>> storage regions? Or something else? If this could be clarified in the
>> documentation (or a pointer to it which Google has not yet found), that
>> would be good.
>>
>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
>> that reducing this time would result in smaller less disruptive check
>> points. Setting it to 60 seconds seems pretty safe, but is there a
>> practical lower limit that should be used for use cases with new data
>> constantly being added, eg: 5 seconds, 10 seconds?
>>
>> 3. Write exclusivity constraints during checkpointing. I understand that
>> while a checkpoint is occurring ongoing writes will be supported into the
>> caches being check pointed, and if those are writes to existing pages then
>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>> full or stressed then Ignite will throttle, and perhaps block, writes until
>> the checkpoint is complete. If this is the case then Ignite will emit
>> logging (warning or informational?) that writes are being throttled.
>>
>> We have cases where simple puts to caches (a few requests per second) are
>> taking up to 90 seconds to execute when there is an active check point
>> occurring, where the check point has been triggered by the checkpoint
>> timer. When a checkpoint is not occurring the time to do this is usually in
>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>> pages at the standard 4kb page size), and one small region with 128Mb.
>> There is no 'throttling' logging being emitted that we can tell, so the
>> 

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
Hi Zhenya,

1. We currently use AWS EFS for primary storage, with provisioned IOPS to
provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
(with at least 5 nodes writing to it, including WAL and WAL archive), so we
are not saturating the EFS interface. We use the default page size
(experiments with larger page sizes showed instability when checkpointing
due to free page starvation, so we reverted to the default size).

2. Thanks for the detail, we will look for that in thread dumps when we can
create them.

3. We are using the default CP buffer size, which is max(256Mb,
DataRegionSize / 4) according to the Ignite documentation, so this should
have more than enough checkpoint buffer space to cope with writes. As
additional information, the cache which is displaying very slow writes is
in a data region with relatively slow write traffic. There is a primary
(default) data region with large write traffic, and the vast majority of
pages being written in a checkpoint will be for that default data region.
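
For completeness, my understanding is that the buffer can also be stated
explicitly per region instead of relying on that default; a sketch only, with
placeholder region name and sizes:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointBufferSketch {
        public static DataStorageConfiguration storage() {
            return new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                    .setPersistenceEnabled(true)
                    .setMaxSize(4L * 1024 * 1024 * 1024)                   // 4 GiB region
                    .setCheckpointPageBufferSize(1L * 1024 * 1024 * 1024)) // 1 GiB CP buffer, explicit
                .setDataRegionConfigurations(new DataRegionConfiguration()
                    .setName("lowWriteRegion")                             // placeholder name
                    .setPersistenceEnabled(true)
                    .setMaxSize(128L * 1024 * 1024)
                    .setCheckpointPageBufferSize(256L * 1024 * 1024));
        }
    }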

4. Yes, this is very surprising. Anecdotally from our logs it appears write
traffic into the low write traffic cache is blocked during checkpoints.

Thanks,
Raymond.



On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky 
wrote:

>
>1. Additionally to Ilya reply you can check vendors page for
>additional info, all in this page are applicable for ignite too [1].
>Increasing threads number leads to concurrent io usage, thus if your have
>something like nvme — it`s up to you but in case of sas possibly better
>would be to reduce this param.
>2. Log will shows you something like :
>
>Parking thread=%Thread name% for timeout(ms)= %time%
>
>and appropriate :
>
>Unparking thread=
>
>3. No additional looging with cp buffer usage are provided. cp buffer
>need to be more than 10% of overall persistent  DataRegions size.
>4. 90 seconds or longer —  Seems like problems in io or system tuning,
>it`s very bad score i hope.
>
> [1]
> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>
>
>
>
>
> Hi,
>
> We have been investigating some issues which appear to be related to
> checkpointing. We currently use the IA 2.8.1 with the C# client.
>
> I have been trying to gain clarity on how certain aspects of the Ignite
> configuration relate to the checkpointing process:
>
> 1. Number of check pointing threads. This defaults to 4, but I don't
> understand how it applies to the checkpointing process. Are more threads
> generally better (eg: because it makes the disk IO parallel across the
> threads), or does it only have a positive effect if you have many data
> storage regions? Or something else? If this could be clarified in the
> documentation (or a pointer to it which Google has not yet found), that
> would be good.
>
> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
> that reducing this time would result in smaller less disruptive check
> points. Setting it to 60 seconds seems pretty safe, but is there a
> practical lower limit that should be used for use cases with new data
> constantly being added, eg: 5 seconds, 10 seconds?
>
> 3. Write exclusivity constraints during checkpointing. I understand that
> while a checkpoint is occurring ongoing writes will be supported into the
> caches being check pointed, and if those are writes to existing pages then
> those will be duplicated into the checkpoint buffer. If this buffer becomes
> full or stressed then Ignite will throttle, and perhaps block, writes until
> the checkpoint is complete. If this is the case then Ignite will emit
> logging (warning or informational?) that writes are being throttled.
>
> We have cases where simple puts to caches (a few requests per second) are
> taking up to 90 seconds to execute when there is an active check point
> occurring, where the check point has been triggered by the checkpoint
> timer. When a checkpoint is not occurring the time to do this is usually in
> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
> and are updating up to 30,000-40,000 pages, across a pair of data storage
> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
> pages at the standard 4kb page size), and one small region with 128Mb.
> There is no 'throttling' logging being emitted that we can tell, so the
> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
> for the second smaller region in this case) does not look like it can fill
> up during the checkpoint.
>
> It seems like the checkpoint is affecting the put operations, but I don't
> understand why that may be given the documented checkpointing process, and
> the checkpoint itself (at least via Informational logging) is not
> advertising any restrictions.
>
> Thanks,
> Raymond.
>
> --
> 
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
>
>
>
>
>
>
>


-- 

Re: Questions related to check pointing

2020-12-28 Thread Zhenya Stanilovsky

*  Additionally to Ilya's reply, you can check the vendor page for more information;
everything on that page applies to Ignite too [1]. Increasing the thread count leads
to concurrent IO usage, so with something like NVMe it is up to you, but with SAS it
may be better to reduce this parameter.
*  The log will show you something like:
Parking thread=%Thread name% for timeout(ms)= %time% and the corresponding:
Unparking thread=
*  No additional logging of checkpoint buffer usage is provided. The checkpoint buffer
needs to be more than 10% of the overall persistent DataRegions size.
*  90 seconds or longer: this looks like an IO or system tuning problem; it is a very
bad score, I'd say.
[1] 
https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning


 
>Hi,
> 
>We have been investigating some issues which appear to be related to 
>checkpointing. We currently use the IA 2.8.1 with the C# client.
> 
>I have been trying to gain clarity on how certain aspects of the Ignite 
>configuration relate to the checkpointing process:
> 
>1. Number of check pointing threads. This defaults to 4, but I don't 
>understand how it applies to the checkpointing process. Are more threads 
>generally better (eg: because it makes the disk IO parallel across the 
>threads), or does it only have a positive effect if you have many data storage 
>regions? Or something else? If this could be clarified in the documentation 
>(or a pointer to it which Google has not yet found), that would be good.
> 
>2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking that 
>reducing this time would result in smaller less disruptive check points. 
>Setting it to 60 seconds seems pretty safe, but is there a practical lower 
>limit that should be used for use cases with new data constantly being added, 
>eg: 5 seconds, 10 seconds?
> 
>3. Write exclusivity constraints during checkpointing. I understand that while 
>a checkpoint is occurring ongoing writes will be supported into the caches 
>being check pointed, and if those are writes to existing pages then those will 
>be duplicated into the checkpoint buffer. If this buffer becomes full or 
>stressed then Ignite will throttle, and perhaps block, writes until the 
>checkpoint is complete. If this is the case then Ignite will emit logging 
>(warning or informational?) that writes are being throttled.
> 
>We have cases where simple puts to caches (a few requests per second) are 
>taking up to 90 seconds to execute when there is an active check point 
>occurring, where the check point has been triggered by the checkpoint timer. 
>When a checkpoint is not occurring the time to do this is usually in the 
>milliseconds. The checkpoints themselves can take 90 seconds or longer, and 
>are updating up to 30,000-40,000 pages, across a pair of data storage regions, 
>one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the 
>standard 4kb page size), and one small region with 128Mb. There is no 
>'throttling' logging being emitted that we can tell, so the checkpoint buffer 
>(which should be 1Gb for the first data region and 256 Mb for the second 
>smaller region in this case) does not look like it can fill up during the 
>checkpoint.
> 
>It seems like the checkpoint is affecting the put operations, but I don't 
>understand why that may be given the documented checkpointing process, and the 
>checkpoint itself (at least via Informational logging) is not advertising any 
>restrictions.
> 
>Thanks,
>Raymond.
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>  
 
 
 
 

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
As another detail, we have the WriteThrottlingEnabled property left at its
default value of 'false', so I would not ordinarily expect throttling,
correct?

On Tue, Dec 29, 2020 at 10:04 AM Raymond Wilson 
wrote:

> Hi Ilya,
>
> Regarding the throttling question, I have not yet looked at thread dumps -
> the observed behaviour has been seen in production metrics and logging.
> What would you expect a thread dump to show in this case?
>
> Given my description of the sizes of the data regions and the numbers of
> pages being updated in a checkpoint would you expect any
> throttling behaviour?
>
> Thanks,
> Raymond.
>
> On Mon, Dec 28, 2020 at 11:53 PM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> 1. If we knew the specific circumstances in which a specific setting
>> value will yield the most benefit, we would've already set it to that
>> value. A setting means that you may tune it and get better results, or not.
>> But in general we can't promise you anything. I did see improvements from
>> increasing this setting in a very specific setup, but in general you may
>> leave it as is.
>>
>> 2. More frequent checkpoints mean increased write amplification. So
>> reducing this value may overwhelm your system with load that it was able to
>> handle previously. You can set this setting to arbitrary small value,
>> meaning that checkpoints will be purely sequential without any pauses
>> between them.
>>
>> 3. I don't think that default throttling mechanism will emit any
>> warnings. What do you see in thread dumps?
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> ср, 23 дек. 2020 г. в 12:48, Raymond Wilson :
>>
>>> Hi,
>>>
>>> We have been investigating some issues which appear to be related to
>>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>>
>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>> configuration relate to the checkpointing process:
>>>
>>> 1. Number of check pointing threads. This defaults to 4, but I don't
>>> understand how it applies to the checkpointing process. Are more threads
>>> generally better (eg: because it makes the disk IO parallel across the
>>> threads), or does it only have a positive effect if you have many data
>>> storage regions? Or something else? If this could be clarified in the
>>> documentation (or a pointer to it which Google has not yet found), that
>>> would be good.
>>>
>>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was
>>> thinking that reducing this time would result in smaller
>>> less disruptive check points. Setting it to 60 seconds seems pretty
>>> safe, but is there a practical lower limit that should be used for use
>>> cases with new data constantly being added, eg: 5 seconds, 10 seconds?
>>>
>>> 3. Write exclusivity constraints during checkpointing. I understand that
>>> while a checkpoint is occurring ongoing writes will be supported into the
>>> caches being check pointed, and if those are writes to existing pages then
>>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>>> full or stressed then Ignite will throttle, and perhaps block, writes until
>>> the checkpoint is complete. If this is the case then Ignite will emit
>>> logging (warning or informational?) that writes are being throttled.
>>>
>>> We have cases where simple puts to caches (a few requests per second)
>>> are taking up to 90 seconds to execute when there is an active check point
>>> occurring, where the check point has been triggered by the checkpoint
>>> timer. When a checkpoint is not occurring the time to do this is usually in
>>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>>> pages at the standard 4kb page size), and one small region with 128Mb.
>>> There is no 'throttling' logging being emitted that we can tell, so the
>>> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
>>> for the second smaller region in this case) does not look like it can fill
>>> up during the checkpoint.
>>>
>>> It seems like the checkpoint is affecting the put operations, but I
>>> don't understand why that may be given the documented checkpointing
>>> process, and the checkpoint itself (at least via Informational logging) is
>>> not advertising any restrictions.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>> --
>>> 
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>
>>>
>
> --
> 
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> +64-21-2013317 Mobile
> raymond_wil...@trimble.com
>
>
> 
>


-- 

Raymond Wilson

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
Hi Ilya,

Regarding the throttling question, I have not yet looked at thread dumps -
the observed behaviour has been seen in production metrics and logging.
What would you expect a thread dump to show in this case?

Given my description of the sizes of the data regions and the numbers of
pages being updated in a checkpoint would you expect any
throttling behaviour?

Thanks,
Raymond.

On Mon, Dec 28, 2020 at 11:53 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1. If we knew the specific circumstances in which a specific setting value
> will yield the most benefit, we would've already set it to that value. A
> setting means that you may tune it and get better results, or not. But in
> general we can't promise you anything. I did see improvements from
> increasing this setting in a very specific setup, but in general you may
> leave it as is.
>
> 2. More frequent checkpoints mean increased write amplification. So
> reducing this value may overwhelm your system with load that it was able to
> handle previously. You can set this setting to arbitrary small value,
> meaning that checkpoints will be purely sequential without any pauses
> between them.
>
> 3. I don't think that default throttling mechanism will emit any warnings.
> What do you see in thread dumps?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 23 дек. 2020 г. в 12:48, Raymond Wilson :
>
>> Hi,
>>
>> We have been investigating some issues which appear to be related to
>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>
>> I have been trying to gain clarity on how certain aspects of the Ignite
>> configuration relate to the checkpointing process:
>>
>> 1. Number of check pointing threads. This defaults to 4, but I don't
>> understand how it applies to the checkpointing process. Are more threads
>> generally better (eg: because it makes the disk IO parallel across the
>> threads), or does it only have a positive effect if you have many data
>> storage regions? Or something else? If this could be clarified in the
>> documentation (or a pointer to it which Google has not yet found), that
>> would be good.
>>
>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
>> that reducing this time would result in smaller less disruptive check
>> points. Setting it to 60 seconds seems pretty safe, but is there a
>> practical lower limit that should be used for use cases with new data
>> constantly being added, eg: 5 seconds, 10 seconds?
>>
>> 3. Write exclusivity constraints during checkpointing. I understand that
>> while a checkpoint is occurring ongoing writes will be supported into the
>> caches being check pointed, and if those are writes to existing pages then
>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>> full or stressed then Ignite will throttle, and perhaps block, writes until
>> the checkpoint is complete. If this is the case then Ignite will emit
>> logging (warning or informational?) that writes are being throttled.
>>
>> We have cases where simple puts to caches (a few requests per second) are
>> taking up to 90 seconds to execute when there is an active check point
>> occurring, where the check point has been triggered by the checkpoint
>> timer. When a checkpoint is not occurring the time to do this is usually in
>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>> pages at the standard 4kb page size), and one small region with 128Mb.
>> There is no 'throttling' logging being emitted that we can tell, so the
>> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
>> for the second smaller region in this case) does not look like it can fill
>> up during the checkpoint.
>>
>> It seems like the checkpoint is affecting the put operations, but I don't
>> understand why that may be given the documented checkpointing process, and
>> the checkpoint itself (at least via Informational logging) is not
>> advertising any restrictions.
>>
>> Thanks,
>> Raymond.
>>
>> --
>> 
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>>
>>

-- 

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com




Re: Questions related to check pointing

2020-12-28 Thread Ilya Kasnacheev
Hello!

1. If we knew the specific circumstances in which a specific setting value
will yield the most benefit, we would've already set it to that value. A
setting means that you may tune it and get better results, or not. But in
general we can't promise you anything. I did see improvements from
increasing this setting in a very specific setup, but in general you may
leave it as is.

2. More frequent checkpoints mean increased write amplification. So
reducing this value may overwhelm your system with load that it was able to
handle previously. You can set this setting to arbitrary small value,
meaning that checkpoints will be purely sequential without any pauses
between them.

3. I don't think that default throttling mechanism will emit any warnings.
What do you see in thread dumps?

Regards,
-- 
Ilya Kasnacheev


ср, 23 дек. 2020 г. в 12:48, Raymond Wilson :

> Hi,
>
> We have been investigating some issues which appear to be related to
> checkpointing. We currently use the IA 2.8.1 with the C# client.
>
> I have been trying to gain clarity on how certain aspects of the Ignite
> configuration relate to the checkpointing process:
>
> 1. Number of check pointing threads. This defaults to 4, but I don't
> understand how it applies to the checkpointing process. Are more threads
> generally better (eg: because it makes the disk IO parallel across the
> threads), or does it only have a positive effect if you have many data
> storage regions? Or something else? If this could be clarified in the
> documentation (or a pointer to it which Google has not yet found), that
> would be good.
>
> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
> that reducing this time would result in smaller less disruptive check
> points. Setting it to 60 seconds seems pretty safe, but is there a
> practical lower limit that should be used for use cases with new data
> constantly being added, eg: 5 seconds, 10 seconds?
>
> 3. Write exclusivity constraints during checkpointing. I understand that
> while a checkpoint is occurring ongoing writes will be supported into the
> caches being check pointed, and if those are writes to existing pages then
> those will be duplicated into the checkpoint buffer. If this buffer becomes
> full or stressed then Ignite will throttle, and perhaps block, writes until
> the checkpoint is complete. If this is the case then Ignite will emit
> logging (warning or informational?) that writes are being throttled.
>
> We have cases where simple puts to caches (a few requests per second) are
> taking up to 90 seconds to execute when there is an active check point
> occurring, where the check point has been triggered by the checkpoint
> timer. When a checkpoint is not occurring the time to do this is usually in
> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
> and are updating up to 30,000-40,000 pages, across a pair of data storage
> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
> pages at the standard 4kb page size), and one small region with 128Mb.
> There is no 'throttling' logging being emitted that we can tell, so the
> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
> for the second smaller region in this case) does not look like it can fill
> up during the checkpoint.
>
> It seems like the checkpoint is affecting the put operations, but I don't
> understand why that may be given the documented checkpointing process, and
> the checkpoint itself (at least via Informational logging) is not
> advertising any restrictions.
>
> Thanks,
> Raymond.
>
> --
> 
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
>
>


Questions related to check pointing

2020-12-23 Thread Raymond Wilson
Hi,

We have been investigating some issues which appear to be related to
checkpointing. We currently use Apache Ignite 2.8.1 with the C# client.

I have been trying to gain clarity on how certain aspects of the Ignite
configuration relate to the checkpointing process:

1. Number of check pointing threads. This defaults to 4, but I don't
understand how it applies to the checkpointing process. Are more threads
generally better (eg: because it makes the disk IO parallel across the
threads), or does it only have a positive effect if you have many data
storage regions? Or something else? If this could be clarified in the
documentation (or a pointer to it which Google has not yet found), that
would be good.

2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
that reducing this time would result in smaller less disruptive check
points. Setting it to 60 seconds seems pretty safe, but is there a
practical lower limit that should be used for use cases with new data
constantly being added, eg: 5 seconds, 10 seconds?

3. Write exclusivity constraints during checkpointing. I understand that
while a checkpoint is occurring ongoing writes will be supported into the
caches being check pointed, and if those are writes to existing pages then
those will be duplicated into the checkpoint buffer. If this buffer becomes
full or stressed then Ignite will throttle, and perhaps block, writes until
the checkpoint is complete. If this is the case then Ignite will emit
logging (warning or informational?) that writes are being throttled.
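
For reference, here is how I understand these three knobs map onto the storage
configuration; a minimal Java sketch with purely illustrative values, not
recommendations (we set the equivalents from the C# side):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CheckpointTuningSketch {
        public static IgniteConfiguration apply(IgniteConfiguration cfg) {
            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setCheckpointThreads(4)          // 1. number of checkpointing threads (default 4)
                .setCheckpointFrequency(60_000)   // 2. checkpoint frequency in ms (default 180_000)
                .setWriteThrottlingEnabled(true); // 3. pace writes instead of abruptly blocking them

            return cfg.setDataStorageConfiguration(storage);
        }
    }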

We have cases where simple puts to caches (a few requests per second) are
taking up to 90 seconds to execute when there is an active check point
occurring, where the check point has been triggered by the checkpoint
timer. When a checkpoint is not occurring the time to do this is usually in
the milliseconds. The checkpoints themselves can take 90 seconds or longer,
and are updating up to 30,000-40,000 pages, across a pair of data storage
regions, one with 4Gb in-memory space allocated (which should be 1,000,000
pages at the standard 4kb page size), and one small region with 128Mb.
There is no 'throttling' logging being emitted that we can tell, so the
checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
for the second smaller region in this case) does not look like it can fill
up during the checkpoint.

It seems like the checkpoint is affecting the put operations, but I don't
understand why that may be given the documented checkpointing process, and
the checkpoint itself (at least via Informational logging) is not
advertising any restrictions.

Thanks,
Raymond.

-- 

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)


Re: Ignite ML random forest questions

2020-08-11 Thread akorensh
Hi,
  The model(s) learn a correlation between the label(s) and the features.
  In the Random Forest Classification example, the labeled feature represents
the class that a wine belongs to, based on a given set of features.
  see: 

  The labeled feature is defined here:
  Vectorizer vectorizer = new DummyVectorizer()
      .labeled(Vectorizer.LabelCoordinate.FIRST);

  ModelsComposition randomForestMdl = classifier.fit(ignite, dataCache, vectorizer);
   
   After the model has learned the associations between class and labels, it
is tested here:
  double groundTruth = val.get(0);

  double prediction = randomForestMdl.predict(inputs);

  totalAmount++;

  if (!Precision.equals(groundTruth, prediction, Precision.EPSILON))
      amountOfErrors++;


  If you put breakpoints on these lines, groundTruth will be one of the 3
available classes and the model prediction will try to match that
classification based on the available inputs.


see: https://apacheignite.readme.io/docs/random-forest
In that document you will find more references on working with random forest
models.

If you are new to ML, simple Linear Regression might be the most accessible
model to learn.
https://apacheignite.readme.io/docs/ols-multiple-linear-regression

  
Is there a way to parallelize the training across available cores while
still limiting
the operation to a single JVM process?

Apache Ignite machine learning was designed from the bottom up to train a
model quickly by spreading the load across all nodes of a cluster. 

see: https://apacheignite.readme.io/docs/ml-partition-based-dataset

If you want to limit training to a single JVM process then create a cluster
of one node.
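
A minimal Java sketch of such a one-node setup, with discovery pinned to
localhost (the address range is the usual example default, adjust as needed):

    import java.util.Collections;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class SingleNodeTraining {
        public static Ignite start() {
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));

            // All partitions of the upstream cache stay on this node, so the
            // partition-based training runs inside this single JVM.
            return Ignition.start(cfg);
        }
    }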



Take a look in the examples here on pointers with feature selection:

https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/selection
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/hyperparametertuning

Thanks, Alex



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite ML random forest questions

2020-08-08 Thread Thilo-Alexander Ginkel
Hello everyone,

I am currently experimenting with Ignite machine learning (random
forest regression / classifier) and have come up with a couple of
questions that I can't seem to answer using docs or sample code. I am
rather new to ML as well as Ignite, so I hope that answers aren't too
obvious. ;-)

Is my assumption correct that the label is the coordinate that is
supposed to be learned (possibly depending on all other features) and
later predicted by the model?

At the moment, I am training my model from a local cache
(CacheMode.LOCAL) that I populate through a CacheStoreAdapter from
ElasticSearch as I can fit all data into RAM of a single node.
Training seems to be single-threaded, though. Is there a way to
parallelize the training across available cores while still limiting
the operation to a single JVM process?

After training a model I'd like to figure out the importance of the
different features. Is there a way to obtain the feature importance
from the model?

Thanks,
Thilo


Re: Questions Regarding Critical Workers Health Check

2020-06-12 Thread akorensh
Hi,
   Question-1: Does the above log mean that the thread is really blocked? or is
it just busy doing something else?

   The thread in question might be busy doing something else and did not
update its heartbeat timestamp.
   See:
https://apacheignite.readme.io/docs/critical-failures-handling#critical-workers-health-check
   When that happens, a thread dump is generated and the pre-configured
failure handler is called.


   Question-2: How can I decide the suitable values of these timeouts for my
   case? The former values were working for me earlier but now I face this
   exception.

 Usually the default values are what works best, otherwise use
experimental means to determine optimal settings for your use case.

  Question-3: I can see that this failure type (WORKER_THREAD_BLOCKED) is
  actually ignored by default so why do we still see it as an ERROR in logs?
   When a thread is blocked, the event in question is: WORKER_THREAD_BLOCKED 


  The error message reflects the event type. It is up to the failure handler
to handle that
   event. See:
https://apacheignite.readme.io/docs/critical-failures-handling#failure-handling
(see implementation below)


   Question-4: As a remedy to this, I have thought of adding another timeout
to
  my configuration:
   
   I read that failureDetectionTimeout is ignored in case any other timeout
is
   set. Would that mean now my failure detection timeout would also become
  3? or would it mean that failureDetectionTimeout would still be the
  configured value and just that it's value will be ignored for
  systemWorkerBlockedTimeout (which would now be 3)?

 Setting systemWorkerBlockedTimeout affects the timeout related to that
property, and nothing else.
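
For illustration, the two properties are set independently on IgniteConfiguration;
a Java sketch with placeholder millisecond values, not recommendations:

    import org.apache.ignite.configuration.IgniteConfiguration;

    public class TimeoutSketch {
        public static IgniteConfiguration apply(IgniteConfiguration cfg) {
            return cfg
                .setFailureDetectionTimeout(10_000)     // node liveness / discovery timeout
                .setSystemWorkerBlockedTimeout(30_000); // only the "blocked system worker" check
        }
    }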



  Question-5: How to decide the value for systemWorkerBlockedTimeout, do we
  have some guidelines here?
   Leave default or use experimental means.


Question-6: As I can see in
https://issues.apache.org/jira/browse/IGNITE-10154, this
WORKER_THREAD_BLOCKED failure is ignored by default, but on setting some
positive value for systemWorkerBlockedTimeout, it would actually start
working. However, I'm not sure if I want that right now. How else can I
handle this scenario so that I don't get these unnecessary and very frequent
exceptions without enabling this failure?

This was an issue with an old version (2.7) and is now resolved.
Setting systemWorkerBlockedTimeout affects only that property.
The liveness check is enabled either way, and failure handling is the sole
domain of the configured Failure Handler.
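
As a sketch, the ignored set can also be stated explicitly on the handler; the
types below mirror the ignored set visible in your log, and removing
SYSTEM_WORKER_BLOCKED from it would make the handler act on that failure:

    import java.util.Arrays;
    import java.util.HashSet;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.failure.FailureType;
    import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

    public class FailureHandlerSketch {
        public static IgniteConfiguration apply(IgniteConfiguration cfg) {
            StopNodeOrHaltFailureHandler hnd = new StopNodeOrHaltFailureHandler(false, 0);

            // Keep (or shrink) the set of failure types the handler ignores.
            hnd.setIgnoredFailureTypes(new HashSet<>(Arrays.asList(
                FailureType.SYSTEM_WORKER_BLOCKED,
                FailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT)));

            return cfg.setFailureHandler(hnd);
        }
    }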

Here are some links to the implementation to make it clearer:
All relevant threads get put into a workers registry:
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/worker/WorkersRegistry.java

the registry get started by the kernel: 
https://github.com/apache/ignite/blob/c3a2deb8f464e4547f65164d2ad62b10854cb199/modules/core/src/main/java/org/apache/ignite/internal/IgnitionEx.java#L1805

the lambda uses the FailureProcessor to handle failures using the configured
FailureHandler (default StopNodeOrHaltFailureHandler):
https://github.com/apache/ignite/blob/c3a2deb8f464e4547f65164d2ad62b10854cb199/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java#L156


The workers registry continuously monitors all threads here:
Take a look at the error message and how the thread dump is generated.
https://github.com/apache/ignite/blob/c3a2deb8f464e4547f65164d2ad62b10854cb199/modules/core/src/main/java/org/apache/ignite/internal/worker/WorkersRegistry.java#L175

Thanks, Alex













--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Questions Regarding Critical Workers Health Check

2020-06-12 Thread zork
Hello Ignite Experts,

We have recently upgraded from version 2.6 to 2.8.0 and have started to face
some weird behavior since then. 

With the below configuration:
   



We are seeing the below log (with different thread names) multiple times
every second as soon as the ignite server is started:

06-11 08:40:50,309978 [61] ERROR G(Ignite) - Blocked system-critical thread
has been detected. This can lead to cluster-wide undefined behaviour
[workerName=grid-nio-worker-tcp-comm-3,
threadName=grid-nio-worker-tcp-comm-3-#27%IgniteCluster1%, blockedFor=1s]
06-11 08:40:50,310267 [61] WARN  G(Ignite) - Thread
[name="grid-nio-worker-tcp-comm-3-#27%IgniteCluster1%", id=40,
state=RUNNABLE, blockCnt=0, waitCnt=0]

06-11 08:40:50,310609 [61] WARN  (Ignite) - Possible failure suppressed
accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-3, igniteInstanceName=IgniteCluster1,
finished=false, heartbeatTs=1591864848869]]]
06-11 08:40:50,311975 [61] WARN  CacheDiagnosticManager(Ignite) - Page locks
dump:

Thread=[name=NgServiceProvider_EventsExecutor_0, id=77], state=WAITING
Locked pages = []
Locked pages log: name=NgServiceProvider_EventsExecutor_0
time=(1591864850311, 2020-06-11 14:10:50.311)


Thread=[name=exchange-worker-#43%IgniteCluster1%, id=63],
state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#43%IgniteCluster1%
time=(1591864850311, 2020-06-11 14:10:50.311)


Thread=[name=sys-#45%IgniteCluster1%, id=65], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#45%IgniteCluster1% time=(1591864850311,
2020-06-11 14:10:50.311)


Thread=[name=sys-#48%IgniteCluster1%, id=68], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#48%IgniteCluster1% time=(1591864850311,
2020-06-11 14:10:50.311)


Thread=[name=sys-#49%IgniteCluster1%, id=69], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#49%IgniteCluster1% time=(1591864850311,
2020-06-11 14:10:50.311)


Thread=[name=sys-#51%IgniteCluster1%, id=71], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#51%IgniteCluster1% time=(1591864850311,
2020-06-11 14:10:50.311)

Full Logs attached here: 
WebIgniteService_WEB.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t2754/WebIgniteService_WEB.log>
  

However, if I change my timeouts like this:




It still occurs but a lot less frequently (I observed it only after I have
added 1 server and 5 clients and the communication started between them).

I did some research and found that this is related to the Critical Workers
Health Check feature which I think is a great addition to ignite but I have
a few questions regarding the same.

Question-1: Does the above log mean that the thread is really blocked? or is
it just busy doing something else?

Question-2: How can I decide the suitable values of these timeouts for my
case? The former values were working for me earlier but now I face this
exception.

Question-3: I can see that this failure type (WORKER_THREAD_BLOCKED) is
actually ignored by default so why do we still see it as an ERROR in logs?

Question-4: As a remedy to this, I have thought of adding another timeout to
my configuration:

I read that failureDetectionTimeout is ignored in case any other timeout is
set. Would that mean now my failure detection timeout would also become
3? or would it mean that failureDetectionTimeout would still be the
configured value and just that it's value will be ignored for
systemWorkerBlockedTimeout (which would now be 3)? 

Question-5: How to decide the value for systemWorkerBlockedTimeout, do we
have some guidelines here?

Question-6: As I can see in
https://issues.apache.org/jira/browse/IGNITE-10154, this
WORKER_THREAD_BLOCKED failure is ignored by default, but on setting some
positive value for systemWorkerBlockedTimeout, it would actually start
working. However, I'm not sure if I want that right now. How else can I
handle this scenario so that I don't get these unnecessary and very frequent
exceptions without enabling this failure? 

Please correct me if I'm wrong anywhere.
Thanks in advance.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Questions on the mechanics of activated on-heap in ignite 2.x.x

2020-06-10 Thread Ilya Kasnacheev
Hello!

1. I'm not sure it will be immediately available on heap.
2. This sounds a more reasonable assumption.
3. I guess so, but the real issue here is that on-heap cache increases Full
GC times. At some point it will become infeasible.
4. Yes, it would counteract the change to off-heap model.
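
If you do enable it anyway, a bounded eviction policy at least keeps the heap
copy from mirroring the whole off-heap data set; a Java sketch (cache name and
size are placeholders):

    import org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class OnHeapSketch {
        public static CacheConfiguration<Integer, Object> cacheConfig() {
            CacheConfiguration<Integer, Object> cfg = new CacheConfiguration<>("myCache"); // placeholder name

            cfg.setOnheapCacheEnabled(true);
            cfg.setEvictionPolicyFactory(new LruEvictionPolicyFactory<>(100_000)); // cap on-heap entries

            return cfg;
        }
    }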

Regards,
-- 
Ilya Kasnacheev


сб, 6 июн. 2020 г. в 14:54, VincentCE :

> In our project we are currently using ignite 2.7.6 with native persistence
> disabled and java 11. At the moment we are not using the on-heap feature,
> i.e. all our data lives in off-heap. However in order to gain performance
> we
> are thinking about activating on-heap. While there are already quite many
> questions/answers on that topic in this forum we are still missing some
> points. I would like to use the following scenario for our questions: Say
> we
> have one ignite-server-instance living on a *kubernetes-pod of 32 GiB
> memory
> request/limit size* with the following "hypothetical" configuration:
>
> - JVM options exactly as described here
> https://apacheignite.readme.io/docs/jvm-and-system-tuning, i.e. in
>   particular *10 GB heap fixed*.
> - Off-heap is limited to 15 GiB by adjusting the default region with
> *initSize = maxSize = 15 GiB*.
>   No more data regions are defined.
>
> Before doing anything with our ignite-server-instance we *initially fill
> its
> off-heap with 10 GiB* of data and this will be the only data that it will
> receive.
>
> What happens when we set
>
> *org.apache.ignite.configuration.CacheConfiguration.setOnheapCacheEnabled(true)
> in each data configuration and for now use no eviction policies* in
> particular during loading these 10 GB of data?
>
> More precisely:
> 1. As it is emphasised several times in this forum the data will still be
> be
> loaded into off-heap. But will it immediately also be loaded into heap,
> i.e.
> during the loading procedure each data point gets replicated simultaneously
> to heap resulting in two copies of the same data one in off-heap and one on
> heap after the procedure is finished?
> 2. ... Or will a given data point only be replicated to heap when ever it
> is
> being used, i.e. during computations?
> 3. Lets furthermore assume that our overall configuration was stable before
> switching to on-heap. In order to guarantee that it will do so afterwards
> would we need to increase the heap size by roughly 10 GB to 20 GB and
> therefore also our pod size to roughly 42 GiB? That would imply that using
> on-heap always goes hand in hand with increasing memory resources.
> 4. Obviously in this example we did not define any eviction-policy to
> control the on-heap cache size. However this is indeed intended here
> because
> we would like each data point to be quickly available living also in heap.
> Is this a useful approach (i.e. replicating the whole off-heap also in
> heap)
> in order to reach the overall goal namely better performance? It feels like
> this approach would counteract the change to the off-heap model from ignite
> 2.x.x onwards in terms of GC impacts and so on. Is this correct?
>
> Please let me know if you need more detailed informations about the
> configurations/settings we use.
>
> Thanks in advance!
>
> Vincent
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Questions on the mechanics of activated on-heap in ignite 2.x.x

2020-06-06 Thread VincentCE
In our project we are currently using ignite 2.7.6 with native persistence
disabled and java 11. At the moment we are not using the on-heap feature,
i.e. all our data lives in off-heap. However in order to gain performance we
are thinking about activating on-heap. While there are already quite many
questions/answers on that topic in this forum we are still missing some
points. I would like to use the following scenario for our questions: Say we
have one ignite-server-instance living on a *kubernetes-pod of 32 GiB memory
request/limit size* with the following "hypothetical" configuration:

- JVM options exactly as described here
https://apacheignite.readme.io/docs/jvm-and-system-tuning, i.e. in 
  particular *10 GB heap fixed*.
- Off-heap is limited to 15 GiB by adjusting the default region with
*initSize = maxSize = 15 GiB*. 
  No more data regions are defined.

Before doing anything with our ignite-server-instance we *initially fill its
off-heap with 10 GiB* of data and this will be the only data that it will
receive. 

What happens when we set
*org.apache.ignite.configuration.CacheConfiguration.setOnheapCacheEnabled(true)
in each data configuration and for now use no eviction policies* in
particular during loading these 10 GB of data? 

More precisely: 
1. As it is emphasised several times in this forum the data will still be
loaded into off-heap. But will it immediately also be loaded into heap, i.e.
during the loading procedure each data point gets replicated simultaneously
to heap resulting in two copies of the same data one in off-heap and one on
heap after the procedure is finished? 
2. ... Or will a given data point only be replicated to heap whenever it is
being used, i.e. during computations?
3. Let's furthermore assume that our overall configuration was stable before
switching to on-heap. In order to guarantee that it will do so afterwards
would we need to increase the heap size by roughly 10 GB to 20 GB and
therefore also our pod size to roughly 42 GiB? That would imply that using
on-heap always goes hand in hand with increasing memory resources.
4. Obviously in this example we did not define any eviction-policy to
control the on-heap cache size. However this is indeed intended here because
we would like each data point to be quickly available living also in heap.
Is this a useful approach (i.e. replicating the whole off-heap also in heap)
in order to reach the overall goal namely better performance? It feels like
this approach would counteract the change to the off-heap model from ignite
2.x.x onwards in terms of GC impacts and so on. Is this correct?

Please let me know if you need more detailed information about the
configurations/settings we use.

Thanks in advance!

Vincent





Re: Help with possible advanced Ignite questions during a meetup

2020-05-27 Thread Denis Magda
Hi Gaurav,

That's right. You can find the reasoning here:
http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Stop-Maintenance-of-Ignite-WebConsole-td47451.html


-
Denis


On Tue, May 26, 2020 at 10:52 PM Gaurav Bajaj 
wrote:

> Dear Denis,
>
> Does that mean only "open source/community" version of webconsole will be
> discontinued?
>
>
> Thanks,
> Gaurav
>
> On Wed, May 27, 2020, 10:32 AM Denis Magda  wrote:
>
>> Hi Zaar,
>>
>> Congratulations and thanks for spreading the word about Ignite in
>> Australia! I truly liked the style and flow of your slides.
>>
>> As you reasonably highlighted, Ignite documentation and tooling need more
>> work and attention from the community. Luckily, we are determined to
>> rewrite the docs by the end of this year and doing substantial progress on
>> the tooling/monitoring direction (see IEP-35
>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392> 
>> and
>> IEP-48
>> <https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing>).
>> Ignite Web Console has been discontinued recently
>> <https://issues.apache.org/jira/browse/IGNITE-13038>but instead you can
>> use a myriad of other tools.
>>
>> I hope that's not the last Ignite talk by you. Let's stay in touch.
>>
>> -
>> Denis
>>
>>
>> On Tue, May 26, 2020 at 7:36 PM Zaar Hai  wrote:
>>
>>> Hi all,
>>>
>>> If anyone is interested, here is our approach to Ignite on Google
>>> Kubernetes Engine (GKE): https://github.com/doitintl/ignite-gke
>>>
>>> The slides from my talk are here:
>>> https://www.slideshare.net/ZaarHai/apache-ignite-a-doitall-keyvalue-db
>>> And the recording is here: https://www.youtube.com/watch?v=kYSF5-AlV_0
>>>
>>> Special thanks to Denis for jumping in and providing valuable comments!
>>>
>>> Best,
>>> Zaar
>>>
>>>
>>> On Sat, 16 May 2020 at 04:19, Kseniya Romanova <
>>> romanova.ks@gmail.com> wrote:
>>>
>>>> Hi Zaar! Sounds cool!
>>>> Please meet Denis (in CC). He is ready to help you with questions.
>>>>
>>>>
>>>> пт, 15 мая 2020 г. в 10:17, Zaar Hai :
>>>>
>>>>> Hi there Ignite gurus,
>>>>>
>>>>> I started to work with Ignite lately, mainly from DevOps perspective
>>>>> and this project looks so interesting that I decided to give a talk to
>>>>> share my findings so far.
>>>>>
>>>>> Since my knowledge on Ignite is still limited I thought to ask: maybe
>>>>> someone who's more experienced can join as well, in case the audience will
>>>>> have advanced questions that I'll not be able to answer.
>>>>>
>>>>> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's
>>>>> an online event:
>>>>> https://www.meetup.com/multi-cloud-australia/events/270625697
>>>>>
>>>>> Hope to see you there, and thanks!
>>>>> Zaar
>>>>>
>>>>
>>>
>>> --
>>> Zaar
>>>
>>


Re: Help with possible advanced Ignite questions during a meetup

2020-05-26 Thread Gaurav Bajaj
Dear Denis,

Does that mean only "open source/community" version of webconsole will be
discontinued?


Thanks,
Gaurav

On Wed, May 27, 2020, 10:32 AM Denis Magda  wrote:

> Hi Zaar,
>
> Congratulations and thanks for spreading the word about Ignite in
> Australia! I truly liked the style and flow of your slides.
>
> As you reasonably highlighted, Ignite documentation and tooling need more
> work and attention from the community. Luckily, we are determined to
> rewrite the docs by the end of this year and doing substantial progress on
> the tooling/monitoring direction (see IEP-35
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392> 
> and
> IEP-48
> <https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing>).
> Ignite Web Console has been discontinued recently
> <https://issues.apache.org/jira/browse/IGNITE-13038>but instead you can
> use a myriad of other tools.
>
> I hope that's not the last Ignite talk by you. Let's stay in touch.
>
> -
> Denis
>
>
> On Tue, May 26, 2020 at 7:36 PM Zaar Hai  wrote:
>
>> Hi all,
>>
>> If anyone is interested, here is our approach to Ignite on Google
>> Kubernetes Engine (GKE): https://github.com/doitintl/ignite-gke
>>
>> The slides from my talk are here:
>> https://www.slideshare.net/ZaarHai/apache-ignite-a-doitall-keyvalue-db
>> And the recording is here: https://www.youtube.com/watch?v=kYSF5-AlV_0
>>
>> Special thanks to Denis for jumping in and providing valuable comments!
>>
>> Best,
>> Zaar
>>
>>
>> On Sat, 16 May 2020 at 04:19, Kseniya Romanova 
>> wrote:
>>
>>> Hi Zaar! Sounds cool!
>>> Please meet Denis (in CC). He is ready to help you with questions.
>>>
>>>
>>> пт, 15 мая 2020 г. в 10:17, Zaar Hai :
>>>
>>>> Hi there Ignite gurus,
>>>>
>>>> I started to work with Ignite lately, mainly from DevOps perspective
>>>> and this project looks so interesting that I decided to give a talk to
>>>> share my findings so far.
>>>>
>>>> Since my knowledge on Ignite is still limited I thought to ask: maybe
>>>> someone who's more experienced can join as well, in case the audience will
>>>> have advanced questions that I'll not be able to answer.
>>>>
>>>> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
>>>> online event:
>>>> https://www.meetup.com/multi-cloud-australia/events/270625697
>>>>
>>>> Hope to see you there, and thanks!
>>>> Zaar
>>>>
>>>
>>
>> --
>> Zaar
>>
>


Re: Help with possible advanced Ignite questions during a meetup

2020-05-26 Thread Denis Magda
Hi Zaar,

Congratulations and thanks for spreading the word about Ignite in
Australia! I truly liked the style and flow of your slides.

As you reasonably highlighted, Ignite documentation and tooling need more
work and attention from the community. Luckily, we are determined to
rewrite the docs by the end of this year and are making substantial progress
on the tooling/monitoring front (see IEP-35
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392>
and IEP-48
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing>).
Ignite Web Console has recently been discontinued
<https://issues.apache.org/jira/browse/IGNITE-13038>, but you can use
a myriad of other tools instead.

I hope that's not the last Ignite talk by you. Let's stay in touch.

-
Denis


On Tue, May 26, 2020 at 7:36 PM Zaar Hai  wrote:

> Hi all,
>
> If anyone is interested, here is our approach to Ignite on Google
> Kubernetes Engine (GKE): https://github.com/doitintl/ignite-gke
>
> The slides from my talk are here:
> https://www.slideshare.net/ZaarHai/apache-ignite-a-doitall-keyvalue-db
> And the recording is here: https://www.youtube.com/watch?v=kYSF5-AlV_0
>
> Special thanks to Denis for jumping in and providing valuable comments!
>
> Best,
> Zaar
>
>
> On Sat, 16 May 2020 at 04:19, Kseniya Romanova 
> wrote:
>
>> Hi Zaar! Sounds cool!
>> Please meet Denis (in CC). He is ready to help you with questions.
>>
>>
>> пт, 15 мая 2020 г. в 10:17, Zaar Hai :
>>
>>> Hi there Ignite gurus,
>>>
>>> I started to work with Ignite lately, mainly from DevOps perspective and
>>> this project looks so interesting that I decided to give a talk to share my
>>> findings so far.
>>>
>>> Since my knowledge on Ignite is still limited I thought to ask: maybe
>>> someone who's more experienced can join as well, in case the audience will
>>> have advanced questions that I'll not be able to answer.
>>>
>>> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
>>> online event:
>>> https://www.meetup.com/multi-cloud-australia/events/270625697
>>>
>>> Hope to see you there, and thanks!
>>> Zaar
>>>
>>
>
> --
> Zaar
>


Re: Help with possible advanced Ignite questions during a meetup

2020-05-26 Thread Zaar Hai
Hi all,

If anyone is interested, here is our approach to Ignite on Google
Kubernetes Engine (GKE): https://github.com/doitintl/ignite-gke

The slides from my talk are here:
https://www.slideshare.net/ZaarHai/apache-ignite-a-doitall-keyvalue-db
And the recording is here: https://www.youtube.com/watch?v=kYSF5-AlV_0

Special thanks to Denis for jumping in and providing valuable comments!

Best,
Zaar


On Sat, 16 May 2020 at 04:19, Kseniya Romanova 
wrote:

> Hi Zaar! Sounds cool!
> Please meet Denis (in CC). He is ready to help you with questions.
>
>
> пт, 15 мая 2020 г. в 10:17, Zaar Hai :
>
>> Hi there Ignite gurus,
>>
>> I started to work with Ignite lately, mainly from DevOps perspective and
>> this project looks so interesting that I decided to give a talk to share my
>> findings so far.
>>
>> Since my knowledge on Ignite is still limited I thought to ask: maybe
>> someone who's more experienced can join as well, in case the audience will
>> have advanced questions that I'll not be able to answer.
>>
>> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
>> online event:
>> https://www.meetup.com/multi-cloud-australia/events/270625697
>>
>> Hope to see you there, and thanks!
>> Zaar
>>
>

-- 
Zaar


Re: Help with possible advanced Ignite questions during a meetup

2020-05-16 Thread Zaar Hai
Thanks Kseniya!

Denis, nice to e-meet you. Will you be able to join the meetup? Again, it's
May 26th at 18:00+1000. If you are in Europe, it should work.

Cheers,
Zaar

On Sat, 16 May 2020 at 04:19, Kseniya Romanova 
wrote:

> Hi Zaar! Sounds cool!
> Please meet Denis (in CC). He is ready to help you with questions.
>
>
> пт, 15 мая 2020 г. в 10:17, Zaar Hai :
>
>> Hi there Ignite gurus,
>>
>> I started to work with Ignite lately, mainly from DevOps perspective and
>> this project looks so interesting that I decided to give a talk to share my
>> findings so far.
>>
>> Since my knowledge on Ignite is still limited I thought to ask: maybe
>> someone who's more experienced can join as well, in case the audience will
>> have advanced questions that I'll not be able to answer.
>>
>> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
>> online event:
>> https://www.meetup.com/multi-cloud-australia/events/270625697
>>
>> Hope to see you there, and thanks!
>> Zaar
>>
>

-- 
Zaar


Re: Help with possible advanced Ignite questions during a meetup

2020-05-15 Thread Kseniya Romanova
Hi Zaar! Sounds cool!
Please meet Denis (in CC). He is ready to help you with questions.


пт, 15 мая 2020 г. в 10:17, Zaar Hai :

> Hi there Ignite gurus,
>
> I started to work with Ignite lately, mainly from DevOps perspective and
> this project looks so interesting that I decided to give a talk to share my
> findings so far.
>
> Since my knowledge on Ignite is still limited I thought to ask: maybe
> someone who's more experienced can join as well, in case the audience will
> have advanced questions that I'll not be able to answer.
>
> It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
> online event:
> https://www.meetup.com/multi-cloud-australia/events/270625697
>
> Hope to see you there, and thanks!
> Zaar
>


Help with possible advanced Ignite questions during a meetup

2020-05-15 Thread Zaar Hai
Hi there Ignite gurus,

I started to work with Ignite lately, mainly from DevOps perspective and
this project looks so interesting that I decided to give a talk to share my
findings so far.

Since my knowledge on Ignite is still limited I thought to ask: maybe
someone who's more experienced can join as well, in case the audience will
have advanced questions that I'll not be able to answer.

It's on May 26th, 18:00 UTC+1000 (I'm in Australia), but again, it's an
online event: https://www.meetup.com/multi-cloud-australia/events/270625697

Hope to see you there, and thanks!
Zaar


Re: Schema Questions

2020-05-12 Thread Evgenii Zhuravlev
There is no way to define a nested collection of addresses as SQL fields. The
problem is that there are no such types in JDBC, so it just won't work. So,
if you want to use SQL, just have separate tables for these objects.
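
As a rough illustration only (table and column names are made up), the
separate-tables variant, collocated by person id, could look like this:

IgniteCache<?, ?> cache = ignite.getOrCreateCache("ddl");

cache.query(new SqlFieldsQuery(
    "CREATE TABLE Person (id BIGINT PRIMARY KEY, name VARCHAR, age INT)")).getAll();

cache.query(new SqlFieldsQuery(
    "CREATE TABLE Address (id BIGINT, personId BIGINT, kind VARCHAR, city VARCHAR, " +
    "PRIMARY KEY (id, personId)) WITH \"affinity_key=personId\"")).getAll();

// Filtering persons by address then becomes an ordinary join:
cache.query(new SqlFieldsQuery(
    "SELECT p.name, a.city FROM Person p JOIN Address a ON a.personId = p.id " +
    "WHERE a.kind = 'primary'")).getAll();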




вт, 12 мая 2020 г. в 06:07, narges saleh :

> Thanks Evgenii.
> My next two questions are, assuming I go with option 1.1:
> 1) How do I define these nested addresses via query entities, assuming,
> I'd use binaryobjects when inserting. There can be multiple primary
> addresses and secondary addresses. E.g., {john,{primary-address:[addr1,
> addr2], secondary-addess:[addr3, addr4, addr5]}}
> 2) Can I use SQL if I am filtering by person and then I want certain
> information in the addresses?  say I want all the primary addresses for
> john., or I want the cities for the primary addresses for John.
>
> thanks.
>
> On Mon, May 11, 2020 at 4:56 PM Evgenii Zhuravlev <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> The main question here is how you want to use this data. Do you use SQL?
>>
>> 1) It depends on the use case. If you plan to access only a person object
>> without any filtering by addresses and you will always need the entire
>> object, it makes sense to have one big object. But in this case, you won't
>> be able to filter persons by addresses, since SQL doesn't work with
>> collections. So, if you want to use SQL, it definitely makes sense to use
>> the second approach.
>>
>> 2) Of course, if you already have unique ID for object, it makes sense to
>> use it as a key, there is no need to generate an additional field for this.
>>
>> Evgenii
>>
>> пн, 11 мая 2020 г. в 09:20, narges saleh :
>>
>>> Hi All,
>>>
>>> I would appreciate your feedback, for the following, in terms of
>>> performance for both inserts and queries.
>>>
>>> 1) Which one of these patterns is preferable for the table design?
>>> A- Have a fat table/cache with nested objects, e.g. person table with a
>>> hashmap of addresses.
>>> B- Have person and address tables separate and just link them via
>>> foreign keys.
>>>
>>> 2) Which one of these patterns is preferable for primary keys?
>>> A- Have a UUID + affinity key as the primary key
>>> B- Have the keys spelled out + affinity key. For example, assume person
>>> table, combination of age and name uniquely identifies a person, so the key
>>> will be person-name, person-age, and org-id.
>>> If I have a associative table joining persons and addresses (if address
>>> is a separate object), then in case B, I will have to include three fields
>>> from person and the id from the address table, as opposed to case A, where
>>> I will have UUID + orgid + address id. Would having one less field buy me
>>> much, as opposed to having the overhead of creating UUIDs?
>>>
>>> thanks
>>>
>>>


Re: Schema Questions

2020-05-12 Thread narges saleh
Thanks Evgenii.
My next two questions are, assuming I go with option 1.1:
1) How do I define these nested addresses via query entities, assuming I'd
use BinaryObjects when inserting? There can be multiple primary addresses
and secondary addresses. E.g., {john,{primary-address:[addr1, addr2],
secondary-address:[addr3, addr4, addr5]}}
2) Can I use SQL if I am filtering by person and then I want certain
information from the addresses? Say I want all the primary addresses for
John, or I want the cities of the primary addresses for John.

thanks.

On Mon, May 11, 2020 at 4:56 PM Evgenii Zhuravlev 
wrote:

> Hi,
>
> The main question here is how you want to use this data. Do you use SQL?
>
> 1) It depends on the use case. If you plan to access only a person object
> without any filtering by addresses and you will always need the entire
> object, it makes sense to have one big object. But in this case, you won't
> be able to filter persons by addresses, since SQL doesn't work with
> collections. So, if you want to use SQL, it definitely makes sense to use
> the second approach.
>
> 2) Of course, if you already have unique ID for object, it makes sense to
> use it as a key, there is no need to generate an additional field for this.
>
> Evgenii
>
> пн, 11 мая 2020 г. в 09:20, narges saleh :
>
>> Hi All,
>>
>> I would appreciate your feedback, for the following, in terms of
>> performance for both inserts and queries.
>>
>> 1) Which one of these patterns is preferable for the table design?
>> A- Have a fat table/cache with nested objects, e.g. person table with a
>> hashmap of addresses.
>> B- Have person and address tables separate and just link them via foreign
>> keys.
>>
>> 2) Which one of these patterns is preferable for primary keys?
>> A- Have a UUID + affinity key as the primary key
>> B- Have the keys spelled out + affinity key. For example, assume person
>> table, combination of age and name uniquely identifies a person, so the key
>> will be person-name, person-age, and org-id.
>> If I have a associative table joining persons and addresses (if address
>> is a separate object), then in case B, I will have to include three fields
>> from person and the id from the address table, as opposed to case A, where
>> I will have UUID + orgid + address id. Would having one less field buy me
>> much, as opposed to having the overhead of creating UUIDs?
>>
>> thanks
>>
>>


Re: Schema Questions

2020-05-11 Thread Evgenii Zhuravlev
Hi,

The main question here is how you want to use this data. Do you use SQL?

1) It depends on the use case. If you plan to access only a person object
without any filtering by addresses and you will always need the entire
object, it makes sense to have one big object. But in this case, you won't
be able to filter persons by addresses, since SQL doesn't work with
collections. So, if you want to use SQL, it definitely makes sense to use
the second approach.

2) Of course, if you already have a unique ID for the object, it makes sense to
use it as the key; there is no need to generate an additional field for this.
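
If you go with the natural key, a rough sketch (class and field names invented)
of a composite key that keeps a person's data collocated by org:

public class PersonKey {
    private String personName;
    private int personAge;

    @AffinityKeyMapped
    private String orgId;   // affinity key: all data of one org lands on the same node

    // In practice you would also need a constructor plus equals()/hashCode()
    // over all three fields.
}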

Evgenii

пн, 11 мая 2020 г. в 09:20, narges saleh :

> Hi All,
>
> I would appreciate your feedback, for the following, in terms of
> performance for both inserts and queries.
>
> 1) Which one of these patterns is preferable for the table design?
> A- Have a fat table/cache with nested objects, e.g. person table with a
> hashmap of addresses.
> B- Have person and address tables separate and just link them via foreign
> keys.
>
> 2) Which one of these patterns is preferable for primary keys?
> A- Have a UUID + affinity key as the primary key
> B- Have the keys spelled out + affinity key. For example, assume person
> table, combination of age and name uniquely identifies a person, so the key
> will be person-name, person-age, and org-id.
> If I have a associative table joining persons and addresses (if address is
> a separate object), then in case B, I will have to include three fields
> from person and the id from the address table, as opposed to case A, where
> I will have UUID + orgid + address id. Would having one less field buy me
> much, as opposed to having the overhead of creating UUIDs?
>
> thanks
>
>


Schema Questions

2020-05-11 Thread narges saleh
Hi All,

I would appreciate your feedback, for the following, in terms of
performance for both inserts and queries.

1) Which one of these patterns is preferable for the table design?
A- Have a fat table/cache with nested objects, e.g. person table with a
hashmap of addresses.
B- Have person and address tables separate and just link them via foreign
keys.

2) Which one of these patterns is preferable for primary keys?
A- Have a UUID + affinity key as the primary key
B- Have the keys spelled out + affinity key. For example, assume person
table, combination of age and name uniquely identifies a person, so the key
will be person-name, person-age, and org-id.
If I have an associative table joining persons and addresses (if address is
a separate object), then in case B, I will have to include three fields
from person and the id from the address table, as opposed to case A, where
I will have UUID + orgid + address id. Would having one less field buy me
much, as opposed to having the overhead of creating UUIDs?

thanks


Re: Continuous Query Questions

2020-04-01 Thread narges saleh
So, if I define the CQ as a service, and the node crashes, wouldn't Ignite
start a new service with the CQ already registered, say if the CQ registration
is in the service init?
I could do the initial query there as well.


On Wed, Apr 1, 2020 at 7:33 PM Evgenii Zhuravlev 
wrote:

> Well, with this use case, if one of the nodes goes down, there is always a
> chance to lost notifications. I don't think that it's possible to recover
> lost notifications with out of the box solution, but if you will be able to
> track the last processed notification and store update time in entries, you
> will be able to find not processed entries. Otherwise, you will need to
> register CQ again and process all the entries using initialQuery.
>
> Evgenii
>
> ср, 1 апр. 2020 г. в 13:16, narges saleh :
>
>> Thanks Evgenii for the recommendation and the heads up.
>>
>> Is there a way to recover the lost notifications or even know if a
>> notification is lost?
>>
>> On Wed, Apr 1, 2020 at 12:15 PM Evgenii Zhuravlev <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> 1) I would recommend checking ContinouousQuery.setLocal:
>>> https://www.gridgain.com/sdk/ce/latest/javadoc/org/apache/ignite/cache/query/Query.html#setLocal-boolean-.
>>> Please check if it fits your requirements.
>>> 2) You will need to do this in a separate thread, because cache
>>> operations shouldn't be used inside CQ listeners, as they are executed
>>> synchronously.
>>>
>>> In case of using local CQ, there is a chance to miss notifications in
>>> case of node failure, it's described in javadoc.
>>>
>>> Evgenii
>>>
>>>
>>> вт, 31 мар. 2020 г. в 03:00, narges saleh :
>>>
 Hi All,
 I'd like to get your feedback regarding the following pattern.

 1) CQ setup that listens to the changes to a cache on the local node
 only.
 2) Upon receiving notification on a change, the listener makes
 additions to two other caches, one being on the local node (partitioned)
 and the other cache being replicated across all the nodes in the cluster.

 Is this setup performant and reliable in terms of the data staying in
 sync across the cluster?

 thanks.





Re: Continuous Query Questions

2020-04-01 Thread Evgenii Zhuravlev
Well, with this use case, if one of the nodes goes down, there is always a
chance of losing notifications. I don't think it's possible to recover
lost notifications with an out-of-the-box solution, but if you are able to
track the last processed notification and store the update time in the entries,
you will be able to find the entries that were not processed. Otherwise, you
will need to register the CQ again and process all the entries using initialQuery.
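
A rough sketch of that last option (the MyRecord type, its getUpdateTime()
field, the lastProcessed timestamp and the process() call are all invented for
illustration):

ContinuousQuery<Long, MyRecord> qry = new ContinuousQuery<>();

// Replay everything updated after the last processed timestamp, then keep listening.
qry.setInitialQuery(new ScanQuery<Long, MyRecord>((k, v) -> v.getUpdateTime() > lastProcessed));
qry.setLocalListener(events -> events.forEach(e -> process(e.getValue())));

QueryCursor<Cache.Entry<Long, MyRecord>> cur = cache.query(qry);
for (Cache.Entry<Long, MyRecord> e : cur)
    process(e.getValue());   // entries matched by the initial query

// Keep 'cur' open; closing it unregisters the continuous query.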

Evgenii

ср, 1 апр. 2020 г. в 13:16, narges saleh :

> Thanks Evgenii for the recommendation and the heads up.
>
> Is there a way to recover the lost notifications or even know if a
> notification is lost?
>
> On Wed, Apr 1, 2020 at 12:15 PM Evgenii Zhuravlev <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> 1) I would recommend checking ContinouousQuery.setLocal:
>> https://www.gridgain.com/sdk/ce/latest/javadoc/org/apache/ignite/cache/query/Query.html#setLocal-boolean-.
>> Please check if it fits your requirements.
>> 2) You will need to do this in a separate thread, because cache
>> operations shouldn't be used inside CQ listeners, as they are executed
>> synchronously.
>>
>> In case of using local CQ, there is a chance to miss notifications in
>> case of node failure, it's described in javadoc.
>>
>> Evgenii
>>
>>
>> вт, 31 мар. 2020 г. в 03:00, narges saleh :
>>
>>> Hi All,
>>> I'd like to get your feedback regarding the following pattern.
>>>
>>> 1) CQ setup that listens to the changes to a cache on the local node
>>> only.
>>> 2) Upon receiving notification on a change, the listener makes additions
>>> to two other caches, one being on the local node (partitioned) and the
>>> other cache being replicated across all the nodes in the cluster.
>>>
>>> Is this setup performant and reliable in terms of the data staying in
>>> sync across the cluster?
>>>
>>> thanks.
>>>
>>>
>>>


Re: Continuous Query Questions

2020-04-01 Thread narges saleh
Thanks Evgenii for the recommendation and the heads up.

Is there a way to recover the lost notifications or even know if a
notification is lost?

On Wed, Apr 1, 2020 at 12:15 PM Evgenii Zhuravlev 
wrote:

> Hi,
>
> 1) I would recommend checking ContinouousQuery.setLocal:
> https://www.gridgain.com/sdk/ce/latest/javadoc/org/apache/ignite/cache/query/Query.html#setLocal-boolean-.
> Please check if it fits your requirements.
> 2) You will need to do this in a separate thread, because cache operations
> shouldn't be used inside CQ listeners, as they are executed synchronously.
>
> In case of using local CQ, there is a chance to miss notifications in case
> of node failure, it's described in javadoc.
>
> Evgenii
>
>
> вт, 31 мар. 2020 г. в 03:00, narges saleh :
>
>> Hi All,
>> I'd like to get your feedback regarding the following pattern.
>>
>> 1) CQ setup that listens to the changes to a cache on the local node only.
>> 2) Upon receiving notification on a change, the listener makes additions
>> to two other caches, one being on the local node (partitioned) and the
>> other cache being replicated across all the nodes in the cluster.
>>
>> Is this setup performant and reliable in terms of the data staying in
>> sync across the cluster?
>>
>> thanks.
>>
>>
>>


Re: Continuous Query Questions

2020-04-01 Thread Evgenii Zhuravlev
Hi,

1) I would recommend checking ContinuousQuery.setLocal:
https://www.gridgain.com/sdk/ce/latest/javadoc/org/apache/ignite/cache/query/Query.html#setLocal-boolean-.
Please check if it fits your requirements.
2) You will need to do this in a separate thread, because cache operations
shouldn't be used inside CQ listeners, as they are executed synchronously.

In case of using local CQ, there is a chance to miss notifications in case
of node failure, it's described in javadoc.
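
A rough sketch of both points together (cache names and the MyRecord type are
invented for illustration):

ContinuousQuery<Long, MyRecord> qry = new ContinuousQuery<>();
qry.setLocal(true); // notifications only for primary entries on this node

ExecutorService exec = Executors.newSingleThreadExecutor();

qry.setLocalListener(events -> {
    for (CacheEntryEvent<? extends Long, ? extends MyRecord> e : events) {
        Long key = e.getKey();
        MyRecord val = e.getValue();
        // Hand the follow-up cache updates off to another thread; never do puts
        // directly inside the CQ callback.
        exec.submit(() -> {
            partitionedCache.put(key, val);
            replicatedCache.put(key, val);
        });
    }
});

cache.query(qry); // keep the returned cursor if you later need to unregister the CQ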

Evgenii


вт, 31 мар. 2020 г. в 03:00, narges saleh :

> Hi All,
> I'd like to get your feedback regarding the following pattern.
>
> 1) CQ setup that listens to the changes to a cache on the local node only.
> 2) Upon receiving notification on a change, the listener makes additions
> to two other caches, one being on the local node (partitioned) and the
> other cache being replicated across all the nodes in the cluster.
>
> Is this setup performant and reliable in terms of the data staying in sync
> across the cluster?
>
> thanks.
>
>
>


Continuous Query Questions

2020-03-31 Thread narges saleh
Hi All,
I'd like to get your feedback regarding the following pattern.

1) CQ setup that listens to the changes to a cache on the local node only.
2) Upon receiving notification on a change, the listener makes additions to
two other caches, one being on the local node (partitioned) and the other
cache being replicated across all the nodes in the cluster.

Is this setup performant and reliable in terms of the data staying in sync
across the cluster?

thanks.


Re: Continuous Query Questions

2020-03-13 Thread Evgenii Zhuravlev
I'm not sure, because the final overhead depends on the object sizes. There
is a buffer for the CQ, which stores 1000 entries by default, but you can
decrease it using the property IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE.

Evgenii

вт, 18 февр. 2020 г. в 18:09, narges saleh :

> Hi Evgeni,
>
> There will be several thousands notifications/day if I have it send
> notification only when certain patterns are visited, in about 100+ caches,
> which brings up another question: wouldn't having 100+ CQs be creating too
> much overhead?
>
> thanks.
>
> On Tue, Feb 18, 2020 at 2:17 PM Evgenii Zhuravlev <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> How many notifications do you want to get? If it's just a several
>> notifications, then you can even register separate CQ for each of the entry
>> with its own remote filters. At the same time, if you have a requirement to
>> send these notifications for a lot of entries, then this approach will
>> create a big overhead.
>>
>> Its possible to unregister a CQ after you get first notification - you
>> just need to return FALSE from a remote filter. Also, you can send not
>> exact entry, but only some fields using Transformer:
>> https://www.gridgain.com/docs/latest/developers-guide/key-value-api/continuous-queries#remote-transformer.
>> You can create some another object, which will contain only part of the
>> fields.
>>
>> Best Regards,
>> Evgeni
>>
>>
>> пн, 17 февр. 2020 г. в 03:58, narges saleh :
>>
>>> Hi All,
>>> I am getting the following streams of the following records:
>>> name, org, year, month, day
>>> 1- john, acc, 2004, 2, 1
>>> 2- pete, rd, 2004, 3,1
>>> 3- jim,hr,2004, 5,2
>>> 4- jerry,math,2005,2,1
>>> 5- betty,park,2005,3,2
>>> 6- carry,acc,2006,1,1
>>>
>>> I want to get notification for the first occurrence of a particular
>>> value. So, I want to get notifications when I get records 1, 4 and 6, and
>>> in this case, I want to get the fields, org, and year back only.
>>>
>>> Questions:
>>> 1) Is CQ overkill in this case? If yes, what's a better alternative?
>>> 2) If not, how can I set up CQ to get only one record per occurrence?
>>> 3) How would I return only org and year back with the CQ transformer,
>>> considering that I am working with a flat object? Note that in reality this
>>> record has 25-30 fields (I am showing only 5 of them).
>>>
>>> thanks.
>>>
>>


Re: Continuous Query Questions

2020-02-18 Thread narges saleh
Hi Evgeni,

There will be several thousand notifications per day if I have it send a
notification only when certain patterns are visited, across about 100+ caches,
which brings up another question: wouldn't having 100+ CQs create too
much overhead?

thanks.

On Tue, Feb 18, 2020 at 2:17 PM Evgenii Zhuravlev 
wrote:

> Hi,
>
> How many notifications do you want to get? If it's just a several
> notifications, then you can even register separate CQ for each of the entry
> with its own remote filters. At the same time, if you have a requirement to
> send these notifications for a lot of entries, then this approach will
> create a big overhead.
>
> Its possible to unregister a CQ after you get first notification - you
> just need to return FALSE from a remote filter. Also, you can send not
> exact entry, but only some fields using Transformer:
> https://www.gridgain.com/docs/latest/developers-guide/key-value-api/continuous-queries#remote-transformer.
> You can create some another object, which will contain only part of the
> fields.
>
> Best Regards,
> Evgeni
>
>
> пн, 17 февр. 2020 г. в 03:58, narges saleh :
>
>> Hi All,
>> I am getting the following streams of the following records:
>> name, org, year, month, day
>> 1- john, acc, 2004, 2, 1
>> 2- pete, rd, 2004, 3,1
>> 3- jim,hr,2004, 5,2
>> 4- jerry,math,2005,2,1
>> 5- betty,park,2005,3,2
>> 6- carry,acc,2006,1,1
>>
>> I want to get notification for the first occurrence of a particular
>> value. So, I want to get notifications when I get records 1, 4 and 6, and
>> in this case, I want to get the fields, org, and year back only.
>>
>> Questions:
>> 1) Is CQ overkill in this case? If yes, what's a better alternative?
>> 2) If not, how can I set up CQ to get only one record per occurrence?
>> 3) How would I return only org and year back with the CQ transformer,
>> considering that I am working with a flat object? Note that in reality this
>> record has 25-30 fields (I am showing only 5 of them).
>>
>> thanks.
>>
>


Re: Continuous Query Questions

2020-02-18 Thread Evgenii Zhuravlev
Hi,

How many notifications do you want to get? If it's just a few
notifications, then you can even register a separate CQ for each of the entries
with its own remote filter. At the same time, if you have a requirement to
send these notifications for a lot of entries, then this approach will
create a big overhead.

It's possible to unregister a CQ after you get the first notification - you just
need to return FALSE from the remote filter. Also, you can send not the exact
entry but only some of its fields using a Transformer:
https://www.gridgain.com/docs/latest/developers-guide/key-value-api/continuous-queries#remote-transformer.
You can create another object which will contain only part of the
fields.
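
A rough sketch of the transformer part (the Employee type and its fields are
invented; a remote filter that returns FALSE after the first match would be set
via setRemoteFilterFactory on the same query object):

ContinuousQueryWithTransformer<Long, Employee, String> qry =
    new ContinuousQueryWithTransformer<>();

// Runs on the server nodes: ship only "org:year" instead of the whole 25-30 field record.
qry.setRemoteTransformerFactory(() ->
    event -> event.getValue().getOrg() + ":" + event.getValue().getYear());

// Runs on the node that registered the query.
qry.setLocalListener(values -> values.forEach(System.out::println));

cache.query(qry);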

Best Regards,
Evgeni


пн, 17 февр. 2020 г. в 03:58, narges saleh :

> Hi All,
> I am getting the following streams of the following records:
> name, org, year, month, day
> 1- john, acc, 2004, 2, 1
> 2- pete, rd, 2004, 3,1
> 3- jim,hr,2004, 5,2
> 4- jerry,math,2005,2,1
> 5- betty,park,2005,3,2
> 6- carry,acc,2006,1,1
>
> I want to get notification for the first occurrence of a particular value.
> So, I want to get notifications when I get records 1, 4 and 6, and in this
> case, I want to get the fields, org, and year back only.
>
> Questions:
> 1) Is CQ overkill in this case? If yes, what's a better alternative?
> 2) If not, how can I set up CQ to get only one record per occurrence?
> 3) How would I return only org and year back with the CQ transformer,
> considering that I am working with a flat object? Note that in reality this
> record has 25-30 fields (I am showing only 5 of them).
>
> thanks.
>


Re: baseline topology questions

2020-02-11 Thread Stephen Darlington
Persistence doesn’t change anything about the distribution of data. It also 
doesn’t change anything about “rebalancing” the data. The only real difference 
is that you trigger rebalancing by changing the baseline topology manually, a 
process that is generally automatic when you use Ignite in-memory-only. Using 
Kubernetes doesn’t change anything about how Ignite works.

With that said:

1) It doesn’t change anything. All nodes have a copy of the data (including 
WAL) as before.
2) 
a) What do you mean by “the pod dies”? If you mean it crashed and restarted, 
yes, the data will just come from disk as long as it connects to the same PV. 
If you mean you lost the pod and the PV then data would have to be copied from 
another node (after you manually altered the baseline topology)
b) As above
c) No. Rebalancing happens when you change the baseline topology.
d) If the new pod picks up the “old” PV, yes.
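
For completeness, a rough sketch of the manual baseline change that triggers
rebalancing once you have decided the node (and its PV) is really gone
(control.sh --baseline can do the same from the command line):

IgniteCluster cluster = ignite.cluster();

// Reset the baseline to the server nodes that are currently alive.
cluster.setBaselineTopology(cluster.forServers().nodes());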

> On 11 Feb 2020, at 03:56, narges saleh  wrote:
> 
> Hi All,
> Sorry if these questions are too basic.
> 
> 1) How does cache replication work in context of native persistence, 
> especially in context of WAL files? Do the primary and replication node have 
> separate WAL files?
> 2) How does baseline topology work in context of kubernetes, with persistent 
> storage volume enabled? When a pod dies,  and is replaced with a new pod, 
>   a) does the cache gets rebuilt from the persisted data to disk on the 
> hosting node or 
>   b) does it reloaded from its replica cache (assuming cache was configured 
> with replicated option)? or
>   c) does rebalancing occur with creation of a replacement pod? If yes, what 
> happens to the data on disk?
>   d) does the replacement pod automatically assume the id of the lost pod (as 
> far basline topology is concerned)?
> 
> thanks.




baseline topology questions

2020-02-10 Thread narges saleh
Hi All,
Sorry if these questions are too basic.

1) How does cache replication work in the context of native persistence,
especially in the context of WAL files? Do the primary and replica nodes
have separate WAL files?
2) How does baseline topology work in the context of Kubernetes, with a
persistent storage volume enabled? When a pod dies and is replaced with a
new pod,
  a) does the cache get rebuilt from the data persisted to disk on the
hosting node, or
  b) is it reloaded from its replica cache (assuming the cache was configured
with the replicated option)? or
  c) does rebalancing occur on creation of the replacement pod? If yes,
what happens to the data on disk?
  d) does the replacement pod automatically assume the id of the lost pod
(as far as baseline topology is concerned)?

thanks.


Re: Questions regarding Data Availability/Replication. @Ignite Team, please answer as it's important for us.

2020-01-30 Thread Evgenii Zhuravlev
Hi,

The easiest solution that can be used here is just to have 2 server nodes
in one cluster and make sure that all primary partitions are on SN1
only (it can be done using an affinity function). Caches should be configured
as Replicated, and SN2 will have backups for everything. However, if you
have a bad (or even fragile) connection between the nodes, it can affect
performance if the connection is lost. I think you can try it with
PRIMARY_SYNC caches. As for question 2, yes, after SN1 goes down, SN2
will be used for all queries, and after SN1 comes back, it will
become the primary node again.

Alternatively, you can manage it on the application side by having 2
separate clusters, or use something else, like Kafka streamer or GridGain
DR. More information about all approaches can be found in this webinar:
https://www.youtube.com/watch?v=lbLyy_vZfsA
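
A rough sketch of the cache side of that first option (the cache name is a
placeholder; pinning all primaries to SN1 would additionally need a custom
affinity function):

CacheConfiguration<Long, Object> ccfg = new CacheConfiguration<>("myCache");
ccfg.setCacheMode(CacheMode.REPLICATED); // every server node keeps a full copy
ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC); // don't wait for the backup ack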


Best Regards,
Evgenii

ср, 29 янв. 2020 г. в 06:24, rssah <77adity...@gmail.com>:

> Scenario:
>
> Datacenter 1 (Main) -> One Application Server (Client Node CN1) and One DB
> Server (Server Node SN1).
>
> Datacenter 2 (Recovery) -> One Application Server (Client Node CN2) and One
> DB Server (Server Node SN2).
>
> Usually all the user requests go to CN1 but not CN2. So, technically we
> will
> not run CN2. So forget about CN2. Will tell why it's used later.
>
> So there are three nodes here running -> CN1, SN1, SN2.
>
> Question 1:
>
> I don't use Apache Ignite for cache purpose, but we will use it for disk
> persistent storage. And we don't need Ignite for partitioning data (we will
> think about it later). Now, how to make sure that the data that is being
> stored in SN1 will be replicated to SN2. We want SN2 to be backup node for
> SN1. Is that possible? If so, how?
>
> Conditions:
>
> 1. CN1 will not make connections to SN2.
> 2. SN1 and SN2 link will be very fragile and of less bandwidth. Usually,
> both will be in different network segments.
>
> Question 2:
>
> Let's say SN1 is down. Now will the queries automatically reach SN2 without
> doing anything? And if SN1 is restarted, can we expect the queries to reach
> SN1 instead of SN2, as CN1 - SN2 link is slow?
>
> Question 3 :
>
> After a few months, we will flip the entire setup, making DC2 as Main and
> DC1 as Recovery. In this case, CN2 will make connections to SN2 and CN1
> will
> not be running. How to quickly switch so that SN2 is replicated almost with
> SN1 and be ready to serve requests from CN2?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Questions regarding Data Availability/Replication. @Ignite Team, please answer as it's important for us.

2020-01-29 Thread rssah
Scenario:

Datacenter 1 (Main) -> One Application Server (Client Node CN1) and One DB
Server (Server Node SN1).

Datacenter 2 (Recovery) -> One Application Server (Client Node CN2) and One
DB Server (Server Node SN2).

Usually all the user requests go to CN1 but not CN2. So, technically we will
not run CN2. So forget about CN2. Will tell why it's used later.

So there are three nodes here running -> CN1, SN1, SN2.

Question 1:

I don't use Apache Ignite for cache purpose, but we will use it for disk
persistent storage. And we don't need Ignite for partitioning data (we will
think about it later). Now, how to make sure that the data that is being
stored in SN1 will be replicated to SN2. We want SN2 to be backup node for
SN1. Is that possible? If so, how?

Conditions:

1. CN1 will not make connections to SN2.
2. SN1 and SN2 link will be very fragile and of less bandwidth. Usually,
both will be in different network segments.

Question 2:

Let's say SN1 is down. Now will the queries automatically reach SN2 without
doing anything? And if SN1 is restarted, can we expect the queries to reach
SN1 instead of SN2, as CN1 - SN2 link is slow?

Question 3 :

After a few months, we will flip the entire setup, making DC2 as Main and
DC1 as Recovery. In this case, CN2 will make connections to SN2 and CN1 will
not be running. How to quickly switch so that SN2 is replicated almost with
SN1 and be ready to serve requests from CN2?





Re: Embedded ignite and baseline upgrade questions

2019-12-31 Thread djm132
Thanks, it's definitely clear now that rebalancing should be triggered from
code if node removal is detected. Assuming that the number of backups is > 0 and
only one node is removed, it looks like a safe case. But what if the backup
count is 0 (a bad idea, but the risk may be acceptable in some cases) and we
need to shut down a node to remove it from the baseline? Is there any way to
trigger exclusive rebalancing (i.e. moving all data out of a node to the other
valid nodes) and then exclude this node from data storage?

Thanks.





Re: Embedded ignite and baseline upgrade questions

2019-12-30 Thread akorensh
Hi,
  Your summary looks correct.
  It should be possible to manage your baseline topology using a config
file, provided you follow the steps outlined.

more info here:
https://apacheignite.readme.io/docs/baseline-topology#section-triggering-rebalancing-programmatically

You can listen for CacheRebalancingEvent for each cache:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/CacheRebalancingEvent.html
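
A rough sketch of listening for them locally (the event type first has to be
enabled via IgniteConfiguration.setIncludeEventTypes):

ignite.events().localListen(evt -> {
    CacheRebalancingEvent e = (CacheRebalancingEvent)evt;
    System.out.println("Rebalancing finished for cache: " + e.cacheName());
    return true; // keep the listener registered
}, EventType.EVT_CACHE_REBALANCE_STOPPED);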


You can also use JMX to monitor rebalance:
https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/metrics#monitoring-rebalancing

In general, it is advisable to clean old metadata when re-adding a baseline
node (especially if you made changes to your cache config), but be careful to
make sure that no data is lost inadvertently.





Re: Embedded ignite and baseline upgrade questions

2019-12-30 Thread djm132
!= in Kotlin uses equals() under the hood, so it works here as expected.

I can't use control.sh to manage the topology (because Ignite is embedded) and
am trying to implement it in code. So the actual initialization sequence is:

1) Start all fresh nodes, wait for all specified persistence nodes to be
online.
2) Auto-activation sets the baseline to all nodes online at that moment.
3) If it is the first launch - currentBaselineTopology() will be null (or
probably a different set) and I will set it to the defined list of nodes using
setBaselineTopology().
4) Later, if extending the baseline is required - we wait for all of the nodes
and call setBaselineTopology with the new list.
5) Later, if shrinking the baseline is required - we set it using the same
setBaselineTopology() again (the excluded node needs to be offline for sure);
it also triggers data rebalancing as expected.
6) If we need to include a new node in the baseline, it definitely needs to be
cleaned of the old stored baseline metadata, right?

So it seems that managing the topology by specifying a list in a config file is
possible with this code? Or maybe I am missing something?

Is there any global event for data rebalancing? Or should it be tracked for
every cache with localListen?

Thanks.





Re: Machine Learning questions

2019-12-30 Thread zaleslaw
SPOILER: I need to say that release 2.8 will be published after the New Year,
and all the answers below relate to the new release.

If we are talking about the 2.8 release (the last update of the ML functionality
in the master and release branches):

 +++ I assume that I would start by extracting features from my JSON records
in
a cache into a vectorizer - how does this impact memory usage? +++

The answer is here:
https://apacheignite.readme.io/docs/ml-partition-based-dataset

The cache will stay in memory, and the additional data will be located on the
heap too (not in the caches, but near them). Of course, more memory is required
(how much depends on the training algorithm).

If the heap is small, you have a chance of getting an OOM.

+++Are there any built-in algorithms or recommended strategies for
sampling+++
Please have a look here:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/Step_7_Split_train_test.java

You could use the same mechanism to get a random sample.

But we have no sampling tool as such to get sample rows from a cache; it is
not part of the ML functionality right now.

+++ Are there any dataset statistical functions like those provided by
Python's ML libraries, for high-level evaluation of specific features in a
dataset (to assess things like missing-data, cardinality, min-max, mean,
mode, standard-deviation, percentiles, etc)? +++

We do not manipulate the data in the caches directly; we build new data in a
new format for training purposes, but we do not support pandas-like operations
in ML.

We have preprocessing algorithms, but they can be used as a first step in a
training Pipeline:
https://apacheignite.readme.io/docs/preprocessing

Hopefully, a dataset summary and a few statistics (like those described above)
will be added in 2.9.

+++ - Is there any doc/video tutorial that would provide a guide for the
complete workflow pipeline for an ML example (encompassing the
abovementioned operations)? +++

First of all, please have a look at the Titanic tutorial
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial
and at the other examples
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml

Also, a few videos are available on my channel:
https://www.youtube.com/watch?v=3CmnV6IQtTw
https://www.youtube.com/watch?v=DmoMBsiHxf8

Jose, great questions. I hope to share more docs and papers about Ignite ML
after the New Year and the 2.8 release.










Re: Embedded ignite and baseline upgrade questions

2019-12-30 Thread Ilya Kasnacheev
Hello!

First of all, I really hope you're not using != to compare collections :)

Second, cluster will auto-activate when all baseline nodes have joined. For
the first time you have to manually activate the cluster.

The algorithm looks OK and nodes should auto-activate once they are all up.

With regards to your first question - you have managed nodes from different
baseline topologies, i.e., nodes from different clusters. Persistent node
from the "wrong" cluster will not join.

Regards,
-- 
Ilya Kasnacheev


сб, 28 дек. 2019 г. в 00:41, djm132 :

> Also when starting second node I am getting this:
>
> 2019/12/27 23:39:36.665 [disco-pool-#55] WARN BaselineTopology of joining
> node (dev-1) is not compatible with BaselineTopology in the cluster.
> Branching history of cluster BlT ([95475834]) doesn't contain branching
> point hash of joining node BlT (95475833). Consider cleaning persistent
> storage of the node and adding it to the cluster again.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Machine Learning questions

2019-12-30 Thread joseheitor
Hi Guys,

A few questions as I progress through my ML learning journey with Ignite...

- I assume that I would start by extracting features from my JSON records in
a cache into a vectorizer - how does this impact memory usage? Will origin
cache records be moved to disk, as more memory is required than is available
for the data in the vectorizer? Or will the vectorizer data begin to use
swap? Or will I get OOM exceptions?

- Are there any built-in algorithms or recommended strategies for sampling?

- Are there any dataset statistical functions like those provided by
Python's ML libraries, for high-level evaluation of specific features in a
dataset (to assess things like missing-data, cardinality, min-max, mean,
mode, standard-deviation, percentiles, etc)?

- Is there any doc/video tutorial that would provide a guide for the
complete workflow pipeline for an ML example (encompassing the
abovementioned operations)?

Thanks,
Jose





Re: Embedded ignite and baseline upgrade questions

2019-12-27 Thread djm132
Also when starting second node I am getting this:

2019/12/27 23:39:36.665 [disco-pool-#55] WARN BaselineTopology of joining
node (dev-1) is not compatible with BaselineTopology in the cluster.
Branching history of cluster BlT ([95475834]) doesn't contain branching
point hash of joining node BlT (95475833). Consider cleaning persistent
storage of the node and adding it to the cluster again.





Embedded ignite and baseline upgrade questions

2019-12-27 Thread dobermann132
Hi,

I am implementing a distributed crawler engine using Ignite in embedded mode.
Each crawler node has persistence enabled and an assigned predefined
consistentId (dev-1, dev-2, dev-3, etc). The baseline topology (the list of
nodes which store data) can change and is configured using a config file like
this:

# Cluster settings
cluster {
    # Baseline topology (list of node ids which store data)
    baseline = [
        "dev-1",
        "dev-2",
        "dev-3"
    ]
}

 

Each crawler node startup is implemented like this:

// Start ignite
ignite = Ignition.start(igniteConfig)

// Activate
ignite.cluster().active(true)

// Validate baseline topology
val baseline = config.getStringAll(SpiderNodeOptions.CLUSTER_BASELINE).toSet()
val curTopology = ignite.cluster().currentBaselineTopology()

if (curTopology == null || baseline != curTopology.map { it.consistentId().toString() }.toSet()) {
    // Wait for all new topology members to join
    log.info("Cluster baseline topology changed - waiting data nodes to join...")

    while (true) {
        val nodes = mutableListOf<ClusterNode>()

        ignite.cluster().nodes().forEach {
            if (baseline.contains(it.consistentId().toString()))
                nodes.add(it)
        }

        if (nodes.size == baseline.size) {
            log.info("Updated baseline topology of {} nodes", value("baseline_size", nodes.size))
            ignite.cluster().setBaselineTopology(nodes)
            break
        }
        else
            Thread.sleep(1000)
    }
}

 

We can't set the baseline topology before cluster activation, so activate() is
called first. Is it possible to set the baseline topology without activating
the cluster for the first time? Is it safe to activate a node with the code
above? It seems that we have a misconfigured state in a clean cluster between
activate() and waiting for the nodes to join, when setBaselineTopology has not
been called yet. The steps for upgrading the topology will be: 1) modify and
commit the config files, 2) shut down all cluster members, 3) update the code,
4) start each node one by one. Is this the right way to do the reconfiguration?

 

Thanks,

Michael



RE: custom java classes and ignite node questions

2019-12-17 Thread Scott Cote
Thanks Ilya – we are standing up a second node in our QA to test the 
segregation by node (not region).

Scott Cote
Senior Application Developer  - Java | Electronic Transaction Consultants 
Corporation (ETC)
1600 N. Collins Boulevard, Suite 4000, Richardson, TX 75080
(o) 469.248.4576 | (c) 972.900.1561


From: Ilya Kasnacheev 
Sent: Tuesday, December 17, 2019 6:47 AM
To: user@ignite.apache.org
Subject: Re: custom java classes and ignite node questions

Hello!

Did you try it to check if it works? For one thing, we do not recommend going 
to production with solutions which are theoretically sound, but were not tried 
in staging.

Regards,
--
Ilya Kasnacheev


пн, 16 дек. 2019 г. в 20:29, Scott Cote mailto:sc...@etcc.com>>:
Igniters,

Request:  we want to segregate our caches so that they are no longer sharing a 
common jar in the libs folder.

Background:
We currently deploy our custom java classes in a jar to the lib folder of the 
config directory of Apache Ignite (three nodes operating as data servers, and 
several client nodes).  At this time, all classes are deployed to all of the 
nodes, and all caches are going into the same “grid” (setIgniteInstanceName of 
Ignite) and into the same data region.  For purposes of discussion, call our 
data region REGION_X and call our grid G_X.  This works, but it’s a pain to 
manage.

In short, caches C1, C2, … CN all are located within REGION_X and grid G_X .  
They all share Jar J1.

Given that we could – with enough time, modify our code to use binary 
marshaller, this question would become a moot point, we cannot at THIS time 
perform this task.


GOAL: We would like Caches C1, C2 to share jar J1, C3, C4, … CN to share jar J2.

QUESTION: To achieve our goal, do we need to create grid G_Y and deploy Jar J2 
to its lib folder (and reference C3,C4,.. , CN from that grid), or is it a 
matter of creating REGION_Y, and assigning C3, C4, … CN to that region?  Or is 
there yet another way?

My assumption is that we would need to create a new grid node in order for the 
classes to be segregated – has nothing to do with the region (we have heard 
conflicting information – and I cannot find support for the region attempt).

Thanks in advance.

SCott





Scott Cote
Senior Application Developer  - Java | Electronic Transaction Consultants 
Corporation (ETC)
1600 N. Collins Boulevard, Suite 4000, Richardson, TX 75080
(o) 469.248.4576 | (c) 972.900.1561




Re: custom java classes and ignite node questions

2019-12-17 Thread Ilya Kasnacheev
Hello!

Did you try it to check if it works? For one thing, we do not recommend
going to production with solutions which are theoretically sound, but were
not tried in staging.

Regards,
-- 
Ilya Kasnacheev


Mon, 16 Dec 2019 at 20:29, Scott Cote :

> Igniters,
>
>
>
> Request:  we want to segregate our caches so that they are no longer
> sharing a common jar in the libs folder.
>
>
>
> Background:
>
> We currently deploy our custom java classes in a jar to the lib folder of
> the config directory of Apache Ignite (three nodes operating as data
> servers, and several client nodes).  At this time, all classes are deployed
> to all of the nodes, and all caches are going into the same “grid”
> (setIgniteInstanceName of Ignite) and into the same data region.  For
> purposes of discussion, call our data region REGION_X and call our grid
> G_X.  This works, but it’s a pain to manage.
>
>
>
> In short, caches C1, C2, … CN all are located within REGION_X and grid G_X
> .  They all share Jar J1.
>
>
>
> Given that we could – with enough time, modify our code to use binary
> marshaller, this question would become a moot point, we cannot at THIS time
> perform this task.
>
>
>
>
>
> GOAL: We would like Caches C1, C2 to share jar J1, C3, C4, … CN to share
> jar J2.
>
>
>
> QUESTION: To achieve our goal, do we need to create grid G_Y and deploy
> Jar J2 to its lib folder (and reference C3,C4,.. , CN from that grid), or
> is it a matter of creating REGION_Y, and assigning C3, C4, … CN to that
> region?  Or is there yet another way?
>
>
>
> My assumption is that we would need to create a new grid node in order for
> the classes to be segregated – has nothing to do with the region (we have
> heard conflicting information – and I cannot find support for the region
> attempt).
>
>
>
> Thanks in advance.
>
>
>
> SCott
>
>
>
>
>
>
>
>
>
>
>
> Scott Cote
>
> Senior Application Developer  - Java | Electronic Transaction Consultants
> Corporation (ETC)
>
> 1600 N. Collins Boulevard, Suite 4000, Richardson, TX 75080
>
> (o) 469.248.4576 | (c) 972.900.1561
>
>    [image: A picture containing vector graphics
> Description automatically generated]
> 
> 
> 
>
>
>
> 
>
>
>
> *CONFIDENTIALITY NOTICE:* The information accompanying this email
> transmission may contain confidential information that is intended only for
> the use of the individual or authorized representatives of intended
> recipient. If you are not the intended recipient or authorized
> representative, you are hereby notified that any disclosure, copying,
> distribution or reliance upon the contents of this email is strictly
> prohibited. If you receive this email in error, please notify the sender
> immediately by return email and delete message and any attachments from
> your system.
>
>
>


custom java classes and ignite node questions

2019-12-16 Thread Scott Cote
Igniters,

Request:  we want to segregate our caches so that they are no longer sharing a 
common jar in the libs folder.

Background:
We currently deploy our custom java classes in a jar to the lib folder of the 
config directory of Apache Ignite (three nodes operating as data servers, and 
several client nodes).  At this time, all classes are deployed to all of the 
nodes, and all caches are going into the same "grid" (setIgniteInstanceName of 
Ignite) and into the same data region.  For purposes of discussion, call our 
data region REGION_X and call our grid G_X.  This works, but it's a pain to 
manage.

In short, caches C1, C2, ... CN all are located within REGION_X and grid G_X .  
They all share Jar J1.

Although we could - with enough time - modify our code to use the binary 
marshaller, which would make this question moot, we cannot perform this task 
at THIS time.


GOAL: We would like Caches C1, C2 to share jar J1, C3, C4, ... CN to share jar 
J2.

QUESTION: To achieve our goal, do we need to create grid G_Y and deploy Jar J2 
to its lib folder (and reference C3,C4,.. , CN from that grid), or is it a 
matter of creating REGION_Y, and assigning C3, C4, ... CN to that region?  Or 
is there yet another way?

My assumption is that we would need to create a new grid node in order for the 
classes to be segregated - has nothing to do with the region (we have heard 
conflicting information - and I cannot find support for the region attempt).

Thanks in advance.

SCott





Scott Cote
Senior Application Developer  - Java | Electronic Transaction Consultants 
Corporation (ETC)
1600 N. Collins Boulevard, Suite 4000, Richardson, TX 75080
(o) 469.248.4576 | (c) 972.900.1561




Re: questions

2019-08-27 Thread Ilya Kasnacheev
; partitions. Having department as affinity key is suboptimal because 
>>>>>> there's
>>>>>> not many departments and they usually vary in size. That's the kind of
>>>>>> distribution that you want to avoid.
>>>>>>
>>>>>> Regards,
>>>>>> --
>>>>>> Ilya Kasnacheev
>>>>>>
>>>>>>
>>>>>> чт, 22 авг. 2019 г. в 18:37, narges saleh :
>>>>>>
>>>>>>> Thanks Ilya for replies.
>>>>>>> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
>>>>>>> available and the data doesn't fit the cache current ignite node? 
>>>>>>> Consider
>>>>>>> a scenario where I have 100 pods on a physical node, assuming pod = 
>>>>>>> ignite
>>>>>>> node.
>>>>>>> 2)  I am not sure what you mean by confining half of cache to one
>>>>>>> cluster and another half to another node. If my affinity key is 
>>>>>>> department
>>>>>>> id, why can't I have department A on a partitioned cache, one partition 
>>>>>>> on
>>>>>>> one node in cluster A, and the other partition on another node on 
>>>>>>> another
>>>>>>> cluster.
>>>>>>>
>>>>>>> I might be misunderstanding the whole, and I'd appreciate
>>>>>>> clarification.
>>>>>>>
>>>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> 1) When there is an overflow, either page eviction kicks in, or, if
>>>>>>>> it is disabled, you get an IgniteOOM, after which the node is no longer
>>>>>>>> usable. Please avoid overflowing any data regions since there's no 
>>>>>>>> graceful
>>>>>>>> handling currently.
>>>>>>>> 2) I don't think so. You can't easily confine half of cache's data
>>>>>>>> to one cluster group and another half to other group.
>>>>>>>>
>>>>>>>> Such scenarios are not recommended. We expect that all partitions
>>>>>>>> have same amount of data. Not that there are a few gargantuan 
>>>>>>>> partitions
>>>>>>>> that don't fit in a single node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>>
>>>>>>>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> I'd appreciate your answers to my questions.
>>>>>>>>>
>>>>>>>>> 1) Assuming I use affinity key among 4 caches, and they all end up
>>>>>>>>> on the same ignite node. What happens where is an overflow? Does the
>>>>>>>>> overflow data end up on a joined node? How do I keep the related data 
>>>>>>>>> from
>>>>>>>>> all the caches close to each other when the volume of exceeds a 
>>>>>>>>> single node?
>>>>>>>>>
>>>>>>>>> 2) Is there a concept of cluster affinity, meaning having a
>>>>>>>>> cluster group defined based on some affinity key? For example, if I 
>>>>>>>>> have
>>>>>>>>> two departments A and B, can I have a cluster group for department A 
>>>>>>>>> and
>>>>>>>>> another for department B?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Narges
>>>>>>>>>
>>>>>>>>


Re: questions

2019-08-23 Thread narges saleh
;>>>> available and the data doesn't fit the cache current ignite node? 
>>>>>> Consider
>>>>>> a scenario where I have 100 pods on a physical node, assuming pod = 
>>>>>> ignite
>>>>>> node.
>>>>>> 2)  I am not sure what you mean by confining half of cache to one
>>>>>> cluster and another half to another node. If my affinity key is 
>>>>>> department
>>>>>> id, why can't I have department A on a partitioned cache, one partition 
>>>>>> on
>>>>>> one node in cluster A, and the other partition on another node on another
>>>>>> cluster.
>>>>>>
>>>>>> I might be misunderstanding the whole, and I'd appreciate
>>>>>> clarification.
>>>>>>
>>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> 1) When there is an overflow, either page eviction kicks in, or, if
>>>>>>> it is disabled, you get an IgniteOOM, after which the node is no longer
>>>>>>> usable. Please avoid overflowing any data regions since there's no 
>>>>>>> graceful
>>>>>>> handling currently.
>>>>>>> 2) I don't think so. You can't easily confine half of cache's data
>>>>>>> to one cluster group and another half to other group.
>>>>>>>
>>>>>>> Such scenarios are not recommended. We expect that all partitions
>>>>>>> have same amount of data. Not that there are a few gargantuan partitions
>>>>>>> that don't fit in a single node.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>>
>>>>>>>
>>>>>>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> I'd appreciate your answers to my questions.
>>>>>>>>
>>>>>>>> 1) Assuming I use affinity key among 4 caches, and they all end up
>>>>>>>> on the same ignite node. What happens where is an overflow? Does the
>>>>>>>> overflow data end up on a joined node? How do I keep the related data 
>>>>>>>> from
>>>>>>>> all the caches close to each other when the volume of exceeds a single 
>>>>>>>> node?
>>>>>>>>
>>>>>>>> 2) Is there a concept of cluster affinity, meaning having a cluster
>>>>>>>> group defined based on some affinity key? For example, if I have two
>>>>>>>> departments A and B, can I have a cluster group for department A and
>>>>>>>> another for department B?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Narges
>>>>>>>>
>>>>>>>


Re: questions

2019-08-23 Thread Ilya Kasnacheev
Hello!

I don't think that partitioning by country or city is a good idea, since
this distribution will be very uneven.

You can have different ways of minimizing network hops while keeping the
distributed nature of your database. The database isn't really distributed when,
for a given city query, only one node is taking all the load and the rest
are idle.

Regards,
-- 
Ilya Kasnacheev


Fri, 23 Aug 2019 at 13:15, narges saleh :

> Hello Ilya,
>  I agree with you that partitioning based on month was a bad example,
> because most will be idle. Country or customer are better examples of my
> case. There are limited number of them, but they are disproportionate and
> they are always active. Let's take the country example. I need to search
> and aggregate the volume of sales in each city and by country. I have a
> couple of hundreds countries.
> Let me ask a basic question.  If my queries/aggregations are based on
> cities and countries, do I need to partition based on countries (or even
> cities)?  I want to avoid network hops for my searches and aggregations as
> much as possible (I do not slow writes either but I am aware of the trade
> off between read/writes and replication and partitioning). What do I define
> my affinity key on and what do I partition on?
>
> thanks again for your help.
>
> On Fri, Aug 23, 2019 at 4:03 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> Partitioning based on let's say user id is usually fair, because there
>> usually are 100,000ths of users and neither of those owns disproportionate
>> amount of data.
>>
>> Partitioning by month is especially bad, since in a given months, all of
>> partitions will be basically idle save for one, and there would be a lot of
>> contention.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> чт, 22 авг. 2019 г. в 19:31, narges saleh :
>>
>>> I am not sure you can find real world examples where caches can be
>>> evenly partitioned, if the partitioning factor is an affinity key. I
>>> comparing, with partitioning case with relational databases, say
>>> partitioning based on month of the year. I definitely don't have 100s of
>>> departments but I do have 10s of departments, but departments are very
>>> disproportional in size.
>>> As for rebalancing case, the pods will be added to the system as the
>>> volume increases, so I'd assume that would prompt ignite to rebalance.
>>>
>>> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> 1) No. Ignite only rebalances data when nodes are joining or leaving
>>>> cluster.
>>>> 2) Ignite's affinity is not really well suited to such detailed manual
>>>> assignment. It is assumed that your cache has large number of partitions
>>>> (e.g. 1024) and data is distributed evenly between all partitions. Having
>>>> department as affinity key is suboptimal because there's not many
>>>> departments and they usually vary in size. That's the kind of distribution
>>>> that you want to avoid.
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> чт, 22 авг. 2019 г. в 18:37, narges saleh :
>>>>
>>>>> Thanks Ilya for replies.
>>>>> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
>>>>> available and the data doesn't fit the cache current ignite node? Consider
>>>>> a scenario where I have 100 pods on a physical node, assuming pod = ignite
>>>>> node.
>>>>> 2)  I am not sure what you mean by confining half of cache to one
>>>>> cluster and another half to another node. If my affinity key is department
>>>>> id, why can't I have department A on a partitioned cache, one partition on
>>>>> one node in cluster A, and the other partition on another node on another
>>>>> cluster.
>>>>>
>>>>> I might be misunderstanding the whole, and I'd appreciate
>>>>> clarification.
>>>>>
>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> 1) When there is an overflow, either page eviction kicks in, or, if
>>>>>> it is disabled, you get an IgniteOOM, after which the node is no longer
>>>>>> usable. Please avoid overflowing any data regions since there's n

Re: questions

2019-08-23 Thread narges saleh
Hello Ilya,
 I agree with you that partitioning based on month was a bad example,
because most partitions will be idle. Country or customer are better examples of
my case. There are a limited number of them, but they are disproportionate in size
and they are always active. Let's take the country example. I need to search
and aggregate the volume of sales in each city and by country. I have a
couple of hundred countries.
Let me ask a basic question.  If my queries/aggregations are based on
cities and countries, do I need to partition based on countries (or even
cities)?  I want to avoid network hops for my searches and aggregations as
much as possible (I do not want slow writes either, but I am aware of the
trade-off between reads/writes and replication and partitioning). What do I define
my affinity key on and what do I partition on?

thanks again for your help.

On Fri, Aug 23, 2019 at 4:03 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Partitioning based on let's say user id is usually fair, because there
> usually are 100,000ths of users and neither of those owns disproportionate
> amount of data.
>
> Partitioning by month is especially bad, since in a given months, all of
> partitions will be basically idle save for one, and there would be a lot of
> contention.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
Thu, 22 Aug 2019 at 19:31, narges saleh :
>
>> I am not sure you can find real world examples where caches can be evenly
>> partitioned, if the partitioning factor is an affinity key. I comparing,
>> with partitioning case with relational databases, say partitioning based on
>> month of the year. I definitely don't have 100s of departments but I do
>> have 10s of departments, but departments are very disproportional in size.
>> As for rebalancing case, the pods will be added to the system as the
>> volume increases, so I'd assume that would prompt ignite to rebalance.
>>
>> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> 1) No. Ignite only rebalances data when nodes are joining or leaving
>>> cluster.
>>> 2) Ignite's affinity is not really well suited to such detailed manual
>>> assignment. It is assumed that your cache has large number of partitions
>>> (e.g. 1024) and data is distributed evenly between all partitions. Having
>>> department as affinity key is suboptimal because there's not many
>>> departments and they usually vary in size. That's the kind of distribution
>>> that you want to avoid.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> чт, 22 авг. 2019 г. в 18:37, narges saleh :
>>>
>>>> Thanks Ilya for replies.
>>>> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
>>>> available and the data doesn't fit the cache current ignite node? Consider
>>>> a scenario where I have 100 pods on a physical node, assuming pod = ignite
>>>> node.
>>>> 2)  I am not sure what you mean by confining half of cache to one
>>>> cluster and another half to another node. If my affinity key is department
>>>> id, why can't I have department A on a partitioned cache, one partition on
>>>> one node in cluster A, and the other partition on another node on another
>>>> cluster.
>>>>
>>>> I might be misunderstanding the whole, and I'd appreciate clarification.
>>>>
>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> 1) When there is an overflow, either page eviction kicks in, or, if it
>>>>> is disabled, you get an IgniteOOM, after which the node is no longer
>>>>> usable. Please avoid overflowing any data regions since there's no 
>>>>> graceful
>>>>> handling currently.
>>>>> 2) I don't think so. You can't easily confine half of cache's data to
>>>>> one cluster group and another half to other group.
>>>>>
>>>>> Such scenarios are not recommended. We expect that all partitions have
>>>>> same amount of data. Not that there are a few gargantuan partitions that
>>>>> don't fit in a single node.
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I'd appreciate your answers to my questions.
>>>>>>
>>>>>> 1) Assuming I use affinity key among 4 caches, and they all end up on
>>>>>> the same ignite node. What happens where is an overflow? Does the 
>>>>>> overflow
>>>>>> data end up on a joined node? How do I keep the related data from all the
>>>>>> caches close to each other when the volume of exceeds a single node?
>>>>>>
>>>>>> 2) Is there a concept of cluster affinity, meaning having a cluster
>>>>>> group defined based on some affinity key? For example, if I have two
>>>>>> departments A and B, can I have a cluster group for department A and
>>>>>> another for department B?
>>>>>>
>>>>>> Thanks,
>>>>>> Narges
>>>>>>
>>>>>


Re: questions

2019-08-23 Thread Ilya Kasnacheev
Hello!

Partitioning based on, let's say, user id is usually fair, because there
are usually 100,000s of users and none of them owns a disproportionate
amount of data.

Partitioning by month is especially bad, since in a given month all of the
partitions will be basically idle save for one, and there would be a lot of
contention.

Regards,
-- 
Ilya Kasnacheev


Thu, 22 Aug 2019 at 19:31, narges saleh :

> I am not sure you can find real world examples where caches can be evenly
> partitioned, if the partitioning factor is an affinity key. I comparing,
> with partitioning case with relational databases, say partitioning based on
> month of the year. I definitely don't have 100s of departments but I do
> have 10s of departments, but departments are very disproportional in size.
> As for rebalancing case, the pods will be added to the system as the
> volume increases, so I'd assume that would prompt ignite to rebalance.
>
> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> 1) No. Ignite only rebalances data when nodes are joining or leaving
>> cluster.
>> 2) Ignite's affinity is not really well suited to such detailed manual
>> assignment. It is assumed that your cache has large number of partitions
>> (e.g. 1024) and data is distributed evenly between all partitions. Having
>> department as affinity key is suboptimal because there's not many
>> departments and they usually vary in size. That's the kind of distribution
>> that you want to avoid.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> чт, 22 авг. 2019 г. в 18:37, narges saleh :
>>
>>> Thanks Ilya for replies.
>>> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
>>> available and the data doesn't fit the cache current ignite node? Consider
>>> a scenario where I have 100 pods on a physical node, assuming pod = ignite
>>> node.
>>> 2)  I am not sure what you mean by confining half of cache to one
>>> cluster and another half to another node. If my affinity key is department
>>> id, why can't I have department A on a partitioned cache, one partition on
>>> one node in cluster A, and the other partition on another node on another
>>> cluster.
>>>
>>> I might be misunderstanding the whole, and I'd appreciate clarification.
>>>
>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> 1) When there is an overflow, either page eviction kicks in, or, if it
>>>> is disabled, you get an IgniteOOM, after which the node is no longer
>>>> usable. Please avoid overflowing any data regions since there's no graceful
>>>> handling currently.
>>>> 2) I don't think so. You can't easily confine half of cache's data to
>>>> one cluster group and another half to other group.
>>>>
>>>> Such scenarios are not recommended. We expect that all partitions have
>>>> same amount of data. Not that there are a few gargantuan partitions that
>>>> don't fit in a single node.
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'd appreciate your answers to my questions.
>>>>>
>>>>> 1) Assuming I use affinity key among 4 caches, and they all end up on
>>>>> the same ignite node. What happens where is an overflow? Does the overflow
>>>>> data end up on a joined node? How do I keep the related data from all the
>>>>> caches close to each other when the volume of exceeds a single node?
>>>>>
>>>>> 2) Is there a concept of cluster affinity, meaning having a cluster
>>>>> group defined based on some affinity key? For example, if I have two
>>>>> departments A and B, can I have a cluster group for department A and
>>>>> another for department B?
>>>>>
>>>>> Thanks,
>>>>> Narges
>>>>>
>>>>


Re: questions

2019-08-22 Thread narges saleh
I am not sure you can find real world examples where caches can be evenly
partitioned if the partitioning factor is an affinity key. I am comparing this
with the partitioning case in relational databases, say partitioning based on
month of the year. I definitely don't have 100s of departments, but I do
have 10s of departments, and the departments are very disproportionate in size.
As for the rebalancing case, the pods will be added to the system as the volume
increases, so I'd assume that would prompt ignite to rebalance.

On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1) No. Ignite only rebalances data when nodes are joining or leaving
> cluster.
> 2) Ignite's affinity is not really well suited to such detailed manual
> assignment. It is assumed that your cache has large number of partitions
> (e.g. 1024) and data is distributed evenly between all partitions. Having
> department as affinity key is suboptimal because there's not many
> departments and they usually vary in size. That's the kind of distribution
> that you want to avoid.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
Thu, 22 Aug 2019 at 18:37, narges saleh :
>
>> Thanks Ilya for replies.
>> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
>> available and the data doesn't fit the cache current ignite node? Consider
>> a scenario where I have 100 pods on a physical node, assuming pod = ignite
>> node.
>> 2)  I am not sure what you mean by confining half of cache to one cluster
>> and another half to another node. If my affinity key is department id, why
>> can't I have department A on a partitioned cache, one partition on one node
>> in cluster A, and the other partition on another node on another cluster.
>>
>> I might be misunderstanding the whole, and I'd appreciate clarification.
>>
>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> 1) When there is an overflow, either page eviction kicks in, or, if it
>>> is disabled, you get an IgniteOOM, after which the node is no longer
>>> usable. Please avoid overflowing any data regions since there's no graceful
>>> handling currently.
>>> 2) I don't think so. You can't easily confine half of cache's data to
>>> one cluster group and another half to other group.
>>>
>>> Such scenarios are not recommended. We expect that all partitions have
>>> same amount of data. Not that there are a few gargantuan partitions that
>>> don't fit in a single node.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>>
>>>> Hello All,
>>>>
>>>> I'd appreciate your answers to my questions.
>>>>
>>>> 1) Assuming I use affinity key among 4 caches, and they all end up on
>>>> the same ignite node. What happens where is an overflow? Does the overflow
>>>> data end up on a joined node? How do I keep the related data from all the
>>>> caches close to each other when the volume of exceeds a single node?
>>>>
>>>> 2) Is there a concept of cluster affinity, meaning having a cluster
>>>> group defined based on some affinity key? For example, if I have two
>>>> departments A and B, can I have a cluster group for department A and
>>>> another for department B?
>>>>
>>>> Thanks,
>>>> Narges
>>>>
>>>


Re: questions

2019-08-22 Thread Ilya Kasnacheev
Hello!

1) No. Ignite only rebalances data when nodes are joining or leaving the
cluster.
2) Ignite's affinity is not really well suited to such detailed manual
assignment. It is assumed that your cache has a large number of partitions
(e.g. 1024) and that data is distributed evenly between all partitions. Having
department as the affinity key is suboptimal because there are not many
departments and they usually vary in size. That's the kind of distribution
that you want to avoid.
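
(As a hedged illustration of that advice, not part of the original reply: a key class
that collocates related entries by a high-cardinality field such as a user id could look
roughly like the sketch below. The class and field names are invented for the example.)

import java.util.Objects;

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class OrderKey {
    /** Unique order id - the actual cache key. */
    private final long orderId;

    /** Entries sharing the same userId are mapped to the same partition
     *  (and therefore the same primary node) across caches keyed this way. */
    @AffinityKeyMapped
    private final long userId;

    public OrderKey(long orderId, long userId) {
        this.orderId = orderId;
        this.userId = userId;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof OrderKey))
            return false;
        OrderKey k = (OrderKey)o;
        return orderId == k.orderId && userId == k.userId;
    }

    @Override public int hashCode() {
        return Objects.hash(orderId, userId);
    }
}

With many distinct user ids, the 1024 (or more) partitions stay roughly evenly loaded,
which is the kind of distribution described above.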

Regards,
-- 
Ilya Kasnacheev


Thu, 22 Aug 2019 at 18:37, narges saleh :

> Thanks Ilya for replies.
> 1)  Doesn't ignite rebalance the nodes if there are additional nodes
> available and the data doesn't fit the cache current ignite node? Consider
> a scenario where I have 100 pods on a physical node, assuming pod = ignite
> node.
> 2)  I am not sure what you mean by confining half of cache to one cluster
> and another half to another node. If my affinity key is department id, why
> can't I have department A on a partitioned cache, one partition on one node
> in cluster A, and the other partition on another node on another cluster.
>
> I might be misunderstanding the whole, and I'd appreciate clarification.
>
> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> 1) When there is an overflow, either page eviction kicks in, or, if it is
>> disabled, you get an IgniteOOM, after which the node is no longer usable.
>> Please avoid overflowing any data regions since there's no graceful
>> handling currently.
>> 2) I don't think so. You can't easily confine half of cache's data to one
>> cluster group and another half to other group.
>>
>> Such scenarios are not recommended. We expect that all partitions have
>> same amount of data. Not that there are a few gargantuan partitions that
>> don't fit in a single node.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>>
>>> Hello All,
>>>
>>> I'd appreciate your answers to my questions.
>>>
>>> 1) Assuming I use affinity key among 4 caches, and they all end up on
>>> the same ignite node. What happens where is an overflow? Does the overflow
>>> data end up on a joined node? How do I keep the related data from all the
>>> caches close to each other when the volume of exceeds a single node?
>>>
>>> 2) Is there a concept of cluster affinity, meaning having a cluster
>>> group defined based on some affinity key? For example, if I have two
>>> departments A and B, can I have a cluster group for department A and
>>> another for department B?
>>>
>>> Thanks,
>>> Narges
>>>
>>


Re: questions

2019-08-22 Thread narges saleh
Thanks Ilya for the replies.
1)  Doesn't Ignite rebalance the nodes if there are additional nodes
available and the data doesn't fit the current Ignite node's cache? Consider
a scenario where I have 100 pods on a physical node, assuming pod = ignite
node.
2)  I am not sure what you mean by confining half of a cache to one cluster
and another half to another node. If my affinity key is department id, why
can't I have department A on a partitioned cache, with one partition on one node
in cluster A and the other partition on another node in another cluster?

I might be misunderstanding the whole thing, and I'd appreciate clarification.

On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1) When there is an overflow, either page eviction kicks in, or, if it is
> disabled, you get an IgniteOOM, after which the node is no longer usable.
> Please avoid overflowing any data regions since there's no graceful
> handling currently.
> 2) I don't think so. You can't easily confine half of cache's data to one
> cluster group and another half to other group.
>
> Such scenarios are not recommended. We expect that all partitions have
> same amount of data. Not that there are a few gargantuan partitions that
> don't fit in a single node.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 20 авг. 2019 г. в 06:29, narges saleh :
>
>> Hello All,
>>
>> I'd appreciate your answers to my questions.
>>
>> 1) Assuming I use affinity key among 4 caches, and they all end up on the
>> same ignite node. What happens where is an overflow? Does the overflow data
>> end up on a joined node? How do I keep the related data from all the caches
>> close to each other when the volume of exceeds a single node?
>>
>> 2) Is there a concept of cluster affinity, meaning having a cluster group
>> defined based on some affinity key? For example, if I have two departments
>> A and B, can I have a cluster group for department A and another for
>> department B?
>>
>> Thanks,
>> Narges
>>
>


questions

2019-08-19 Thread narges saleh
Hello All,

I'd appreciate your answers to my questions.

1) Assuming I use an affinity key among 4 caches and they all end up on the
same ignite node: what happens when there is an overflow? Does the overflow data
end up on a newly joined node? How do I keep the related data from all the caches
close to each other when the volume exceeds a single node?

2) Is there a concept of cluster affinity, meaning having a cluster group
defined based on some affinity key? For example, if I have two departments
A and B, can I have a cluster group for department A and another for
department B?

Thanks,
Narges


Re: Ignite Client Affinity Questions

2019-07-26 Thread Denis Magda
Folks, let me copy and paste a section from the new docs GridGain is
working on and will launch soon. That section compares thick vs thin
clients. Hopefully, it clarifies a lot.

Ignite/GridGain clients come in several different flavors, each with
various capabilities. Thick and thin clients go beyond SQL capabilities and
support many more APIs. Finally, ORM frameworks like Spring Data or
Hibernate are also integrated with GridGain and
can be used as an access point to your cluster.

Let's review the difference between thick and thin clients by comparing
their capabilities.

*Thick* clients (aka. standard clients) join the cluster via an internal
protocol, receive all of the cluster-wide
updates such as topology changes, are aware of data distribution, and can
direct a query/operation to a server node
that owns a required data set. Plus, thick clients support all of the
Ignite and GridGain APIs.

*Thin* clients (aka. lightweight clients) connect to the cluster via a
public TCP/IP protocol with a well-defined
message format. This type of client supports a limited set of APIs
(presently, key-value and SQL operations only) but
in return:

- Makes it easy to enable programming language support for GridGain and
  Ignite. Java, .NET, C++, Python, Node.JS, and
  PHP are supported out of the box.

- Doesn't have any dependencies on JVM. For instance, .NET and C++ _thick_
  clients have a richer feature set but
  start and use JVM internally.

- Requires at least one port opened on the cluster end. Note that more
  ports need to be opened if
  partition-awareness is used for a thin client.
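
(An illustrative sketch, not part of the quoted docs excerpt, of what the two connection
styles look like in Java; the address, port and cache name are placeholders.)

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientFlavors {
    public static void main(String[] args) {
        // Thick client: joins the cluster topology, is topology- and affinity-aware,
        // and exposes the full Ignite API.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite thick = Ignition.start(cfg)) {
            IgniteCache<Integer, String> cache = thick.getOrCreateCache("myCache");
            cache.put(1, "via thick client");
        }

        // Thin client: a lightweight TCP connection to a port opened on the cluster
        // (10800 by default), with a more limited API surface.
        ClientConfiguration clientCfg = new ClientConfiguration()
            .setAddresses("127.0.0.1:10800");

        try (IgniteClient thin = Ignition.startClient(clientCfg)) {
            ClientCache<Integer, String> cache = thin.getOrCreateCache("myCache");
            cache.put(2, "via thin client");
        }
    }
}

The thick client needs the full node configuration (discovery settings and so on), while
the thin client only needs a reachable host:port.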


--
Denis Magda


On Fri, Jul 26, 2019 at 6:21 AM Igor Belyakov 
wrote:

> Hi Nick,
>
> Yes, client node has up to date information regarding affinity and will
> use it to send request to the proper server node.
>
> If you're comparing client node to thin client then client node has more
> overhead since it's starting cluster node and processes discovery and
> communication events in the cluster, due to that it has information
> regarding affinity. On the other hand, the thin client sends all requests
> to the same server node and after that the request will be redirected to
> other nodes if it's necessary.
>
> Regards,
> Igor
>
>
>
> On Thu, Jul 25, 2019 at 6:37 PM milkywayz  wrote:
>
>> Hello, I plan on using an Ignite node topology as follows:
>> 1) Application configured as an Ignite Client that handles distributing
>> requests to the correct server node.
>> 2) 3+ Partitioned Server nodes that handle cache storage and processing of
>> requests.
>> 3) Each server node has two caches which use AffinityKey for pinning
>> entries
>> in the two caches to one node for a given key.
>>
>> Questions:
>> 1) Will the client have up to date access to the AffinityFunction for the
>> two caches?
>> 2) What sort of overhead is associated with running as client mode? I want
>> to make it as dumb as possible, so it would only just be aware of topology
>> changes and always have the latest AffinityFunction for the partitioned
>> server node topology.
>>
>> Thank you, Nick.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Ignite Client Affinity Questions

2019-07-25 Thread milkywayz
Hello, I plan on using an Ignite node topology as follows:
1) Application configured as an Ignite Client that handles distributing
requests to the correct server node.
2) 3+ Partitioned Server nodes that handle cache storage and processing of
requests. 
3) Each server node has two caches which use AffinityKey for pinning entries
in the two caches to one node for a given key.

Questions:
1) Will the client have up to date access to the AffinityFunction for the
two caches?
2) What sort of overhead is associated with running as client mode? I want
to make it as dumb as possible, so it would only just be aware of topology
changes and always have the latest AffinityFunction for the partitioned
server node topology.

Thank you, Nick.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Questions on IgniteDataStreamer

2019-06-11 Thread Ilya Kasnacheev
Hello!

Can you please at least share the exceptions you are getting?

Regards,
-- 
Ilya Kasnacheev


Sat, 1 Jun 2019 at 13:55, Om Thacker :

> Hello vbm,
>
> I am working on the exact same problem. Did you find the solution for the
> same.
> I am using following code in my client application which will listen to
> kafka connect (confluent).
>
> I have one to one mapping for kafka topic and ignite cache. When there is
> an
> insert into db, the kafka listener listens that and using gson library i am
> converting json to object and the stmr.addData() works fine. But while
> updating the value in db, i am facing marshller error.I tried to use
> cache.put() method ,but it gives me cachewriteexception .
>
>
> @KafkaListener(topics = { "kafka-Users" })
> public void listenUsers(String message) {
> logger.error(message);
> ObjectMapper mapper = new ObjectMapper();
> JsonNode rootNode;
> try {
> rootNode = mapper.readTree(message);
> Users user = new Users();
> IgniteDataStreamer stmr =
> ignite.dataStreamer(IgniteProperties.USERS_CACHE.getName());
> //  stmr.allowOverwrite(true);
>
> /*
>  * stmr.receiver(new StreamTransformer Users>() {
>  *
>  * @Override public Object
> process(MutableEntry entry,
> Object...
>  * arguments) throws EntryProcessorException {
> return null; }
>  *
>  * });
>  */
>
> /*
>  * stmr.receiver(StreamTransformer.from((e, arg)
> -> { Users val =
> e.getValue();
>  * System.out.println(val+" user from reciever
> $"); return null;
> }));
>  */
>
> Gson gson = new
>
> GsonBuilder().setFieldNamingPolicy(FieldNamingPolicy.UPPER_CAMEL_CASE).create();
> user =
> gson.fromJson(rootNode.get("payload").toString(), Users.class);
>
> stmr.addData(rootNode.get("payload").get("UsersKey").asLong(), user);
> stmr.flush(); //
> //  stmr.allowOverwrite(true);
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
>
>
>
>
> can you please share your solution for the same.
> Thanks,
> Om Thacker
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Questions on IgniteDataStreamer

2019-06-01 Thread Om Thacker
Hello vbm,

I am working on the exact same problem. Did you find a solution for it?
I am using the following code in my client application, which listens to
Kafka Connect (Confluent).

I have a one-to-one mapping between Kafka topic and Ignite cache. When there is an
insert into the DB, the Kafka listener picks it up; using the Gson library I convert
the JSON to an object, and the stmr.addData() works fine. But when a value is
updated in the DB, I get a marshaller error. I tried to use the
cache.put() method, but it gives me a cache write exception.


@KafkaListener(topics = { "kafka-Users" })
public void listenUsers(String message) {
    logger.error(message);

    ObjectMapper mapper = new ObjectMapper();

    try {
        JsonNode rootNode = mapper.readTree(message);

        // The streamer is keyed by the UsersKey field of the incoming record.
        IgniteDataStreamer<Long, Users> stmr =
            ignite.dataStreamer(IgniteProperties.USERS_CACHE.getName());
        // By default the streamer does not overwrite existing keys;
        // allowOverwrite(true) is required for updates to take effect.
        // stmr.allowOverwrite(true);

        /*
         * stmr.receiver(new StreamTransformer<Long, Users>() {
         *     @Override public Object process(MutableEntry<Long, Users> entry,
         *         Object... arguments) throws EntryProcessorException { return null; }
         * });
         */

        /*
         * stmr.receiver(StreamTransformer.from((e, arg) -> {
         *     Users val = e.getValue();
         *     System.out.println(val + " user from receiver $");
         *     return null;
         * }));
         */

        Gson gson = new GsonBuilder()
            .setFieldNamingPolicy(FieldNamingPolicy.UPPER_CAMEL_CASE)
            .create();
        Users user = gson.fromJson(rootNode.get("payload").toString(), Users.class);

        stmr.addData(rootNode.get("payload").get("UsersKey").asLong(), user);
        stmr.flush();
    } catch (Exception e) {
        e.printStackTrace();
    }
}




Can you please share your solution?
Thanks,
Om Thacker



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: waiting for partition map exchange questions

2019-04-25 Thread Scott Feldstein
Hi Andrei, have you had a chance to check out these logs?

Thanks

On Sun, Apr 14, 2019 at 17:18 scottmf  wrote:

> ignite.tgz
> 
> (attached file)
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: waiting for partition map exchange questions

2019-04-14 Thread scottmf
ignite.tgz
  
(attached file)



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: waiting for partition map exchange questions

2019-04-14 Thread scottmf
thanks Andrei,

I've attached the files.  The outage occurred at approximately
2019-04-08T19:43Z

the host ending in 958sw is the host that went down at the start of the
outage.  Host ending in dldh2 came up after 958sw went down.

hwgpf and zq8j8 were up the entire time.

These are the server nodes.  Let me know if you need the client node logs or
want any metric data.

Scott



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: waiting for partition map exchange questions

2019-04-12 Thread aealexsandrov
Hi,

Yes without logs it's not easy to understand what is the reason. Could you
please attach them?

BR,
Andrei



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


waiting for partition map exchange questions

2019-04-09 Thread scottmf
hi All,
I just encountered a situation in my k8s cluster where I'm running a 3 node
ignite setup, with 2 client nodes.  The server nodes have 8GB of off-heap
per node, 8GB JVM (with g1gc) and 4GB of OS memory without persistence.  I'm
using Ignite 2.7.

One of the ignite nodes got killed due to some issue in the cluster.  I
believe this was the sequence of events:

-> Data Eviction spikes on two nodes in the cluster (NODE A & B), then 15
mins later..
-> NODE C goes down
-> NODE D comes up (to replace node C)
--> NODE D attempts a PME
--> NODE B log = "Local node has detected failed nodes and started
cluster-wide procedure"
--> During PME the Ignite JVM on NODE D is restarted since it was taking too
long and was killed by a k8s liveness probe.
--> NODE D comes back up and attempts another PME
---> Note: i see these messages from all the nodes "First 10 pending
exchange futures [total=2]"  The total keeps ascending.  The highest number
I see is total=14.
---> NODE D log = "Failed to wait for initial partition map exchange.
Possible reasons are:..."
---> NODE B log = "Possible starvation in striped pool. queue=[], dealock =
false, Completed: 991189487 ..."
---> NODE A log = "Client node considered as unreachable and will be dropped
from cluster, because no metrics update messages received in interval:
TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
network problems or long GC pause on client node, try to increase this
parameter. [nodeId=c5a92006-c29a-4a37-b149-7ec7855dc401,
clientFailureDetectionTimeout=3]"

NOTE that NODE D kept restarting due to a k8s liveness probe.  I think I'm
going to remove the probe or make it much more relaxed.

During this time the ignite cluster is completely frozen.  Restarting NODE D
and replacing it with NODE E did not solve the issue.  The only way I could
solve the problem is to restart NODE B.  Any idea why this could have
occurred or what I can do to prevent it in the future?

I do see this from the failureHandler: "FailureContext [type=CRITICAL_ERROR,
err=class org.apache.ignite.IgniteException: Failed to create string
representation of binary object.]" but not sure if this is something that
would have caused the cluster to seize up.

Overall, nodes go down in this environment and come back all the time without
issues, but I've seen this problem occur twice in the last few months.

I have logs & thread dumps for all the nodes in the system so if you want me
to check anything in particular let me know.

thanks,
Scott



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Memory related questions

2019-02-03 Thread Karun Chand
Hi Tomislav,

1) When you say you want to increase the heap memory of the Ignite node, do you
mean off-heap or on-heap?

To increase the off-heap memory, please make sure to set these parameters
explicitly in your ignite server xml configuration file under the
DataRegionConfiguration that you are currently using -



To increase on-heap memory (basically the JVM heap size), you provide the
-Xms and -Xmx java options as you showed. It is completely okay for the
maxSize parameter above to have a much larger value than the -Xmx parameter.

When using Docker, there is one thing you need to be careful about - the
JVM may not necessarily limit itself to the amount of memory allocated to
the Docker container if you don't change the default parameters.
There are 2 options to deal with this -
a) In the docker run command, set the --memory parameter to the amount of
memory you want to give the container AND make sure -Xmx parameter is set
for the JVM to comply with the value you set for the --memory parameter of
docker run.
b) The above leaves you changing the memory value in two places
every time you want to change it. So JDK 9 (also backported to JDK 8)
introduced a new way for the JVM to discover the value that has been set
for the Docker container.

-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=1

When you use the above options for the JVM, the JVM will comply with
whatever value has been set for the container in the --memory parameter.

2) When you execute a distributed query, the retrieved data is on-heap.
Change your -Xmx JVM parameter to fix this.

3) Depends on your usecase. Use the information provided above to set it as
per your needs.

Regards,
RH
https://www.apacheignitetutorial.com/




On Fri, Feb 1, 2019 at 11:33 AM newigniter 
wrote:

> Greetings all.
> I have a few questions regarding memory.
>
> 1.) I am running ignite inside docker container. How can I increase heap
> memory to ignite node when started?
> I tried with passing -Xmx (I pass it to docker run command like this: -e
> "JAVA_OPTS=-Xmx3g")
> parameter but when node starts it always says that heap memory is 1gb?
>
> 2.) When I execute some query(e.g. select * from table), where does ignite
> stores the data retrieved? off heap or on heap memory? Each time I perform
> some query which should load some data to memory(not even that much, 2
> rows but with about 30 columns) ignite node fails due to insufficient
> memory. I presume heap memory is the problem but I don't understand how I
> can change it.
>
> 3.) How much heap memory should I assign to my node? I have 32 gb machine.
> Currently I assigned 16gb data region to ignite node. Heap memory is 1gb,
> default one.
>
> Appreciate the help.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Memory related questions

2019-02-01 Thread newigniter
Greetings all.
I have a few questions regarding memory.

1.) I am running Ignite inside a Docker container. How can I increase the heap
memory of the Ignite node when it is started?
I tried passing -Xmx (I pass it to the docker run command like this: -e
"JAVA_OPTS=-Xmx3g"),
but when the node starts it always says that heap memory is 1gb.

2.) When I execute some query (e.g. select * from table), where does Ignite
store the retrieved data - in off-heap or on-heap memory? Each time I perform
a query which should load some data into memory (not even that much, 2
rows but with about 30 columns), the Ignite node fails due to insufficient
memory. I presume heap memory is the problem, but I don't understand how I
can change it.

3.) How much heap memory should I assign to my node? I have a 32 gb machine.
Currently I have assigned a 16gb data region to the Ignite node. Heap memory is
1gb, the default.

Appreciate the help.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Some questions about store and distributed processing

2019-01-28 Thread Ilya Kasnacheev
Hello!

You can also use Continuous Queries for that. They will execute where the
data is modified, when it is modified.
https://apacheignite.readme.io/docs/continuous-queries
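
(A minimal sketch of that approach, not from the original reply; the cache name, types and
the applyDslRules hook are placeholders. The remote filter runs on the node that owns the
modified key, which is where the DSL checks could be applied.)

import javax.cache.Cache;
import javax.cache.configuration.Factory;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class DslRuleContinuousQuery {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, String> cache = ignite.getOrCreateCache("jsonObjects");

            ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

            // Remote filter: evaluated on the node that owns the updated key,
            // i.e. "where the data is modified".
            Factory<CacheEntryEventFilter<Long, String>> filterFactory = () -> evt -> {
                applyDslRules(evt.getKey(), evt.getValue());
                return true; // also forward the event to the local listener
            };
            qry.setRemoteFilterFactory(filterFactory);

            // Local listener: invoked on the node that registered the query.
            qry.setLocalListener(evts -> {
                for (CacheEntryEvent<? extends Long, ? extends String> e : evts)
                    System.out.println("Checked key " + e.getKey());
            });

            // The query stays active for as long as the cursor is open.
            try (QueryCursor<Cache.Entry<Long, String>> cur = cache.query(qry)) {
                cache.put(1L, "{\"type\":\"order\",\"amount\":42}");
                Thread.sleep(1000); // give the asynchronous notification time to arrive
            }
        }
    }

    // Placeholder for the DSL checks described in this thread.
    private static void applyDslRules(Long key, String json) {
        System.out.println("Applying rules to key " + key + ": " + json);
    }
}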

Regards,
-- 
Ilya Kasnacheev


Sat, 26 Jan 2019 at 13:41, yann Blazart :

> Hello all !
>
> I will have to use Ignite because I think it's the best solution to all my
> concerns, but I have a few question.
>
> I have to process very big json files (200GB), with lot of objects of
> different type generated from it.
> These objects, I will have to do multiple controls on it (with a dsl), and
> check  unicity, and in the end do some complexe join request between them.
>
> So for complexe request, ok, I store all in partitionned tables, and do
> request, easy.
>
> But for all the dsl rule ton apply on each object, It could be very nice
> if it can be applied when it's stored on the node, instead of doing it when
> I read the file, I mean :
>
> cache.putAll(mymap);
>
> then something on node to say :  new Entry listener -> execute dsl rules.
>
> I think I can gain lot of processing time like that. But is it possible ?
>
> I checked the doc, but I only see ways to 1st store all then run dsl rule
> on all node.
>
> Thanks in advance ;)
>
> Regards
>


Re: Some questions about store and distributed processing

2019-01-26 Thread Karun Chand
Hi Yann,

Event listeners in Ignite can be helpful to you -
https://apacheignite.readme.io/docs/events
You can listen for specific types of events (like cache put, cache read) on
a local node or remote events on a group of cluster nodes and then perform
whatever actions you want.
All the different event types that exist for Ignite version 2.7 can be seen
here -
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/EventType.html
EVT_CACHE_OBJECT_PUT is probably what you are looking for.
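
(A minimal local-listener sketch along those lines, not from the original reply; the cache
name and types are placeholders. Note that cache events are disabled by default, so the
event types of interest have to be enabled in the node configuration.)

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.CacheEvent;
import org.apache.ignite.events.EventType;

public class CachePutListener {
    public static void main(String[] args) {
        // Enable the cache-put event; event types not listed here are never recorded.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIncludeEventTypes(EventType.EVT_CACHE_OBJECT_PUT);

        try (Ignite ignite = Ignition.start(cfg)) {
            IgniteCache<Long, String> cache = ignite.getOrCreateCache("jsonObjects");

            // Local listener: fires on this node for puts that happen on this node.
            ignite.events().localListen(evt -> {
                CacheEvent ce = (CacheEvent)evt;
                System.out.println("Put into " + ce.cacheName() + ", key=" + ce.key());
                return true; // keep listening
            }, EventType.EVT_CACHE_OBJECT_PUT);

            cache.put(1L, "{\"type\":\"order\"}");
        }
    }
}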

Regards,
RH
https://www.apacheignitetutorial.com/


On Sat, Jan 26, 2019 at 2:41 AM yann Blazart  wrote:

> Hello all !
>
> I will have to use Ignite because I think it's the best solution to all my
> concerns, but I have a few question.
>
> I have to process very big json files (200GB), with lot of objects of
> different type generated from it.
> These objects, I will have to do multiple controls on it (with a dsl), and
> check  unicity, and in the end do some complexe join request between them.
>
> So for complexe request, ok, I store all in partitionned tables, and do
> request, easy.
>
> But for all the dsl rule ton apply on each object, It could be very nice
> if it can be applied when it's stored on the node, instead of doing it when
> I read the file, I mean :
>
> cache.putAll(mymap);
>
> then something on node to say :  new Entry listener -> execute dsl rules.
>
> I think I can gain lot of processing time like that. But is it possible ?
>
> I checked the doc, but I only see ways to 1st store all then run dsl rule
> on all node.
>
> Thanks in advance ;)
>
> Regards
>


Some questions about store and distributed processing

2019-01-26 Thread yann Blazart
Hello all !

I will have to use Ignite because I think it's the best solution to all my
concerns, but I have a few questions.

I have to process very big json files (200GB), with lots of objects of
different types generated from them.
On these objects I will have to run multiple checks (with a DSL), check
uniqueness, and in the end do some complex join requests between them.

So for the complex requests, ok, I store everything in partitioned tables and
run the requests - easy.

But for all the DSL rules to apply to each object, it would be very nice if
they could be applied when the object is stored on the node, instead of doing it
when I read the file. I mean:

cache.putAll(mymap);

then something on the node to say: new entry listener -> execute DSL rules.

I think I can gain a lot of processing time like that. But is it possible?

I checked the doc, but I only see ways to first store everything and then run
the DSL rules on all nodes.

Thanks in advance ;)

Regards


part 3 - dead in the water now - notes and questions on configuration and installation of web console for ignite

2019-01-18 Thread Scott Cote
Part 3 of notes and questions about web console for ignite

Restarted attempt inside BDD 
(https://apacheignite-tools.readme.io/docs/build-and-deploy) of the section 
titled:

Run Ignite Web Console In Development Mode

https://apacheignite-tools.readme.io/docs/build-and-deploy#section-run-ignite-web-console-in-development-mode

=
Items
=

  1.  I downgraded mongodb to 3.4.18  (removed the mongod service corresponding 
to 4.0 and added 3.4.x to the path instead.  Restarted)
  2.  Reran npm install with new mongodb in place.
  3.  Attempted to start npm in the backend.
 *   Observed mongoose connect to the mongod server (console response on 
the mongod terminal)
 *   Saw the "start of the npm" bomb out. I provide in this email both the 
console output (from npm) and the log file listed in that output.

So how do I proceed?

About my environment:

Java
C:\cygwin64\home\scote\ignite\modules\web-console\backend>java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)

Ignite r
src of 2.7

Mongo
3.4.18

Npm
Node-v11.6.0-x64.msi

Thanks in advance.

SCott




=
Console output
=
C:\cygwin64\home\scote\ignite\modules\web-console\backend>npm start

> ignite-web-console@2.7.0 start 
> C:\cygwin64\home\scote\ignite\modules\web-console\backend
> node ./index.js

fireUp# INFO  Registered module at 'app\agentSocket.js' as implementation for 
interface 'agent-socket'.
fireUp# INFO  Registered module at 'app\agentsHandler.js' as implementation for 
interface 'agents-handler'.
fireUp# INFO  Registered module at 'app\apiServer.js' as implementation for 
interface 'api-server'.
fireUp# INFO  Registered module at 'app\browsersHandler.js' as implementation 
for interface 'browsers-handler'.
fireUp# INFO  Registered module at 'app\configure.js' as implementation for 
interface 'configure'.
fireUp# INFO  Registered module at 'app\mongo.js' as implementation for 
interface 'mongo'.
fireUp# INFO  Registered module at 'app\mongoose.js' as implementation for 
interface 'mongoose'.
fireUp# INFO  Registered module at 'app\nconf.js' as implementation for 
interface 'nconf'.
fireUp# INFO  Registered module at 'app\routes.js' as implementation for 
interface 'routes'.
fireUp# INFO  Registered module at 'app\schemas.js' as implementation for 
interface 'schemas'.
fireUp# INFO  Registered module at 'app\settings.js' as implementation for 
interface 'settings'.
fireUp# INFO  Registered module at 'errors\index.js' as implementation for 
interface 'errors'.
fireUp# INFO  Registered module at 'middlewares\api.js' as implementation for 
interface 'middlewares:api'.
fireUp# INFO  Registered module at 'middlewares\demo.js' as implementation for 
interface 'middlewares:demo'.
fireUp# INFO  Registered module at 'middlewares\host.js' as implementation for 
interface 'middlewares:host'.
fireUp# INFO  Registered module at 'middlewares\user.js' as implementation for 
interface 'middlewares:user'.
fireUp# INFO  Registered module at 'routes\activities.js' as implementation for 
interface 'routes/activities'.
fireUp# INFO  Registered module at 'routes\admin.js' as implementation for 
interface 'routes/admin'.
fireUp# INFO  Registered module at 'routes\caches.js' as implementation for 
interface 'routes/caches'.
fireUp# INFO  Registered module at 'routes\clusters.js' as implementation for 
interface 'routes/clusters'.
fireUp# INFO  Registered module at 'routes\configuration.js' as implementation 
for interface 'routes/configurations'.
fireUp# INFO  Registered module at 'routes\demo.js' as implementation for 
interface 'routes/demo'.
fireUp# INFO  Registered module at 'routes\domains.js' as implementation for 
interface 'routes/domains'.
fireUp# INFO  Registered module at 'routes\downloads.js' as implementation for 
interface 'routes/downloads'.
fireUp# INFO  Registered module at 'routes\igfss.js' as implementation for 
interface 'routes/igfss'.
fireUp# INFO  Registered module at 'routes\notebooks.js' as implementation for 
interface 'routes/notebooks'.
fireUp# INFO  Registered module at 'routes\profile.js' as implementation for 
interface 'routes/profiles'.
fireUp# INFO  Registered module at 'routes\public.js' as implementation for 
interface 'routes/public'.
fireUp# INFO  Registered module at 'services\Utils.js' as implementation for 
interface 'services/utils'.
fireUp# INFO  Registered module at 'services\activities.js' as implementation 
for interface 'services/activities'.
fireUp# INFO  Registered module at 'services\auth.js' as implementation for 
interface 'services/auth'.
fireUp# INFO  Registered module at 'services\caches.js' as implementation for 
interface 'services/caches'.
fireUp# INFO  Registered module at 'services\clusters.js' as implementation for 
interface 'services/clusters'.
fireUp# INFO  Registered module at 'services\configurations.js' as 
implementation for interfac

part 2 - notes and questions on configuration and installation of web console for ignite

2019-01-18 Thread Scott Cote
Part 2 of notes and questions about web console for ignite
=
Items
=

Item 1.  Found more npm issues while performing "Run Ignite Web Console In 
Development Mode" of BDD referenced in Part 1

Item 2. It seems that the latest version of community mongodb is NOT 
supported by Ignite. Maybe a mongoose problem (see the item from part 1 about 
upgrading mongoose)?

In the meantime, I'll downgrade mongodb to 3.4.x 

=
Details
=

Detail 1 -> 1

c:\cygwin64\home\scote\ignite\modules\web-console\backend>npm install 
--no-optional
added 31 packages from 292 contributors and audited 6667 packages in 4.948s
found 10 vulnerabilities (6 low, 3 high, 1 critical)
  run `npm audit fix` to fix them, or `npm audit` for details

c:\cygwin64\home\scote\ignite\modules\web-console\backend>

c:\cygwin64\home\scote\ignite\modules\web-console\backend>npm audit fix
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.7 
(node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for 
fsevents@1.2.7: wanted {"os":"darwin","arch":"any"} (current: 
{"os":"win32","arch":"x64"})

up to date in 4.475s
fixed 0 of 10 vulnerabilities in 6667 scanned packages
  3 vulnerabilities required manual review and could not be updated
  3 package updates for 7 vulns involved breaking changes
  (use `npm audit fix --force` to install breaking changes; or refer to `npm 
audit` for steps to fix these manually)

c:\cygwin64\home\scote\ignite\modules\web-console\backend>

Detail 2 -> 2

c:\cygwin64\home\scote\ignite\modules\web-console\backend>npm start

> ignite-web-console@2.7.0 start 
> c:\cygwin64\home\scote\ignite\modules\web-console\backend
> node ./index.js

fireUp# INFO  Registered module at 'app\agentSocket.js' as implementation for 
interface 'agent-socket'.
fireUp# INFO  Registered module at 'app\agentsHandler.js' as implementation for 
interface 'agents-handler'.
fireUp# INFO  Registered module at 'app\apiServer.js' as implementation for 
interface 'api-server'.
fireUp# INFO  Registered module at 'app\browsersHandler.js' as implementation 
for interface 'browsers-handler'.
fireUp# INFO  Registered module at 'app\configure.js' as implementation for 
interface 'configure'.
fireUp# INFO  Registered module at 'app\mongo.js' as implementation for 
interface 'mongo'.
fireUp# INFO  Registered module at 'app\mongoose.js' as implementation for 
interface 'mongoose'.
fireUp# INFO  Registered module at 'app\nconf.js' as implementation for 
interface 'nconf'.
fireUp# INFO  Registered module at 'app\routes.js' as implementation for 
interface 'routes'.
fireUp# INFO  Registered module at 'app\schemas.js' as implementation for 
interface 'schemas'.
fireUp# INFO  Registered module at 'app\settings.js' as implementation for 
interface 'settings'.
fireUp# INFO  Registered module at 'errors\index.js' as implementation for 
interface 'errors'.
fireUp# INFO  Registered module at 'middlewares\api.js' as implementation for 
interface 'middlewares:api'.
fireUp# INFO  Registered module at 'middlewares\demo.js' as implementation for 
interface 'middlewares:demo'.
fireUp# INFO  Registered module at 'middlewares\host.js' as implementation for 
interface 'middlewares:host'.
fireUp# INFO  Registered module at 'middlewares\user.js' as implementation for 
interface 'middlewares:user'.
fireUp# INFO  Registered module at 'routes\activities.js' as implementation for 
interface 'routes/activities'.
fireUp# INFO  Registered module at 'routes\admin.js' as implementation for 
interface 'routes/admin'.
fireUp# INFO  Registered module at 'routes\caches.js' as implementation for 
interface 'routes/caches'.
fireUp# INFO  Registered module at 'routes\clusters.js' as implementation for 
interface 'routes/clusters'.
fireUp# INFO  Registered module at 'routes\configuration.js' as implementation 
for interface 'routes/configurations'.
fireUp# INFO  Registered module at 'routes\demo.js' as implementation for 
interface 'routes/demo'.
fireUp# INFO  Registered module at 'routes\domains.js' as implementation for 
interface 'routes/domains'.
fireUp# INFO  Registered module at 'routes\downloads.js' as implementation for 
interface 'routes/downloads'.
fireUp# INFO  Registered module at 'routes\igfss.js' as implementation for 
interface 'routes/igfss'.
fireUp# INFO  Registered module at 'routes\notebooks.js' as implementation for 
interface 'routes/notebooks'.
fireUp# INFO  Registered module at 'routes\profile.js' as implementation for 
interface 'routes/profiles'.
fireUp# INFO  Registered module at 'routes\public.js' as implementation for 
interface 'routes/public'.
fireUp# INFO  Registered module at 'services\Utils.js' as implementation for 
interface 'services/utils'.
fireUp# INFO  Registered module at 'services\activities.js' as implementation 
for interface 'services/activities'.
fireUp# INFO 

notes and questions on configuration and installation of web console for ignite

2019-01-18 Thread Scott Cote
I am going through the manual installation and implementation of the Ignite 
Web Console.
This is Part 1 of a series of notes that I’m making….

Throughout this set of items (questions and notes), I’m referencing the “Build 
and Deploy“ document (BDD) 
https://apacheignite-tools.readme.io/docs/build-and-deploy
=
Items
=

Item 1:
In the prerequisites section of BDD, we are instructed to run npm from 
$IGNITE_HOME. Is this the Ignite home of the exploded source tree, or the 
Ignite home of the unzipped/extracted binary (released) instance (for 
example, I downloaded a binary and unzipped/exploded the tar.gz)?

Currently, I’m running npm from the exploded source tree and NOT my exploded 
binary – which is what my env variable $IGNITE_HOME points to.


Item 2:
The machine that I need to deploy the web console onto is sitting behind a 
very grandiose firewall/AV setup. Using Git/Maven/npm to pull in dependencies 
for a build on that machine is not supportable. I am able to build somewhere 
else and want to package the outcome and deploy it to the super-secure 
machine. Maybe create a docker container. Is there a docker container with the 
web console already configured? If not, and if I’m allowed, how do I 
contribute a docker container of this setup? I think I can sell to my 
management that more eyeballs on a crafted docker container – generic, without 
any of our proprietary work – would be good overall. We would all benefit.

Item 3:
While running the npm installer for the backend (prerequisites of BDD), I 
noticed desupport notices from:

  *   Mockgoose
  *   Simple-bufferstream
  *   Babel
  *   Minimatch
  *   Circular-json
  *   Cryptiles
  *   Boom
  *   Hoek
I will include the npm output below as Detail 1 -> 3  (notation: 1 refers to 
the first detail – 1, and 3 refers to this item of concern)

Item 4:
Npm audit revealed a couple of critical warnings (among others). So that I 
can address my security team accurately (considering this IS an open source 
project): are the sources of the warnings (listed in Detail 2 -> 4) on an 
immediate roadmap to be corrected in the next release of Ignite?

Can I fix this in my install by running “npm audit fix”? I’m not a nodejs guy, 
so I don’t know if the “fix” could be backported to the source and then given 
back to the Ignite community. I will run npm audit fix; I just don’t know if I 
can give the outcome back.

Item 5:
Ran the audit fix for backend of BDD. See 3 -> 5 for the outcome on the screen.

Item 6:
While running the npm installer for the frontend (prerequisites of BDD), I 
noticed desupport and problem notices from:

  *   samsam
  *   text-encoding
  *   circular-json
  *   browserslist
  *   node-uuid
  *   hoek
  *   cryptiles
  *   boom
  *   socks
  *   mailcomposer
  *   buildmail
  *   uws

I will include the npm output below as Detail 4 -> 6

Item 7:
Again - npm audit revealed a couple of critical warnings (among others). So 
that I can address my security team accurately (considering this IS an open 
source project): are the sources of the warnings (listed in Detail 5 -> 7) on 
an immediate roadmap to be corrected in the next release of Ignite?

Can I fix this in my install by running “npm audit fix”? I’m not a nodejs guy, 
so I don’t know if the “fix” could be backported to the source and then given 
back to the Ignite community. I will run npm audit fix; I just don’t know if I 
can give the outcome back.


Item 8:
Ran the audit fix for backend of BDD. See 6 -> 8 for the outcome on the screen.


===
Details
===
Detail 1-> 3

c:\cygwin64\home\scote\ignite\modules\web-console\backend>npm install 
--no-optional
npm WARN deprecated mockgoose@6.0.8: Mockgoose is no longer actively 
maintained, consider using mongodb-memory-server
npm WARN deprecated scmp@1.0.2: scmp v2 uses improved core crypto comparison 
since Node v6.6.0
npm WARN deprecated simple-bufferstream@1.0.0: no longer maintained
npm WARN deprecated babel-preset-latest@6.24.1: We're super   excited that 
you're trying to use ES2017+ syntax, but instead of making more yearly presets 
 , Babel now has a better preset that we recommend you use instead: npm 
install babel-preset-env --save-dev. preset-env without options will compile 
ES2015+ down to ES5 just like using all the presets together and thus is more 
future proof. It also allows you to target specific browsers so that Babel can 
do less work and you can ship native ES2015+ to user  ! We are also in the 
process of releasing v7, so please give 
http://babeljs.io/blog/2017/09/12/planning-for-7.0 a read and help test it out 
in beta! Thanks so much for using Babel , please give us a follow on Twitter 
@babeljs for news on Babel, join slack.babeljs.io for discussion/development 
and help support the project at opencollective.com/babel
npm WARN deprecated babel-preset-es2017@6.24.1:   Thanks for using Babel: we 
recommend using babel-preset-env now: please read babeljs.io/env to update!
npm WARN

Re: ignite questions

2019-01-04 Thread Clay Teahouse
Thanks for everyone's feedback regarding the capacity planning question. My
main objective here is to size my servers accordingly and keep the related
data on the same server, as much as possible. It seems a custom affinity
function that takes into account the server classes (defined based on
capacity) is the potential solution as suggested. Naveen -- I will keep in
mind your suggestion as well, regarding partitioning the huge data sets.
thanks.

Any feedback regarding my other questions:
Data pin to cache: How do I make sure certain data never gets evicted (with
native persistence enabled)? For example, I want my dimension data to
always stay in cache.
How do I implement service pipelining in Apache Ignite? Would continuous
query be the mechanism? Any examples? (See the sketch below.)
Streaming: Are there examples on how to define watermarks, i.e., input
completeness with regard to the event timestamp?
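
On the pipelining question, a rough, untested sketch of one pattern that is
sometimes used (it is not a dedicated Ignite feature): each stage writes its
output to a cache and the next stage subscribes to that cache with a
continuous query. The cache names and the enrich() helper are placeholders:

import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;

public class PipelineSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, String> stage1 = ignite.getOrCreateCache("stage1");
            IgniteCache<Long, String> stage2 = ignite.getOrCreateCache("stage2");

            ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

            // Called on this node for every insert/update observed in stage1.
            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Long, ? extends String> e : events)
                    stage2.put(e.getKey(), enrich(e.getValue()));
            });

            stage1.query(qry); // keep the returned cursor open for the pipeline's lifetime

            stage1.put(1L, "raw-record"); // flows into stage2 asynchronously
        }
    }

    private static String enrich(String raw) { return raw + "-processed"; }
}

Whether this fits depends on the ordering and delivery guarantees the pipeline
needs; check those against the continuous-query documentation before relying
on it.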


On Fri, Jan 4, 2019 at 3:33 AM Naveen  wrote:

> Regarding your question on capacity planning
>
> Not sure we have any work around to have data equally getting distributed
> to
> fulfill your requirement technically and do the sizing. But
> non-technically,
> you can change your design to include state as well as part of your
> affinity
> key along with the country and rest of the fields you have, so that data
> pertaining to India can be segregated into chunks of data which is state
> wise and if you see statewide data is definitely not so huge compared to
> country as a whole and same time relevant data is also getting stored on a
> single node. Your design also  should permit this to keep state name as
> part
> of your affinity key to resolve this use case.
>
>
> Thanks
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: ignite questions

2019-01-04 Thread Naveen
Regarding your question on capacity planning:

I am not sure there is any technical workaround to get the data distributed
equally for your requirement and to do the sizing. But non-technically, you can
change your design to include the state as part of your affinity key, along
with the country and the rest of the fields you have, so that the data
pertaining to India is segregated into state-wise chunks. State-wide data is
definitely not as huge as the country as a whole, and at the same time related
data is still stored on a single node. Your design should of course permit
keeping the state name as part of the affinity key to resolve this use case.
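
For illustration, a minimal sketch of such a composite key, where the state is
folded into the affinity field; the class and field names are made up for this
example, they are not part of any Ignite API:

import java.util.Objects;
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class RegionKey {
    private final String id;            // unique business id of the record

    @AffinityKeyMapped
    private final String countryState;  // e.g. "India:Karnataka" - drives collocation

    public RegionKey(String id, String country, String state) {
        this.id = id;
        this.countryState = country + ":" + state;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof RegionKey))
            return false;
        RegionKey k = (RegionKey)o;
        return Objects.equals(id, k.id) && Objects.equals(countryState, k.countryState);
    }

    @Override public int hashCode() {
        return Objects.hash(id, countryState);
    }
}

All entries whose keys carry the same country:state value then map to the same
partition, so the state-level chunks stay together without forcing the whole
country onto a single node.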


Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: ignite questions

2019-01-02 Thread Denis Magda
Yes, a custom affinity function is what you need to control entry
distribution across physical machines. It's feasible to do. I worked with one
of the Ignite customers who did something similar for their needs - that code
is not open sourced.
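
For illustration, a minimal, untested sketch of what such a function could
look like by extending RendezvousAffinityFunction; the class name, the
reserved-partition trick and the NODE_CLASS node attribute are assumptions
made for this example, not that customer's code:

import java.util.ArrayList;
import java.util.List;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cluster.ClusterNode;

public class CountryAwareAffinity extends RendezvousAffinityFunction {
    private static final int RESERVED_PART = 0; // partition reserved for the oversized key

    public CountryAwareAffinity() {
        super(false, 128); // 128 partitions, neighbor exclusion off
    }

    // Assumes the affinity key value passed in is the country name itself.
    @Override public int partition(Object key) {
        if ("India".equals(key))
            return RESERVED_PART;

        int part = super.partition(key);
        return part == RESERVED_PART ? (part + 1) % partitions() : part; // keep other keys out
    }

    @Override public List<List<ClusterNode>> assignPartitions(AffinityFunctionContext ctx) {
        List<List<ClusterNode>> res = new ArrayList<>(super.assignPartitions(ctx));

        // Pin the reserved partition to the nodes started with the "large" attribute.
        List<ClusterNode> large = new ArrayList<>();

        for (ClusterNode n : ctx.currentTopologySnapshot()) {
            if ("large".equals(n.attribute("NODE_CLASS")))
                large.add(n);
        }

        if (!large.isEmpty())
            res.set(RESERVED_PART, large);

        return res;
    }
}

The function (and anything it references) has to be identical on every node,
and changing it for an existing cache effectively means reloading the data, so
it is worth settling on it early.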

--
Denis

On Wed, Jan 2, 2019 at 10:17 AM Mikael  wrote:

> Hi!
>
> By default you cannot assign a specific affinity key to a specific node
> but I think that could be done with a custom affinity function, you can do
> pretty much whatever you want with that, for example set an attribute in
> the XML file and use that to match with a specific affinity key value, so a
> node with attribute x will be assigned all affinity keys with value y.
>
> I never tried it but I do not see any reason why it would not work.
>
> Mikael
>
>
> Den 2019-01-02 kl. 17:13, skrev Clay Teahouse:
>
> Thanks Mikael.
>
> I did come across that link before, but I am not sure it addresses my
> concern. I want to see how I need I size my physical VMs based on affinity
> keys. How would I say for India affinity key use this super size VM and for
> others use the other smaller ones, so the data doesn't get shuffled around?
> Maybe, there is no way, and I just have to wait for ignite to rebalance the
> partitions and fit things where they should be based on the affinity key.
>
> On Wed, Jan 2, 2019 at 8:32 AM Mikael  wrote:
>
>> You can find some information about capacity planning here:
>>
>> https://apacheignite.readme.io/docs/capacity-planning
>>
>> About your India example you can use affinity keys to keep data together
>> in groups to avoid network traffic.
>>
>> https://apacheignite.readme.io/docs/affinity-collocation
>>
>> Mikael
>> Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:
>>
>> Thanks Naveen.
>>
>> -- Cache Groups: When would I start considering cache groups, if my
>> system is growing, and sooner or later I will have to add to my caches and
>> I need to know 1) should I starting grouping now (I'd think yes), 2) if no,
>> when, what number of caches?
>> -- Capacity Planning: So, there is no guidelines on how to size the nodes
>> and the physical storage nodes reside on? How do I make sure all the
>> related data fit the same VM? It can't be the case that I have to come up
>> with 100s of super size VMs just because I have one instance with a huge
>> set of entries. For example, if I have millions of entries for India and
>> only a few for other countries, how do I make sure all the India related
>> data fits the same VM (to avoid the network) and have the data for all the
>> small countries fit on the same VM?
>> -- Pinning the data to cache: the data pinned to on-heap cache does not
>> get evicted from the memory? I want to see if there is something similar to
>> Oracle's memory pinning.
>> -- Read through: How do I know if something on cache or disk (using
>> native persistence)?
>> 5) Service chaining: Is there an example of service chaining that you can
>> point me to?
>>
>> 6) How do I implement service pipelining in apache ignite? Would
>> continuous query be the mechanism? Any examples?
>>
>> 7) Streaming: Are there examples on how to define watermarks, i.e., input
>> completeness with regard to the event timestamp?
>>
>> thank you
>> Clay
>>
>> On Tue, Jan 1, 2019 at 11:29 PM Naveen  wrote:
>>
>>> Hello
>>> Couple of things I would like to with my experience
>>>
>>> 1. Cache Groups : Around 100 caches, I do not think we need to go for
>>> Cache
>>> groups, as you mentioned cache groups will have impact on you
>>> read/writes.
>>> However, changing the partition count to 128 from default 1024 would
>>> improve
>>> your cluster restart.
>>>
>>> 2. I doubt if Ignite has any settings we have for this.
>>>
>>> 3. The only I can think of is to keep the data in on-heap if the data
>>> size
>>> is not so huge.
>>>
>>> 4. Read through, with native persistence enabled, doing a read to the
>>> disk
>>> will load the cache. But the read is much slower compared with read from
>>> RAM, by default it does not pre-load the data. If you want to avoid this
>>> you
>>> can pre-load the data programatically and load Memory, good for even SQL
>>> SELECT as well. But with the 3rd party persistence, we need to pre-load
>>> the
>>> data to make your read work for SQL SELECT.
>>>
>>> Thanks
>>> Naveen
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>


Re: ignite questions

2019-01-02 Thread Mikael

Hi!

By default you cannot assign a specific affinity key to a specific node, but 
I think that could be done with a custom affinity function; you can do pretty 
much whatever you want with that. For example, set an attribute in the XML 
file and use it to match against a specific affinity key value, so that a node 
with attribute x is assigned all affinity keys with value y.


I never tried it but I do not see any reason why it would not work.
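
For what it's worth, a minimal sketch of the wiring in Java configuration (the
userAttributes map is the Java equivalent of the attribute set in the XML
file); CountryAwareAffinity refers to the affinity-function sketch shown
earlier in this thread, and NODE_CLASS is a made-up attribute name:

import java.util.Collections;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class AttributeWiringSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Tag this node; in Spring XML this is the userAttributes map on
        // IgniteConfiguration.
        cfg.setUserAttributes(Collections.singletonMap("NODE_CLASS", "large"));

        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("countries");

        // Custom AffinityFunction that matches the attribute against the affinity
        // key value; it must be on the classpath of every node in the cluster.
        cacheCfg.setAffinity(new CountryAwareAffinity());

        cfg.setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}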

Mikael


Den 2019-01-02 kl. 17:13, skrev Clay Teahouse:

Thanks Mikael.

I did come across that link before, but I am not sure it addresses my 
concern. I want to see how I need I size my physical VMs based on 
affinity keys. How would I say for India affinity key use this super 
size VM and for others use the other smaller ones, so the data doesn't 
get shuffled around? Maybe, there is no way, and I just have to wait 
for ignite to rebalance the partitions and fit things where they 
should be based on the affinity key.


On Wed, Jan 2, 2019 at 8:32 AM Mikael wrote:


You can find some information about capacity planning here:

https://apacheignite.readme.io/docs/capacity-planning

About your India example you can use affinity keys to keep data
together in groups to avoid network traffic.

https://apacheignite.readme.io/docs/affinity-collocation

Mikael

Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:

Thanks Naveen.

-- Cache Groups: When would I start considering cache groups, if
my system is growing, and sooner or later I will have to add to
my caches and I need to know 1) should I starting grouping now
(I'd think yes), 2) if no, when, what number of caches?
-- Capacity Planning: So, there is no guidelines on how to size
the nodes and the physical storage nodes reside on? How do I make
sure all the related data fit the same VM? It can't be the case
that I have to come up with 100s of super size VMs just because I
have one instance with a huge set of entries. For example, if I
have millions of entries for India and only a few for other
countries, how do I make sure all the India related data fits the
same VM (to avoid the network) and have the data for all the
small countries fit on the same VM?
-- Pinning the data to cache: the data pinned to on-heap cache
does not get evicted from the memory? I want to see if there is
something similar to Oracle's memory pinning.
-- Read through: How do I know if something on cache or disk
(using native persistence)?
5) Service chaining: Is there an example of service chaining that
you can point me to?

6) How do I implement service pipelining in apache ignite? Would
continuous query be the mechanism? Any examples?

7) Streaming: Are there examples on how to define watermarks,
i.e., input completeness with regard to the event timestamp?

thank you
Clay

On Tue, Jan 1, 2019 at 11:29 PM Naveen <naveen.band...@gmail.com> wrote:

Hello
Couple of things I would like to with my experience

1. Cache Groups : Around 100 caches, I do not think we need
to go for Cache
groups, as you mentioned cache groups will have impact on you
read/writes.
However, changing the partition count to 128 from default
1024 would improve
your cluster restart.

2. I doubt if Ignite has any settings we have for this.

3. The only I can think of is to keep the data in on-heap if
the data size
is not so huge.

4. Read through, with native persistence enabled, doing a
read to the disk
will load the cache. But the read is much slower compared
with read from
RAM, by default it does not pre-load the data. If you want to
avoid this you
can pre-load the data programatically and load Memory, good
for even SQL
SELECT as well. But with the 3rd party persistence, we need
to pre-load the
data to make your read work for SQL SELECT.

Thanks
Naveen



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/



Re: ignite questions

2019-01-02 Thread Clay Teahouse
Thanks Mikael.

I did come across that link before, but I am not sure it addresses my
concern. I want to see how I need to size my physical VMs based on affinity
keys. How would I say, for the India affinity key, use this super-size VM and
for the others use the smaller ones, so the data doesn't get shuffled around?
Maybe there is no way, and I just have to wait for Ignite to rebalance the
partitions and fit things where they should be based on the affinity key.

On Wed, Jan 2, 2019 at 8:32 AM Mikael  wrote:

> You can find some information about capacity planning here:
>
> https://apacheignite.readme.io/docs/capacity-planning
>
> About your India example you can use affinity keys to keep data together
> in groups to avoid network traffic.
>
> https://apacheignite.readme.io/docs/affinity-collocation
>
> Mikael
> Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:
>
> Thanks Naveen.
>
> -- Cache Groups: When would I start considering cache groups, if my system
> is growing, and sooner or later I will have to add to my caches and I need
> to know 1) should I starting grouping now (I'd think yes), 2) if no, when,
> what number of caches?
> -- Capacity Planning: So, there is no guidelines on how to size the nodes
> and the physical storage nodes reside on? How do I make sure all the
> related data fit the same VM? It can't be the case that I have to come up
> with 100s of super size VMs just because I have one instance with a huge
> set of entries. For example, if I have millions of entries for India and
> only a few for other countries, how do I make sure all the India related
> data fits the same VM (to avoid the network) and have the data for all the
> small countries fit on the same VM?
> -- Pinning the data to cache: the data pinned to on-heap cache does not
> get evicted from the memory? I want to see if there is something similar to
> Oracle's memory pinning.
> -- Read through: How do I know if something on cache or disk (using native
> persistence)?
> 5) Service chaining: Is there an example of service chaining that you can
> point me to?
>
> 6) How do I implement service pipelining in apache ignite? Would
> continuous query be the mechanism? Any examples?
>
> 7) Streaming: Are there examples on how to define watermarks, i.e., input
> completeness with regard to the event timestamp?
>
> thank you
> Clay
>
> On Tue, Jan 1, 2019 at 11:29 PM Naveen  wrote:
>
>> Hello
>> Couple of things I would like to with my experience
>>
>> 1. Cache Groups : Around 100 caches, I do not think we need to go for
>> Cache
>> groups, as you mentioned cache groups will have impact on you read/writes.
>> However, changing the partition count to 128 from default 1024 would
>> improve
>> your cluster restart.
>>
>> 2. I doubt if Ignite has any settings we have for this.
>>
>> 3. The only I can think of is to keep the data in on-heap if the data size
>> is not so huge.
>>
>> 4. Read through, with native persistence enabled, doing a read to the disk
>> will load the cache. But the read is much slower compared with read from
>> RAM, by default it does not pre-load the data. If you want to avoid this
>> you
>> can pre-load the data programatically and load Memory, good for even SQL
>> SELECT as well. But with the 3rd party persistence, we need to pre-load
>> the
>> data to make your read work for SQL SELECT.
>>
>> Thanks
>> Naveen
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: ignite questions

2019-01-02 Thread Mikael

You can find some information about capacity planning here:

https://apacheignite.readme.io/docs/capacity-planning

About your India example you can use affinity keys to keep data together 
in groups to avoid network traffic.


https://apacheignite.readme.io/docs/affinity-collocation

Mikael

Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:

Thanks Naveen.

-- Cache Groups: When would I start considering cache groups, if my 
system is growing, and sooner or later I will have to add to my caches 
and I need to know 1) should I starting grouping now (I'd think yes), 
2) if no, when, what number of caches?
-- Capacity Planning: So, there is no guidelines on how to size the 
nodes and the physical storage nodes reside on? How do I make sure all 
the related data fit the same VM? It can't be the case that I have to 
come up with 100s of super size VMs just because I have one instance 
with a huge set of entries. For example, if I have millions of entries 
for India and only a few for other countries, how do I make sure all 
the India related data fits the same VM (to avoid the network) and 
have the data for all the small countries fit on the same VM?
-- Pinning the data to cache: the data pinned to on-heap cache does 
not get evicted from the memory? I want to see if there is something 
similar to Oracle's memory pinning.
-- Read through: How do I know if something on cache or disk (using 
native persistence)?
5) Service chaining: Is there an example of service chaining that you 
can point me to?


6) How do I implement service pipelining in apache ignite? Would 
continuous query be the mechanism? Any examples?


7) Streaming: Are there examples on how to define watermarks, i.e., 
input completeness with regard to the event timestamp?


thank you
Clay

On Tue, Jan 1, 2019 at 11:29 PM Naveen wrote:


Hello
Couple of things I would like to with my experience

1. Cache Groups : Around 100 caches, I do not think we need to go
for Cache
groups, as you mentioned cache groups will have impact on you
read/writes.
However, changing the partition count to 128 from default 1024
would improve
your cluster restart.

2. I doubt if Ignite has any settings we have for this.

3. The only I can think of is to keep the data in on-heap if the
data size
is not so huge.

4. Read through, with native persistence enabled, doing a read to
the disk
will load the cache. But the read is much slower compared with
read from
RAM, by default it does not pre-load the data. If you want to
avoid this you
can pre-load the data programatically and load Memory, good for
even SQL
SELECT as well. But with the 3rd party persistence, we need to
pre-load the
data to make your read work for SQL SELECT.

Thanks
Naveen



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/



  1   2   3   >