[jira] [Created] (IGNITE-13298) Found long running cache at client end

2020-07-24 Thread Vipul Thakur (Jira)
Vipul Thakur created IGNITE-13298:
-

 Summary: Found long running cache at client end 
 Key: IGNITE-13298
 URL: https://issues.apache.org/jira/browse/IGNITE-13298
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
 Environment: cluster memory config/persistence 


=== Client thread dump ===

2020-07-20 12:14:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):

"Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x7fe7f4001000 nid=0x32d waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x7fe6e805e000 nid=0x31a waiting on condition [0x7fe2e5bdd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xcb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
    at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
    at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x7fe460013800 nid=0x319 in Object.wait() [0x7fe2e5cde000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
    at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
    - locked <0xcb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
    at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
    at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
    at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x7fe6f8028000 nid=0x318 waiting on condition [0x7fe2e5ddf000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xcb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
    at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
    at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x7f

Re: Expiry Policies

2020-07-24 Thread Denis Magda
Hi Mark,

What name do you use to access this cache from code? The name is not set in
your config file.

You can refer to this configuration sample that works like a charm:
https://www.gridgain.com/docs/latest/developers-guide/configuring-caches/expiry-policies#configuration

-
Denis


On Wed, Jul 22, 2020 at 7:45 AM Davydov, Mark <
mark.davy...@scientificgames.com> wrote:

> Good morning!
>
> I am trying to evaluate Apache Ignite for our production system. So far we
> like the product, but we are having a problem with configuring Expiry
> Policies. We need to expire data from the cache after 5 seconds but can't
> figure out how to do it. The records are still there after hours. We also
> set eagerTtl to true, with no luck.
>
> Please advise.
>
>
> We are running Apache Ignite 2.8.1 on Ubuntu 18.04 using Java 11 with the
> C++ Apache Ignite client.
>
> Here is our configuration.
>
>
> [The Spring XML configuration did not survive the mailing-list archive;
> only fragments remain. The remnants show a Spring beans file (schema
> http://www.springframework.org/schema/beans) declaring an
> IgniteConfiguration bean and a cache expiry-policy factory configured via
> factory-method="factoryOf".]
>
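Since the XML in the quoted mail was mangled by the archive, here is a minimal Java sketch of the 5-second expiry being discussed. This is a configuration sketch, not Mark's actual setup: the cache name "fiveSecondCache" and the embedded-node bootstrap are illustrative. Note Denis's point: the code must look the cache up by the exact name it was configured with; creating a cache on the fly under another name gets default (non-expiring) settings.

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ExpiryExample {
    public static void main(String[] args) throws Exception {
        // Cache name is illustrative; it must match the name used in client code.
        CacheConfiguration<Integer, String> cacheCfg =
            new CacheConfiguration<>("fiveSecondCache");

        // Entries expire 5 seconds after creation.
        cacheCfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 5)));

        // Eagerly purge expired entries in the background (true by default).
        cacheCfg.setEagerTtl(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cache("fiveSecondCache").put(1, "value");
            Thread.sleep(6_000);
            // The entry should have expired by now; get(1) returns null.
            System.out.println(ignite.cache("fiveSecondCache").get(1));
        }
    }
}
```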


[jira] [Created] (IGNITE-13297) The ignite-zookeeper module does not work properly in Hadoop YARN environment.

2020-07-24 Thread YuJue Li (Jira)
YuJue Li created IGNITE-13297:
-

 Summary: The ignite-zookeeper module does not work properly in 
Hadoop YARN environment.
 Key: IGNITE-13297
 URL: https://issues.apache.org/jira/browse/IGNITE-13297
 Project: Ignite
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 2.8.1
Reporter: YuJue Li
 Fix For: 2.10
 Attachments: image-2020-07-25-10-03-54-464.png

The ignite-zookeeper module does not work properly in a Hadoop YARN environment;
the issue appears to be related to a version conflict with Curator. The errors
are as follows:

!image-2020-07-25-10-03-54-464.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-24 Thread Nikolay Izhikov
Hello, Maksim.

Thanks for the summary.
From my point of view, we should focus on the Phase 3 implementation and then do
the refactoring for specific SPI implementations.

> 8. Question - why we add a lot of system properties?

Can you please list the system properties that should be moved to the
configuration?

> 10. Question - CRC is read in two places encryptionFileIO and filePageStore - 
> what should we do with this?

filePageStore checks the CRC of the encrypted page. This is required to confirm
the page was not corrupted on disk.
encryptionFileIO checks the CRC of the decrypted page (the CRC itself is stored
in the encrypted data).
This is required to be sure the decrypted page contains correct data and was not
replaced with some malicious content.
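The two-level check described here can be illustrated with a self-contained sketch: an inner CRC over the plaintext stored inside the encrypted payload, and an outer CRC over the ciphertext as written to disk. The XOR "cipher" and class names are toy stand-ins, not Ignite's actual implementation:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class DualCrcSketch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // Toy stand-in for page encryption: XOR with a single key byte.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = data.clone();
        for (int i = 0; i < out.length; i++) out[i] ^= key;
        return out;
    }

    public static void main(String[] args) {
        byte[] page = "page payload".getBytes();

        // Inner CRC: computed over the plaintext and stored *inside* the
        // encrypted payload (checked by the encryption-aware FileIO on read).
        long innerCrc = crc(page);
        ByteBuffer plain = ByteBuffer.allocate(page.length + Long.BYTES);
        plain.put(page).putLong(innerCrc);
        byte[] cipher = xor(plain.array(), (byte) 0x5A);

        // Outer CRC: computed over the ciphertext as stored on disk
        // (checked by the page store to detect on-disk corruption).
        long outerCrc = crc(cipher);

        // Read path: verify the outer CRC first, then decrypt and verify the inner CRC.
        if (crc(cipher) != outerCrc) throw new IllegalStateException("disk corruption");
        ByteBuffer dec = ByteBuffer.wrap(xor(cipher, (byte) 0x5A));
        byte[] payload = new byte[page.length];
        dec.get(payload);
        if (crc(payload) != dec.getLong()) throw new IllegalStateException("bad decryption");
        System.out.println("both CRC checks passed");
    }
}
```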

Here is the list of items that are not related to the Phase 3 implementation.
Please tell me what you think:

>   2. We should try to run the existing test suites in encryption mode.

We did this during TDE Phase 1 testing.

>  3. SPI requires an additional method such as getKeyDigest
>  4. Recommendation - the encryption processor should be divided into external 
> subclasses
 
> 5. Recommendation - we should not use tuples and triples, because this is a 
> marker of a design problem.
> 6. Strict recommendation - please don't put context everywhere

Actually, this is a question of taste and is obviously not related to the
current discussion.

> On 24 Jul 2020, at 14:27, Maksim Stepachev wrote:
> 
> Hello everyone, yesterday we discussed the implementation of TDE over a
> conference call. I added a summary of this call here:
> 
>   1. The wiki documentation should be expanded. It should describe the
>   steps - how it works under the hood. What are the domain objects in the
>   implementation?
>   2. We should try to run the existing test suites in encryption mode.
>   Encryption should not affect any PDS or other tests.
>   3. SPI requires an additional method such as getKeyDigest, because the
>   current implementation of GridEncryptionManager#masterKeyDigest() looks
>   strange. We reset the master key to calculate the digest. This will not
>   work well if we want to use VOLT as a key provider implementation.
>   4. Recommendation - the encryption processor should be divided into
>   external subclasses, and we should use the OOP decomposition pattern for
>   it. Right now, this class has more than 2000 lines and does not follow
>   SOLID principles; it effectively inlines unrelated logic into a single class.
>   5. Recommendation - we should not use tuples and triples, because this
>   is a marker of a design problem.
>   6. Strict recommendation - please don't put context everywhere. It
>   should only be used in the parent class. You can pass the necessary
>   dependencies through the constructor, as in the DI pattern.
>   7. Question - the current implementation does not use the throttling that
>   is implemented in PDS. Users should set the throughput, such as 5 MB per
>   second, not the timeout, packet size, or stream size.
>   8. Question - why do we add a lot of system properties? Why didn't we add
>   configuration options for them?
>   9. Question - how do we optimize when we can check that this page is
>   already encrypted by parallel loading? Maybe we should do this in Phase 4?
>   10. Question - the CRC is read in two places, encryptionFileIO and
>   filePageStore - what should we do with this?
>   11. We should remember complicated failover test scenarios, such as a
>   node leaving when encryption starts and joining after it finishes, the
>   baseline changing, or a node leaving before / after / in the middle of
>   the process, etc.
>   12. How can we use a sandbox to protect our cluster from master and user
>   key theft via compute?
>   13. Will re-encryption continue after the cluster is completely stopped?
> 
> If I forgot some points, you can add them to the message.
> 
> 
> вт, 7 июл. 2020 г. в 17:40, Pavel Pereslegin :
> 
>> Hello, Maksim.
>> 
>> For implementation, I chose so-called "in place background
>> re-encryption" design.
>> 
>> The first step is to rotate the key used for writing data; at the moment,
>> it only works on the active cluster.
>> The second step is re-encryption (to remove the previous encryption key).
>> If a node was restarted, re-encryption starts after the metastorage becomes
>> ready for read/write. Each "re-encrypted" partition (including the index)
>> has an attribute on the meta page that indicates whether background
>> re-encryption should be continued.
>> 
>> I updated the description in wiki [1].
>> Some more details in jira [2].
>> Draft PR [3].
>> 
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
>> [2] https://issues.apache.org/jira/browse/IGNITE-12843
>> [3] https://github.com/apache/ignite/pull/7941
>> 
>> On Tue, Jul 7, 2020 at 13:49, Maksim Stepachev wrote:
>>> 
>>> Hi!
>>> 
>>> Do you have any updates about this issue? What types of implementations
>>> have you chosen (in-place, offline, or in the background

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

2020-07-24 Thread Maksim Stepachev
Hello everyone, yesterday we discussed the implementation of TDE over a
conference call. I added a summary of this call here:

   1. The wiki documentation should be expanded. It should describe the
   steps - how it works under the hood. What are the domain objects in the
   implementation?
   2. We should try to run the existing test suites in encryption mode.
   Encryption should not affect any PDS or other tests.
   3. SPI requires an additional method such as getKeyDigest, because the
   current implementation of GridEncryptionManager#masterKeyDigest() looks
   strange. We reset the master key to calculate the digest. This will not
   work well if we want to use VOLT as a key provider implementation.
   4. Recommendation - the encryption processor should be divided into
   external subclasses, and we should use the OOP decomposition pattern for
   it. Right now, this class has more than 2000 lines and does not follow
   SOLID principles; it effectively inlines unrelated logic into a single class.
   5. Recommendation - we should not use tuples and triples, because this
   is a marker of a design problem.
   6. Strict recommendation - please don't put context everywhere. It
   should only be used in the parent class. You can pass the necessary
   dependencies through the constructor, as in the DI pattern.
   7. Question - the current implementation does not use the throttling that
   is implemented in PDS. Users should set the throughput, such as 5 MB per
   second, not the timeout, packet size, or stream size.
   8. Question - why do we add a lot of system properties? Why didn't we add
   configuration options for them?
   9. Question - how do we optimize when we can check that this page is
   already encrypted by parallel loading? Maybe we should do this in Phase 4?
   10. Question - the CRC is read in two places, encryptionFileIO and
   filePageStore - what should we do with this?
   11. We should remember complicated failover test scenarios, such as a
   node leaving when encryption starts and joining after it finishes, the
   baseline changing, or a node leaving before / after / in the middle of
   the process, etc.
   12. How can we use a sandbox to protect our cluster from master and user
   key theft via compute?
   13. Will re-encryption continue after the cluster is completely stopped?

If I forgot some points, you can add them to the message.


On Tue, Jul 7, 2020 at 17:40, Pavel Pereslegin wrote:

> Hello, Maksim.
>
> For implementation, I chose so-called "in place background
> re-encryption" design.
>
> The first step is to rotate the key used for writing data; at the moment,
> it only works on the active cluster.
> The second step is re-encryption (to remove the previous encryption key).
> If a node was restarted, re-encryption starts after the metastorage becomes
> ready for read/write. Each "re-encrypted" partition (including the index)
> has an attribute on the meta page that indicates whether background
> re-encryption should be continued.
>
> I updated the description in wiki [1].
> Some more details in jira [2].
> Draft PR [3].
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
> [2] https://issues.apache.org/jira/browse/IGNITE-12843
> [3] https://github.com/apache/ignite/pull/7941
>
> On Tue, Jul 7, 2020 at 13:49, Maksim Stepachev wrote:
> >
> > Hi!
> >
> > Do you have any updates on this issue? Which type of implementation
> > have you chosen (in-place, offline, or in the background)? I know that we
> > want to add a partition defragmentation function; we could add a hook to
> > integrate the re-encryption scheme. Could you update your IEP with your
> > plans?
> >
> > On Mon, May 25, 2020 at 12:50, Pavel Pereslegin wrote:
> >
> > > Nikolay, Alexei,
> > >
> > > thanks for your suggestions.
> > >
> > > Offline re-encryption does not seem so simple, we need to read/replace
> > > the existing encryption keys on all nodes (therefore, we should be
> > > able to read/write metastore/WAL and exchange data between the
> > > baseline nodes). Re-encryption in maintenance mode (for example, in a
> > > stable read-only cluster) will be simple, but it still looks very
> > > inconvenient, at least because users will need to interrupt all
> > > operations.
> > >
> > > The main advantage of online "in place" re-encryption is that we'll
> > > support multiple keys for reading, and this procedure does not
> > > directly depend on background re-encryption.
> > >
> > > So, the first step is similar to rotating the master key when the new
> > > key was set for writing on all nodes - that’s it, the cache group key
> > > rotation is complete (this is what PCI DSS requires - encrypt new
> > > updates with new keys).
> > > The second step is to re-encrypt the existing data. As I said
> > > previously, I thought about scanning all partition pages in some
> > > background mode (storing progress on the meta page to continue after
> > > restart), but the rebalance approach should also work here if I figure out
> > > how to automate thi
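The resumable design Pavel describes, a per-partition flag (kept on the meta page in the real implementation) that drives background re-encryption and survives restarts, can be sketched roughly as follows. All names and types here are illustrative stand-ins, not Ignite internals:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "in place background re-encryption": each partition carries a
// flag (on its meta page in the real design) saying whether it still holds
// pages encrypted with the old key; a background task scans flagged
// partitions and clears the flag when done, so work resumes after a restart.
public class ReencryptSketch {
    static class Partition {
        final int id;
        boolean needsReencrypt = true;   // persisted on the meta page in the real design
        final List<String> pages = new ArrayList<>();
        Partition(int id) { this.id = id; }
    }

    // Stand-in for re-encrypting one page with the new group key.
    static String reencrypt(String page) { return page.replace("oldKey", "newKey"); }

    static void backgroundReencrypt(List<Partition> parts) {
        for (Partition p : parts) {
            if (!p.needsReencrypt) continue;      // finished before the restart, skip
            p.pages.replaceAll(ReencryptSketch::reencrypt);
            p.needsReencrypt = false;             // would be written to the meta page
        }
    }

    public static void main(String[] args) {
        List<Partition> parts = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Partition p = new Partition(i);
            p.pages.add("page-" + i + "-oldKey");
            parts.add(p);
        }
        parts.get(0).needsReencrypt = false;      // e.g. completed before a restart
        backgroundReencrypt(parts);
        System.out.println(parts.get(1).pages);   // partition 1 now uses the new key
    }
}
```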

Share my Apache Ignite Use Case

2020-07-24 Thread Sonali.Dive
Kindly share Apache Ignite use cases with respect to all aspects.






Best Regards,
Sonali  Dive
Strategy & Architecture - Centre of Excellence
Reliance Industries Limited |6D FF,WS-186 , RCP | Ghansoli, Thane Belapur Road 
| Navi Mumbai 400701. INDIA
Office:+91 22 4477-49050



Re: [DISCUSSION] Add index rebuild time metrics

2020-07-24 Thread Stanislav Lukyanov
Got it. I thought that index building and index rebuilding are essentially the 
same,
but now I see that they are different: index rebuilding cares about all indexes 
at once while index building cares about particular ones.

Kirill's approach sounds good.

Stan

> On 20 Jul 2020, at 14:54, Alexey Goncharuk  wrote:
> 
> Stan,
> 
> Currently we never build indexes one-by-one - we always use a cache data
> row visitor which either updates all indexes (see IndexRebuildFullClosure)
> or updates the set of indexes that need to catch up (see
> IndexRebuildPartialClosure). Given that, I do not see any need for a
> per-index rebuild status, as this status would be updated for all outdated
> indexes simultaneously.
> 
> Kirill's approach for the total number of processed keys per cache seems
> reasonable to me.
> 
> --AG
> 
> On Fri, Jul 3, 2020 at 10:12, ткаленко кирилл wrote:
> 
>> Hi, Stan!
>> 
>> Perhaps it is worth clarifying what exactly I wanted to say.
>> Now we have 2 processes: building and rebuilding indexes.
>> 
>> At the moment, we have some metrics for rebuilding indexes:
>> "IsIndexRebuildInProgress" and "IndexBuildCountPartitionsLeft".
>> 
>> I suggest adding another metric, "Indexrebuildkeyprocessed", which will
>> allow you to determine how many records are left to rebuild for a cache.
>> 
>> I think your comments are more about building an index, which may need more
>> metrics, but I think that should be done in a separate ticket.
>> 
>> 03.07.2020, 03:09, "Stanislav Lukyanov" :
>>> If multiple indexes are to be built "number of indexed keys" metric may
>> be misleading.
>>> 
>>> As a cluster admin, I'd like to know:
>>> - Are all indexes ready on a node?
>>> - How many indexes are to be built?
>>> - How much resources are used by the index building (how many threads
>> are used)?
>>> - Which index(es?) is being built right now?
>>> - How much time until the current (single) index building finishes? Here
>> "time" can be a lot of things: partitions, entries, percent of the cache,
>> minutes and hours
>>> - How much time until all indexes are built?
>>> - How much does it take to build each of my indexes / a single index of
>> my cache on average?
>>> 
>>> I think we need a set of metrics and/or log messages to solve all of
>> these questions.
>>> I imagine something like:
>>> - numberOfIndexesToBuild
>>> - a standard set of metrics on the index building thread pool (do we
>> already have it?)
>>> - currentlyBuiltIndexName (assuming we only build one at a time which is
>> probably not true)
>>> - for the "time" metrics I think percentage might be the best as it's
>> the easiest to understand; we may add multiple metrics though.
>>> - For "time per each index" I'd add detailed log messages stating how
>> long did it take to build a particular index
>>> 
>>> Thanks,
>>> Stan
>>> 
 On 26 Jun 2020, at 12:49, ткаленко кирилл 
>> wrote:
 
 Hi, Igniters.
 
 I would like to know if it is possible to estimate how long the index
>> rebuild will take?
 
 At the moment, I have found the following metrics [1] and [2], and
>> since the rebuild is based on caches, I think it would be useful to know
>> how many records have been processed in indexing. This way we can estimate
>> how long we have to wait for the index to be rebuilt by subtracting the
>> number of records already indexed from [3].
 
 I think we should add this metric [4].
 
 Comments, suggestions?
 
 [1] - https://issues.apache.org/jira/browse/IGNITE-12184
 [2] -
>> org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
 [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
 [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
>> 
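The estimate discussed in this thread, the cache size [3] minus the number of already-indexed records, divided by the observed indexing rate, amounts to simple arithmetic. A sketch with illustrative numbers (the method and values are not Ignite APIs):

```java
// Rough estimate of remaining index-rebuild time from the metrics discussed
// above: cache size [3] minus the proposed "keys processed" counter, divided
// by the observed processing rate. All inputs are illustrative.
public class RebuildEta {
    static double etaSeconds(long cacheSize, long keysProcessed, double keysPerSecond) {
        long remaining = Math.max(0, cacheSize - keysProcessed);
        return remaining / keysPerSecond;
    }

    public static void main(String[] args) {
        // e.g. 1M entries, 250k already indexed, observed rate 50k keys/s
        System.out.println(etaSeconds(1_000_000, 250_000, 50_000)); // 15.0
    }
}
```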



Re: Extended logging for rebalance performance analysis

2020-07-24 Thread Stanislav Lukyanov
Hi Kirill,

I think it would be helpful to add a few more metrics to the log line
Completed rebalancing [grp=grp0, 
supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, 
duration=8ms, bytesRcvd=5,9 KB, topVer=AffinityTopologyVersion [topVer=4, 
minorTopVer=0], progress=1/3, rebalanceId=1]
for example:
avgSpeed=, partitionsNum=N, histPartitionsNum=M, 
histBytesRcvd=X KB, histDuration=Yms, 
histAvgSpeed=, fullPartitionsNum=K, fullBytesRcvd=X 
KB, fullDuration=Yms, fullAvgSpeed=

I believe these additional metrics are good in that a user can easily set up 
monitoring based on them, and they give a lot of useful info about rebalance 
performance in general. No need to look for the "Starting rebalance" message; 
everything is in one place.
If any of these are hard to add, then we can skip them (I suspect histDuration 
and fullDuration can't be measured in practice, but I don't know for sure). 
Let's add what we can.

Thanks,
Stan

> On 3 Jul 2020, at 17:21, ткаленко кирилл  wrote:
> 
> Sorry, forget.
> 
> [1] - 
> org.apache.ignite.internal.processors.cache.CacheGroupsMetricsRebalanceTest#testCacheGroupRebalance
> 
> On 03.07.2020 at 17:20, ткаленко кирилл wrote:
>> Hi, Stan!
>> 
>> I don't understand you yet.
>> 
>> Now you can use the metrics as was done in the test [1]. Or can you tell me 
>> where to do this, for example when rebalancing completes for all groups?
>> 
>> See what is now available and added in the logs:
>> 1) Which group is rebalanced and which type of rebalance.
>> Starting rebalance routine [grp0, topVer=AffinityTopologyVersion [topVer=4, 
>> minorTopVer=0], supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, 
>> fullPartitions=[4, 7], histPartitions=[], rebalanceId=1]
>> 
>> 2) Completion of rebalancing from one of the suppliers.
>> Completed rebalancing [grp=grp0, 
>> supplier=3f2ae7cf-2bfe-455a-a76a-01fe27a1, partitions=2, entries=60, 
>> duration=8ms, bytesRcvd=5,9 KB, topVer=AffinityTopologyVersion [topVer=4, 
>> minorTopVer=0], progress=1/3, rebalanceId=1]
>> 
>> 3) Completion of the entire rebalance.
>> Completed rebalance chain: [rebalanceId=1, partitions=116, entries=400, 
>> duration=41ms, bytesRcvd=40,4 KB]
>> 
>> These messages have a common parameter rebalanceId=1.
>> 
>> On 03.07.2020 at 16:48, Stanislav Lukyanov wrote:
   On 3 Jul 2020, at 09:51, ткаленко кирилл  wrote:
 
   To calculate the average value, you can use the existing metrics 
 "RebalancingStartTime", "RebalancingLastCancelledTime", 
 "RebalancingEndTime", "RebalancingPartitionsLeft", 
 "RebalancingReceivedKeys" and "RebalancingReceivedBytes".
>>> 
>>>  You can calculate it, and I believe that this is the first thing anyone 
>>> would do when reading these logs and metrics.
>>>  If that's an essential thing then maybe it should be available out of the 
>>> box?
>>> 
   This also works correctly with the historical rebalance.
   Now we can see rebalance type for each group and for each supplier in 
 logs. I don't think we should duplicate this information.
 
   [2020-07-03 09:49:31,481][INFO 
 ][sys-#160%rebalancing.RebalanceStatisticsTest2%][root] Starting rebalance 
 routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=3, 
 minorTopVer=0], supplier=a8be67b8-8ec7-4175-aa04-a5957710, 
 fullPartitions=[0, 2, 4, 6, 8], histPartitions=[], rebalanceId=1]
>>> 
>>>  I'm talking about adding info on how much data has been transferred during 
>>> rebalance.
>>>  When rebalance completes I'd like to know how much data has been 
>>> transferred, was it historical or full, what was the average rebalance 
>>> speed.
>>> 
>>>  There are two reasons for having all that.
>>> 
>>>  First, it helps to analyze the issues by searching the logs and looking 
>>> for anomalies.
>>> 
>>>  Second, this allows to automate alerts: e.g. if you know your typical 
>>> historical rebalance speed, you can trigger an alert if it drops below that.
>>> 
   On 03.07.2020 at 02:49, Stanislav Lukyanov wrote:
>   Kirill,
> 
>   I've looked through the patch.
>   Looks good, but it feels like the first thing someone will try to do 
> given bytesRcvd and duration is to divide one by another to get an 
> average speed.
>   Do you think it's reasonable to also add it to the logs? Maybe even to 
> the metrics?
> 
>   Also, this works with historical rebalance, right? Can we specify how 
> much data was transferred via historical or full rebalance from each 
> supplier?
>   Maybe even estimate transfer speed in entries and bytes for each 
> rebalance type?
> 
>   Thanks,
>   Stan
> 
>>On 29 Jun 2020, at 11:50, Ivan Rakov  wrote:
>> 
>>+1 to Alex G.
>> 
>>From my experience, the most interesting cases with Ignite rebalancing
>>happen exactly in production. Given that we already have
>>detailed rebalancing logging, adding info about rebalance performanc
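The avgSpeed value proposed in this thread is derived directly from fields already present in the completion log line (bytesRcvd divided by duration). A sketch of the computation; the class, method, and output formatting are illustrative, not Ignite code:

```java
import java.util.Locale;

// The proposed avgSpeed field is just bytesRcvd / duration; this sketches how
// a supplier-completion log line could derive it. Formatting is illustrative.
public class RebalanceSpeed {
    static String avgSpeed(long bytesRcvd, long durationMs) {
        double kbPerSec = (bytesRcvd / 1024.0) / (durationMs / 1000.0);
        return String.format(Locale.ROOT, "%.1f KB/s", kbPerSec);
    }

    public static void main(String[] args) {
        // From the sample line: bytesRcvd=5,9 KB (~6041 bytes), duration=8ms.
        System.out.println(avgSpeed(6041, 8));
    }
}
```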

Re: Move TcpDiscoveryZookeeperIpFinder to ignite-extensions as part of IEP-36

2020-07-24 Thread Anton Vinogradov
Ivan,

Sounds like an obsolete compromise solution kept just because of the
"Zookeeper" word in its name, not because of any real benefit.
>> So I suggest moving this single class, with its tests, to a separate module
>> in ignite-extensions.
+1

On Fri, Jul 24, 2020 at 11:27 AM Ivan Daschinsky 
wrote:

> Hello, Igniters.
>
> Currently, the ignite-zookeeper module, which contains the full
> implementation of ZookeeperDiscoverySpi, also contains one class that looks
> like a little snippet, and I have some concerns about the presence of this
> class in the module.
>
> 1) This class is a simple snippet-like implementation of
> TcpDiscoveryIpFinder.
> 2) It creates an EPHEMERAL_SEQUENTIAL ZNode in the root directory with JSON
> containing the IP addresses of the joining node.
> 3) It reads the registered children znodes from the root directory and uses
> these addresses to participate in the common TcpDiscovery process.
>
> However:
> 1) This class has nothing in common with ZookeeperDiscovery.
> 2) It brings additional dependencies to the module (the Apache Curator
> framework, Jackson, and so on).
> 3) None of these dependencies are required by ZookeeperDiscoverySpi.
> 4) Its usefulness is doubtful: we already have a fully functional
> ZookeeperDiscovery, and using a Zookeeper quorum as a simple store for IP
> addresses without utilizing Zookeeper's full power is a questionable
> decision.
>
> So I suggest moving this single class, with its tests, to a separate module
> in ignite-extensions.
>
> What do you think?
>
> --
> Sincerely yours, Ivan Daschinskiy
>


Move TcpDiscoveryZookeeperIpFinder to ignite-extensions as part of IEP-36

2020-07-24 Thread Ivan Daschinsky
Hello, Igniters.

Currently, the ignite-zookeeper module, which contains the full implementation
of ZookeeperDiscoverySpi, also contains one class that looks like a little
snippet, and I have some concerns about the presence of this class in the
module.

1) This class is a simple snippet-like implementation of TcpDiscoveryIpFinder.
2) It creates an EPHEMERAL_SEQUENTIAL ZNode in the root directory with JSON
containing the IP addresses of the joining node.
3) It reads the registered children znodes from the root directory and uses
these addresses to participate in the common TcpDiscovery process.

However:
1) This class has nothing in common with ZookeeperDiscovery.
2) It brings additional dependencies to the module (the Apache Curator
framework, Jackson, and so on).
3) None of these dependencies are required by ZookeeperDiscoverySpi.
4) Its usefulness is doubtful: we already have a fully functional
ZookeeperDiscovery, and using a Zookeeper quorum as a simple store for IP
addresses without utilizing Zookeeper's full power is a questionable decision.

So I suggest moving this single class, with its tests, to a separate module in
ignite-extensions.

What do you think?

-- 
Sincerely yours, Ivan Daschinskiy


Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-07-24 Thread Alex Plehanov
Guys,

I've cherry-picked IGNITE-12438 (One-way client-server connections) and
IGNITE-13038 (Web console removing) to 2.9.

Since there are no objections I will move IGNITE-13006 (Spring libs upgrade
to 5.2 version), IGNITE-12489 (Error during purges by expiration) and
IGNITE-12553 (public Java metrics API) to the next release.

I will cherry-pick IGNITE-11942 (IGFS and Hadoop removal) after it is
reviewed and merged to master.

What about IGNITE-12911 [1] (B+Tree Corrupted exception when using a key
extracted from a BinaryObject value object --- and SQL enabled)? Anton
Kalashnikov, Ilya Kasnacheev, can you please clarify the ticket status?

[1]: https://issues.apache.org/jira/browse/IGNITE-12911

On Thu, Jul 23, 2020 at 12:08, Alexey Kuznetsov wrote:

> Alex, Denis
>
> The issue moving the Web Console to a separate repository has been merged to
> master.
> See:  https://issues.apache.org/jira/browse/IGNITE-13038
>
> Please consider cherry-picking it to ignite-2.9.
>
> ---
> Alexey Kuznetsov
>
>


[jira] [Created] (IGNITE-13296) .NET: TransactionImpl finalizer can crash the process

2020-07-24 Thread Pavel Tupitsyn (Jira)
Pavel Tupitsyn created IGNITE-13296:
---

 Summary: .NET: TransactionImpl finalizer can crash the process
 Key: IGNITE-13296
 URL: https://issues.apache.org/jira/browse/IGNITE-13296
 Project: Ignite
  Issue Type: Bug
  Components: platforms
Reporter: Pavel Tupitsyn
Assignee: Pavel Tupitsyn
 Fix For: 2.10


~TransactionImpl potentially throws an exception (e.g. when the grid is
stopped), causing the entire process to abort.
* Finalizers should not throw exceptions
* Stopped grid is a valid use case

Review all finalizers and make sure they are safe.


