[GitHub] ignite pull request #3036: IGNITE-6916: node joining with enabled pds and em...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3036


---


[GitHub] ignite pull request #3045: IGN-8915

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3045


---


Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Vladimir Ozerov
In the first iteration I would focus only on reporting facilities, to let the
administrator spot dangerous situations. In the second phase, when all the
reporting and metrics are ready, we can think about some automatic actions.

On Wed, Nov 22, 2017 at 10:39 AM, Mikhail Cherkasov  wrote:

> Hi Anton,
>
> I don't think that we should shut down a node in case of IgniteOOMException:
> if one node has no space, then the others probably don't have it either, so
> rebalancing will cause IgniteOOM on all other nodes and kill the whole
> cluster. I think for some configurations the cluster should survive and allow
> the user to clean the cache and/or add more nodes.
>
> Thanks,
> Mikhail.
>
> On Nov 20, 2017 at 6:53 PM, "Anton Vinogradov" <
> avinogra...@gridgain.com> wrote:
>
> > Igniters,
> >
> > Internal problems may, and unfortunately do, cause unexpected cluster
> > behavior.
> > We should define the behavior for the case when any internal problem happens.
> >
> > Well-known internal problems can be split into:
> > 1) OOM or any other reason causing a node crash
> >
> > 2) Situations requiring a graceful node shutdown with a custom notification
> > - IgniteOutOfMemoryException
> > - Persistence errors
> > - ExchangeWorker exits with an error
> >
> > 3) Performance issues that should be covered by metrics
> > - GC STW duration
> > - Timed-out tasks and jobs
> > - TX deadlock
> > - Hung Tx (waiting for some service)
> > - Java deadlocks
> >
> > I created a special issue [1] to make sure all these metrics will be
> > presented in WebConsole or VisorConsole (which is preferred?)
> >
> > 4) Situations requiring an external monitoring implementation
> > - GC STW duration exceeds the maximum possible length (the node should be
> > stopped before the STW finishes)
> >
> > All these problems were reported by different people at different times,
> > so we should reanalyze each of them and possibly find better ways to
> > solve them than those described in the issues.
> >
> > P.S. IEP-7 [2] already contains 9 issues; feel free to mention something
> > else :)
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-6961
> > [2]
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> > 7%3A+Ignite+internal+problems+detection
> >
>


Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Mikhail Cherkasov
Hi Anton,

I don't think that we should shut down a node in case of IgniteOOMException:
if one node has no space, then the others probably don't have it either, so
rebalancing will cause IgniteOOM on all other nodes and kill the whole
cluster. I think for some configurations the cluster should survive and allow
the user to clean the cache and/or add more nodes.

Thanks,
Mikhail.

On Nov 20, 2017 at 6:53 PM, "Anton Vinogradov" <
avinogra...@gridgain.com> wrote:

> Igniters,
>
> Internal problems may, and unfortunately do, cause unexpected cluster
> behavior.
> We should define the behavior for the case when any internal problem happens.
>
> Well-known internal problems can be split into:
> 1) OOM or any other reason causing a node crash
>
> 2) Situations requiring a graceful node shutdown with a custom notification
> - IgniteOutOfMemoryException
> - Persistence errors
> - ExchangeWorker exits with an error
>
> 3) Performance issues that should be covered by metrics
> - GC STW duration
> - Timed-out tasks and jobs
> - TX deadlock
> - Hung Tx (waiting for some service)
> - Java deadlocks
>
> I created a special issue [1] to make sure all these metrics will be
> presented in WebConsole or VisorConsole (which is preferred?)
>
> 4) Situations requiring an external monitoring implementation
> - GC STW duration exceeds the maximum possible length (the node should be
> stopped before the STW finishes)
>
> All these problems were reported by different people at different times,
> so we should reanalyze each of them and possibly find better ways to
> solve them than those described in the issues.
>
> P.S. IEP-7 [2] already contains 9 issues; feel free to mention something
> else :)
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6961
> [2]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 7%3A+Ignite+internal+problems+detection
>


[jira] [Created] (IGNITE-6983) SQL: optimize CREATE INDEX and BPlusTree interaction

2017-11-21 Thread Vladimir Ozerov (JIRA)
Vladimir Ozerov created IGNITE-6983:
---

 Summary: SQL: optimize CREATE INDEX and BPlusTree interaction
 Key: IGNITE-6983
 URL: https://issues.apache.org/jira/browse/IGNITE-6983
 Project: Ignite
  Issue Type: Task
  Components: cache, sql
Reporter: Vladimir Ozerov
 Fix For: 2.4


Currently the index is built as follows:
1) Get the next entry from the partition's tree
2) Read its key (copy to heap)
3) Acquire a lock on {{GridCacheMapEntry}}
4) Look up the same key in the tree from the top
5) Read its value (copy to heap)
6) Add it to the index.

This is a very complex flow. We can optimize two things - the tree lookup and the 
value deserialization - as follows:
1) Every data page will have an update counter, which is incremented every time 
anything on it is changed.
2) When the lock on {{GridCacheMapEntry}} is acquired, we will acquire a lock on the 
data page and re-check the update counter. 
3) If the page was changed between the iterator read and the lock acquisition, use 
the old flow. 
4) Otherwise, set a read lock on the page, read the value as an *offheap* object, and 
apply it to the index.
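
The core of the optimization - trusting a value only if a per-page update counter has 
not moved since the iterator observed it, and falling back to the full lookup 
otherwise - can be modeled with a small self-contained example. The {{DataPage}} class 
and method names below are illustrative only, not Ignite internals:

{noformat}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a data page with an update counter used for optimistic validation. */
class DataPage {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long updateCntr;   // incremented on every modification of the page
    private Object value;      // payload stored on the page

    long updateCounter() { return updateCntr; }

    void update(Object newVal) {
        lock.writeLock().lock();
        try {
            value = newVal;
            updateCntr++;      // any change bumps the counter
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Returns the value if the page was not modified since {@code seenCntr}
     * was observed, otherwise {@code null} so the caller falls back to the
     * slow path (a full tree lookup in the real implementation).
     */
    Object readValueOptimistically(long seenCntr) {
        lock.readLock().lock();
        try {
            return updateCntr == seenCntr ? value : null;
        }
        finally {
            lock.readLock().unlock();
        }
    }
}
{noformat}

If {{readValueOptimistically}} returns null, the caller takes the slow path (re-lookup 
from the top of the tree), which corresponds to step 3 of the proposal.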



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Right MXBean for new metrics

2017-11-21 Thread Dmitriy Setrakyan
Hi Alex,

I think the proper approach would be to have a separate MBean for cache
groups. It should show average metrics across all the caches in the group
and some additional metrics as well. Agree?

Also, I am not sure I understand what "partition redundancy level" is and
what that metric would show. Can you explain?

D.

On Tue, Nov 21, 2017 at 2:28 AM, Alex Plehanov 
wrote:

> Hello, Igniters!
>
>
>
> I would like to discuss the implementation of ticket IGNITE-6871.
>
>
>
> In our Ignite instance there are more than 1000 caches and about 10 cache
> groups. To minimize the probability of data loss, we need an alert when a
> critical level of redundancy in the cluster is reached. So, we need a
> metric which will report the minimal partition redundancy level for a cache
> group.
>
>
>
> Now there are no MXBeans for cache groups. And since cache groups were
> introduced, some metrics from CacheMetricsMXBean actually show information
> about the cache group, but not about the cache.
>
>
>
> I can implement the new metric (minimal partition redundancy level for a
> cache group) in CacheMetricsMXBean, the same way it was done before. In that
> case we'll either need to monitor this metric for all caches, or somehow
> obtain the cache-to-cache-group relation and monitor this metric for only
> one cache per cache group. But it's not transparent to an administrator
> which cache groups exist and which caches belong to which cache group.
>
>
>
> Alternatively, I can implement a new type of MXBean for cache groups and
> add a new metric to this MXBean. Maybe it will be useful later to add to
> this MXBean some other cache group related metrics, which now are
> implemented in CacheMetricsMXBean.
>
>
>
> So, should I extend existing CacheMetricsMXBean or create a new type of
> MXBeans for cache groups?
>
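
For illustration, a per-cache-group bean along the lines Dmitriy describes might look
like the sketch below. The interface name and the exact set of metrics are assumptions
made for this example, not an existing Ignite API:

/**
 * Hypothetical management bean exposing metrics for a single cache group.
 */
public interface CacheGroupMetricsMXBean {
    /** Cache group name. */
    String getGroupName();

    /** Names of the caches that belong to this group. */
    java.util.Set<String> getCaches();

    /** Number of partitions configured for the group. */
    int getPartitions();

    /**
     * Minimal number of alive copies (primary + backups) over all partitions
     * of the group - the "partition redundancy level" discussed above.
     * A value of 1 means losing one more node may lose data.
     */
    int getMinimumNumberOfPartitionCopies();

    /** Average over the caches in the group, as an example of an aggregated metric. */
    float getAverageGetTime();
}

With such a bean an administrator could alert on getMinimumNumberOfPartitionCopies()
dropping to 1 without knowing the cache-to-group mapping at all.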


Re: Losing data during restarting cluster with persistence enabled

2017-11-21 Thread Valentin Kulichenko
Vyacheslav,

If you want the persistent storage to be *distributed*, then using Ignite
persistence would be the easiest thing to do anyway, even if you don't need
all of its features.

A CacheStore can indeed be updated from different nodes, but the problem is
in coordination. If instances of the store are not aware of each other, it's
really hard to handle all rebalancing cases. Such a solution will work only
on a stable topology.

Having said that, if you can have one instance of RocksDB (or any other DB
for that matter) that is accessed over the network by all nodes, then that's
also an option. But in this case the storage is not distributed.

-Val

On Tue, Nov 21, 2017 at 4:37 AM, Vyacheslav Daradur 
wrote:

> Valentin,
>
> >> Why don't you use Ignite persistence [1]?
> I have a use case in one of my projects that needs RAM-to-disk
> replication only. None of the other PDS features are needed.
> In a first assessment, persisting to RocksDB works faster.
>
> >> CacheStore design assumes that the underlying storage is shared by all
> the nodes in topology.
> This is a very important note.
> I'm a bit confused because I thought that each node in the cluster
> persists the partitions for which the node is either primary or backup,
> like in PDS.
>
> My RocksDB implementation supports working with one DB instance shared by
> all the nodes in the topology, but that would defeat the purpose of using
> an embedded fast storage.
>
> Is there a link to a detailed description of the CacheStore design, or
> any other advice?
> Thanks in advance.
>
>
>
> On Fri, Nov 17, 2017 at 9:07 PM, Valentin Kulichenko
>  wrote:
> > Vyacheslav,
> >
> > CacheStore design assumes that the underlying storage is shared by all
> the
> > nodes in topology. Even if you delay rebalancing on node stop (which is
> > possible via CacheConfiguration#rebalanceDelay), I doubt it will solve
> all
> > your consistency issues.
> >
> > Why don't you use Ignite persistence [1]?
> >
> > [1] https://apacheignite.readme.io/docs/distributed-persistent-store
> >
> > -Val
> >
> > On Fri, Nov 17, 2017 at 4:24 AM, Vyacheslav Daradur  >
> > wrote:
> >
> >> Hi Andrey! Thank you for answering.
> >>
> >> >> Key-to-partition mapping shouldn't depend on topology, and shouldn't
> >> change on an unstable topology.
> >> Key-to-partition mapping doesn't depend on topology in my test
> >> affinity function. It only depends on the number of partitions.
> >> But partition-to-node mapping does depend on topology, and at cluster
> >> stop, when one node leaves the topology, some partitions may be moved to
> >> other nodes.
> >>
> >> >> Does all nodes share same RockDB database or each node has its own
> copy?
> >> Each Ignite node has own RocksDB instance.
> >>
> >> >> Would you please share configuration?
> >> It's pretty simple:
> >> IgniteConfiguration cfg = new IgniteConfiguration();
> >> cfg.setIgniteInstanceName(instanceName);
> >>
> >> CacheConfiguration cacheCfg = new
> >> CacheConfiguration<>();
> >> cacheCfg.setName(TEST_CACHE_NAME);
> >> cacheCfg.setCacheMode(CacheMode.PARTITIONED);
> >> cacheCfg.setWriteSynchronizationMode(
> >> CacheWriteSynchronizationMode.PRIMARY_SYNC);
> >> cacheCfg.setBackups(1);
> >> cacheCfg.setAffinity(new
> >> TestAffinityFunction(partitionsNumber, backupsNumber));
> >> cacheCfg.setWriteThrough(true);
> >> cacheCfg.setReadThrough(true);
> >> cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC);
> >> cacheCfg.setCacheStoreFactory(new
> >> RocksDBCacheStoreFactory<>("/test/path/to/persistence",
> >> TEST_CACHE_NAME, cfg));
> >>
> >> cfg.setCacheConfiguration(cacheCfg);
> >>
> >> Could you give me advice on places which I need to pay attention?
> >>
> >>
> >> On Wed, Nov 15, 2017 at 3:02 PM, Andrey Mashenkov
> >>  wrote:
> >> > Hi Vyacheslav,
> >> >
> >> > Key-to-partition mapping shouldn't depend on topology, and shouldn't
> >> > change on an unstable topology.
> >> > Looks like you've missed smth.
> >> >
> >> > Would you please share configuration?
> >> > Does all nodes share same RockDB database or each node has its own
> copy?
> >> >
> >> >
> >> >
> >> > On Wed, Nov 15, 2017 at 12:22 AM, Vyacheslav Daradur <
> >> daradu...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi, Igniters!
> >> >>
> >> >> I’m using partitioned Ignite cache with RocksDB as 3rd party
> persistence
> >> >> store.
> >> >> I've got an issue: if cache rebalancing is switched on, then it’s
> >> >> possible to lose some data.
> >> >>
> >> >> Basic scenario:
> >> >> 1) Start Ignite cluster and fill a cache with RocksDB persistence;
> >> >> 2) Stop all nodes
> >> >> 3) Start Ignite cluster and validate data
> >> >>
> >> >> This works fine while rebalancing is switched off.
> >> >>
> >> >> If rebalancing is switched on: when I call Ignition#stopAll, some nodes
> >> >> go down sequentially, and while one node is going down another starts
> >> >> rebalancing. When the nodes are started, the affinity function works with a

Re: SQL warning for partitioned caches with setLocal

2017-11-21 Thread Luqman Ahmad
Hi Vladimir,

Agree - they shouldn't be coupled together, but what if we could set something
in the affinity API which can be read in the SQL API?

Please correct me if I am wrong, but in affinityCall/Run we have to
provide all the cache names, and rebalancing is skipped if there is already
an operation in progress. If we go with your approach, I'm not sure whether we
can dynamically calculate all the related partitioned caches to be locked.

Of course you are in a better position to comment on it, but can't we
introduce something in the affinity API which can be set/read through each
affinityCall/Run, and the affinity API can be used inside the SQL API - just
like calculating the partition id for a specific key or finding
an atomic reference.

Thanks,
Luqman



On 21 Nov 2017 20:17, "Vladimir Ozerov"  wrote:

Hi Luqman,

I do not think SQL and compute should be coupled in the product. Instead,
we should fix local query execution and pin partitions in the same way it
is done for affinityCall/Run and distributed SQL.

On Tue, Nov 21, 2017 at 6:25 PM, luqmanahmad  wrote:

> Thanks dsetrakyan,
>
> I would like to add a few more things over here which should be applicable
> to partitioned caches.
>
> This context variable which is set through affinityCall or affinityRun
> should be available through either a helper class or cache configuration.
> There could be other advantages as well for example:
>
> 1. We can check the context variable in all the partitioned cache
> operations. In the department-and-employee example, if an employee is accessed
> without an affinityRun or affinityCall computation, it should also log a
> WARNING message or throw an exception, based on the cache configuration.
>
> 2. The user would be able to implement their own custom checks using it. For
> example, if we want to have some abstract-level checks to restrict
> developers to using specific functionality related to partitioned caches.
>
> Luqman
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>
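
As a reference point for the affinityCall/Run + local SQL pattern being discussed, a
minimal sketch is shown below. The cache name, key and query are made up for
illustration, and the cache is assumed to have the corresponding query entity
configured:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class LocalQueryExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // affinityRun pins the partition that owns 'empKey' for the duration of
        // the closure, so a local query against that partition stays consistent
        // even if rebalancing starts in the meantime.
        int empKey = 42;

        ignite.compute().affinityRun("employees", empKey, () -> {
            IgniteCache<Integer, Object> cache =
                Ignition.localIgnite().cache("employees");

            // Local query: executed only against data residing on this node.
            cache.query(new SqlFieldsQuery(
                    "select name from Employee where deptId = ?")
                .setArgs(10)
                .setLocal(true))
                .getAll()
                .forEach(row -> System.out.println(row.get(0)));
        });
    }
}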


Re: SQL warning for partitioned caches with setLocal

2017-11-21 Thread Vladimir Ozerov
Hi Luqman,

I do not think SQL and compute should be coupled in the product. Instead,
we should fix local query execution and pin partitions in the same way it
is done for affinityCall/Run and distributed SQL.

On Tue, Nov 21, 2017 at 6:25 PM, luqmanahmad  wrote:

> Thanks dsetrakyan,
>
> I would like to add a few more things over here which should be applicable
> to partitioned caches.
>
> This context variable which is set through affinityCall or affinityRun
> should be available through either a helper class or cache configuration.
> There could be other advantages as well for example:
>
> 1. We can check the context variable in all the partitioned cache
> operations. In the department-and-employee example, if an employee is accessed
> without an affinityRun or affinityCall computation, it should also log a
> WARNING message or throw an exception, based on the cache configuration.
>
> 2. The user would be able to implement their own custom checks using it.
> For
> example, if we want to have some abstract level checks to restrict
> developers to use specific functionality related to partitioned caches.
>
> Luqman
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Vladimir Ozerov
Anton,

The question is why a user may need such a precise measurement. I share Andrey’s
opinion - I cannot see the value.

On Tue, Nov 21, 2017 at 19:33, Anton Vinogradov wrote:

> Andrey,
>
> >  JVM provides sufficient means of detecting a struggling process out of
> the box.
>
> Could you point to some articles describing how to detect STW exceeding
> some duration using only JVM API?
>
> On Tue, Nov 21, 2017 at 7:17 PM, Andrey Kornev 
> wrote:
>
> > My 2 cents. Don’t do it. JVM provides sufficient means of detecting a
> > struggling process out of the box. SRE/Operations teams usually know how
> to
> > monitor JVMs and can handle killing of such processes themselves.
> >
> > The feature adds no value, just complexity (and more configuration
> > parameters (!) — as if Ignite didn’t have enough of them already).
> >
> > Regards,
> > Andrey
> > _
> > From: Denis Magda 
> > Sent: Monday, November 20, 2017 3:10 PM
> > Subject: Re: Facility to detect long STW pauses and other system response
> > degradations
> > To: 
> >
> >
> > My 2 cents.
> >
> > 1. Totally for a separate native process that will handle the monitoring
> > of an Ignite process. The watchdog process can simply start a JVM tool like
> > jstat and parse its GC logs: https://dzone.com/articles/how-monitor-java-garbage
> >
> > 2. As for the STW handling, I would make a possible reaction more
> generic.
> > Let’s define a policy (enumeration) that will define how to deal with an
> > unstable node. The events might be as follows - kill a node, restart a
> > node, trigger a custom script using Runtime.exec or other methods.
> >
> > What’d you think? Specifically on point 2.
> >
> > —
> > Denis
> >
> > > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov <
> avinogra...@gridgain.com>
> > wrote:
> > >
> > > Yakov,
> > >
> > > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> > >
> > > We split issue to
> > > #1 STW duration metrics
> > > #2 External monitoring allows to stop node during STW
> > >
> > >> Testing GC pause with java thread is
> > >> a bit strange and can give info only after GC pause finishes.
> > >
> > > That's ok since it's #1
> > >
> > > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> > sbt.sorokin@gmail.com>
> > > wrote:
> > >
> > >> I have tested the solution with a Java thread, and the GC logs contained
> > >> the same pause values for thread stopping as those detected by the Java
> > >> thread.
> > >>
> > >>
> > >> My log (contains pauses > 100ms):
> > >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long STW
> > >> pause: 507 milliseconds.
> > >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long STW
> > >> pause: 5595 milliseconds.
> > >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long STW
> > >> pause: 3262 milliseconds.
> > >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long STW
> > >> pause: 1737 milliseconds.
> > >>
> > >> GC log:
> > >> gridgain@dell-5580-92zc8h2:~$ cat
> > >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> > >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
> > >> threads were stopped: 0,845 seconds, Stopping threads took:
> > 0,246
> > >> seconds
> > >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
> > >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> > 0,252
> > >> seconds
> > >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
> > >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> > 0,178
> > >> seconds // GOT!
> > >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
> > >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> > 0,229
> > >> seconds // GOT!
> > >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which application
> > >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> > 0,223
> > >> seconds // GOT!
> > >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which application
> > >> threads were stopped: 1,7337123 seconds, Stopping threads took:
> > 0,121
> > >> seconds // GOT!
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >>
> >
> >
> >
> >
>
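
The java-thread approach referenced in this thread can be sketched roughly as follows.
This is a self-contained illustration, not the actual IGNITE-6171 patch; the 50 ms tick
and 500 ms threshold are arbitrary values chosen for the example:

/**
 * Detects long JVM pauses (e.g. GC stop-the-world) by measuring how much a
 * short, fixed sleep overruns, and reports pauses longer than a threshold.
 */
public class PauseWatchdog extends Thread {
    private static final long TICK_MS = 50;        // expected sleep interval
    private static final long THRESHOLD_MS = 500;  // report pauses longer than this

    public PauseWatchdog() {
        super("jvm-pause-watchdog");
        setDaemon(true);
    }

    @Override public void run() {
        while (!isInterrupted()) {
            long start = System.nanoTime();

            try {
                Thread.sleep(TICK_MS);
            }
            catch (InterruptedException ignored) {
                return;
            }

            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long pauseMs = elapsedMs - TICK_MS; // extra time the whole JVM was stalled

            if (pauseMs > THRESHOLD_MS)
                System.err.println("Possible too long STW pause: " + pauseMs + " milliseconds.");
        }
    }

    public static void main(String[] args) throws Exception {
        new PauseWatchdog().start();
        // ... application work; the watchdog reports pauses after they end.
        Thread.sleep(60_000);
    }
}

As noted earlier in the thread, this only reports a pause after it has finished;
killing a node while a pause is still in progress requires an external
(out-of-process) watchdog.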


[jira] [Created] (IGNITE-6982) .NET: Migrate to latest NUnit

2017-11-21 Thread Pavel Tupitsyn (JIRA)
Pavel Tupitsyn created IGNITE-6982:
--

 Summary: .NET: Migrate to latest NUnit
 Key: IGNITE-6982
 URL: https://issues.apache.org/jira/browse/IGNITE-6982
 Project: Ignite
  Issue Type: Improvement
  Components: platforms
Reporter: Pavel Tupitsyn
Assignee: Pavel Tupitsyn
 Fix For: 2.4


We use the very old NUnit 2.6. In order to reuse tests under .NET Core 
(IGNITE-2662) we need the latest NUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Data eviction/expiration from Ignite persistence

2017-11-21 Thread Denis Magda
We might break compatibility in the next major release, or even create a 
tool that will migrate persistence files from the old format to the new one.

—
Denis

> On Nov 21, 2017, at 8:34 AM, Dmitry Pavlov  wrote:
> 
> Hi Denis,
> 
> Second fix we need to do is B+ tree separation in per-partition basis:
> https://issues.apache.org/jira/browse/IGNITE-5874
> 
> Should we take into account compatibility issues with previous Ignite
> persistent store versions, because the current TTL tree is persisted and will
> change its format?
> 
> Sincerely,
> Dmitriy Pavlov
> 
> 
> On Tue, Nov 21, 2017 at 2:13, Denis Magda wrote:
> 
>> Dmitriy,
>> 
>> That’s about TTL and eviction support for Ignite persistence. Presently if
>> you set an expiration or eviction policy for a cache it will be applied for
>> data stored in memory. The policy never affects the persistence layer.
>> 
>> —
>> Denis
>> 
>>> On Nov 20, 2017, at 9:29 AM, Dmitry Pavlov 
>> wrote:
>>> 
>>> Hi Denis,
>>> 
>>> Is this need covered by PDS + TTL?
>>> 
>>> For the very first TTL test, I found some delay after applying TTL with
>> the
>>> repository enabled: https://issues.apache.org/jira/browse/IGNITE-6964
>>> 
>>> And I'm wondering if the user's needs are covered by
>>> https://apacheignite.readme.io/docs/expiry-policies plus
>>> https://apacheignite.readme.io/docs/distributed-persistent-store
>>> 
>>> Sincerely,
>>> Dmitriy Pavlov
>>> 
>>> On Sat, Nov 18, 2017 at 12:12, Dmitry Pavlov wrote:
>>> 
 Hi Denis,
 
 What is the difference between the functionality required by users and TTL
 cache expiration?
 
 From some posts I assume the TTL cache is compatible with native
 persistence.
 
 Sincerely,
 Dmitriy Pavlov
 
 On Sat, Nov 18, 2017 at 0:41, Denis Magda wrote:
 
> Igniters,
> 
> I’ve been talking to many Ignite users here and there who are already on
> Ignite persistence or are considering turning it on. The majority of them are
> more than satisfied with its current state and provided capabilities. That is
> really good news for us.
> 
> However, I tend to come across people who ask about
> eviction/expiration policies for the persistence itself. I had around 6
> conversations about the topic this month alone.
> 
> Usually the requirement is connected with a streaming use case: an
> application streams a lot of data (IoT, metrics, etc.) to the cluster, but
> the data becomes stale after some period of time (a day, a couple of days,
> etc.). The user doesn’t want to waste disk space and needs to simply purge
> the stale data from there.
> 
> My suggestion here is to create a timer task that will remove the stale
> data from the cluster. However, since the demand is growing, it’s probably a
> good time to discuss the feasibility of this feature.
> 
> Alex G, as the main architect of the persistence, could you share your
> thoughts on this? What will it cost us to support eviction/expiration
> for the persistence?
> 
> —
> Denis
 
 
>> 
>> 
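
For reference, the in-memory expiration being contrasted with here is configured
through the standard JCache expiry policies. A minimal sketch (cache name and duration
are chosen arbitrarily for the example):

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class ExpiryExample {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("metrics");

        // Entries expire one day after creation; today this applies to the
        // in-memory copy, which is exactly the gap discussed in this thread.
        ccfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_DAY));
        ccfg.setEagerTtl(true); // proactively clean up expired entries

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);
            cache.put(1, "sample");
        }
    }
}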



[jira] [Created] (IGNITE-6981) System thread pool continuously creates and destroys threads while idle

2017-11-21 Thread Joel Lang (JIRA)
Joel Lang created IGNITE-6981:
-

 Summary: System thread pool continuously creates and destroys 
threads while idle
 Key: IGNITE-6981
 URL: https://issues.apache.org/jira/browse/IGNITE-6981
 Project: Ignite
  Issue Type: Bug
  Components: 2.3, general
Affects Versions: 2.3
Reporter: Joel Lang
Priority: Minor


I've observed using VisualVM that Ignite is continuously starting and stopping 
system pool threads even while the pool is idle.

I've attached a screenshot. Notice the high thread number on the left.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Denis Magda
A lack of suggestions and thoughts encourages me to create a ticket:
https://issues.apache.org/jira/browse/IGNITE-6980 


—
Denis

> On Nov 20, 2017, at 2:53 PM, Denis Magda  wrote:
> 
> If an Ignite operation hangs for some reason, due to an internal problem or 
> buggy application code, it needs to eventually *time out*.
> 
> Take the atomic operations case brought by Val to our attention recently:
> http://apache-ignite-developers.2346864.n4.nabble.com/Timeouts-in-atomic-cache-td19839.html
> 
> An application must not freeze waiting for human intervention if an 
> atomic update fails internally.
> 
> Even more, I would let every possible operation time out:
> - Ignite compute computations.
> - Ignite services calls.
> - Atomic/transactional cache updates.
> - SQL queries.
> 
> I’m not sure this is covered by any of the tickets from IEP-7. Any 
> thoughts/suggestions before the ticket is created?
> 
> —
> Denis
> 
>> On Nov 20, 2017, at 8:56 AM, Anton Vinogradov  
>> wrote:
>> 
>> Dmitry,
>> 
>> There are two cases:
>> 1) STW duration is long -> notify monitoring via a JMX metric.
>> 
>> 2) STW duration exceeds N seconds -> no need to wait for anything.
>> We already know that the node will be segmented, or that a pause longer than N
>> seconds will affect cluster performance.
>> The better option is to kill the node ASAP to protect the cluster. Some customers
>> have huge timeouts, and such a node can kill the whole cluster if it is not
>> killed by a watchdog.
>> 
>> On Mon, Nov 20, 2017 at 7:23 PM, Dmitry Pavlov 
>> wrote:
>> 
>>> Hi Anton,
>>> 
 - GC STW duration exceeds the maximum possible length (the node should be
 stopped before the STW finishes)
>>> 
>>> Are you sure we should kill a node in case of a long STW? Can we produce
>>> warnings in logs and monitoring tools and wait a little bit longer for the
>>> node to become alive if we detect an STW? In this case we can notify the
>>> coordinator or another node that 'the current node is in STW, please wait
>>> longer than 3 heartbeat timeouts'.
>>> 
>>> Is it probable that such pauses will not occur often?
>>> 
>>> Sincerely,
>>> Dmitriy Pavlov
>>> 
>>> On Mon, Nov 20, 2017 at 18:53, Anton Vinogradov wrote:
>>> 
 Igniters,
 
 Internal problems may, and unfortunately do, cause unexpected cluster
 behavior.
 We should define the behavior for the case when any internal problem happens.
 
 Well-known internal problems can be split into:
 1) OOM or any other reason causing a node crash
 
 2) Situations requiring a graceful node shutdown with a custom notification
 - IgniteOutOfMemoryException
 - Persistence errors
 - ExchangeWorker exits with an error
 
 3) Performance issues that should be covered by metrics
 - GC STW duration
 - Timed-out tasks and jobs
 - TX deadlock
 - Hung Tx (waiting for some service)
 - Java deadlocks
 
 I created a special issue [1] to make sure all these metrics will be
 presented in WebConsole or VisorConsole (which is preferred?)
 
 4) Situations requiring an external monitoring implementation
 - GC STW duration exceeds the maximum possible length (the node should be
 stopped before the STW finishes)
 
 All these problems were reported by different people at different times,
 so we should reanalyze each of them and possibly find better ways to
 solve them than those described in the issues.
 
 P.S. IEP-7 [2] already contains 9 issues; feel free to mention something
 else :)
 
 [1] https://issues.apache.org/jira/browse/IGNITE-6961
 [2]
 
 https://cwiki.apache.org/confluence/display/IGNITE/IEP-
>>> 7%3A+Ignite+internal+problems+detection
 
>>> 
> 



[jira] [Created] (IGNITE-6980) Automatic cancelling of hanging Ignite operations

2017-11-21 Thread Denis Magda (JIRA)
Denis Magda created IGNITE-6980:
---

 Summary: Automatic cancelling of hanging Ignite operations
 Key: IGNITE-6980
 URL: https://issues.apache.org/jira/browse/IGNITE-6980
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Magda
Priority: Critical
 Fix For: 2.4


If an Ignite operation hangs for some reason, due to an internal problem or 
buggy application code, it needs to eventually fail after a timeout fires.

Take atomic operations case brought by Val to our attention recently:
http://apache-ignite-developers.2346864.n4.nabble.com/Timeouts-in-atomic-cache-td19839.html

An application must not freeze waiting for human intervention if an 
atomic update fails internally.

Even more, I would let every possible operation fail after a timeout fires:
- Ignite compute computations.
- Ignite services calls.
- Atomic/transactional cache updates.
- SQL queries.
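
Some of these operations can already be bounded by per-call timeouts. A hedged sketch
of what that looks like today (the values are arbitrary, and this does not cover the
internal hangs this ticket targets):

{noformat}
import java.util.concurrent.TimeUnit;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.transactions.Transaction;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class TimeoutExamples {
    static void run(Ignite ignite, IgniteCache<Integer, String> cache) {
        // Compute: fail the closure if it does not finish within 5 seconds.
        ignite.compute().withTimeout(5_000).run(() -> System.out.println("job"));

        // Transactions: 2-second timeout, 0 = unknown number of entries.
        try (Transaction tx = ignite.transactions().txStart(
                PESSIMISTIC, REPEATABLE_READ, 2_000, 0)) {
            cache.put(1, "value");
            tx.commit();
        }

        // SQL: cancel the query if it runs longer than 3 seconds.
        cache.query(new SqlFieldsQuery("select count(*) from String")
            .setTimeout(3, TimeUnit.SECONDS)).getAll();
    }
}
{noformat}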



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6979) Race in GridClientPartitionTopology may cause NPE and partition map exchange hang.

2017-11-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6979:
-

 Summary: Race in GridClientPartitionTopology may cause NPE and 
partition map exchange hang.
 Key: IGNITE-6979
 URL: https://issues.apache.org/jira/browse/IGNITE-6979
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


Was reproduced on TC and locally using test
{{org.apache.ignite.internal.processors.cache.IgniteCachePartitionMapUpdateTest#testRandom}}

Reason: discoCache is not initialized before calling  
{{org.apache.ignite.internal.processors.cache.distributed.dht.GridClientPartitionTopology#nodes(int,
 org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion, 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionState,
 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionState...)}}

{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridClientPartitionTopology.nodes(GridClientPartitionTopology.java:538)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridClientPartitionTopology.owners(GridClientPartitionTopology.java:577)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridClientPartitionTopology.owners(GridClientPartitionTopology.java:582)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$22.applyx(CacheAffinitySharedManager.java:2117)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$22.applyx(CacheAffinitySharedManager.java:2043)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1059)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initAffinityOnNodeLeft0(CacheAffinitySharedManager.java:2043)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onServerLeftWithExchangeMergeProtocol(CacheAffinitySharedManager.java:1383)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:2239)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2199)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:1936)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:116)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1793)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1781)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1781)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1483)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:131)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:327)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:307)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2626)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2605)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
or

Re: Data eviction/expiration from Ignite persistence

2017-11-21 Thread Dmitry Pavlov
Hi Denis,

Second fix we need to do is B+ tree separation in per-partition basis:
https://issues.apache.org/jira/browse/IGNITE-5874

Should we take into account compatibility issues with previous Ignite
persistent store versions, because the current TTL tree is persisted and will
change its format?

Sincerely,
Dmitriy Pavlov


On Tue, Nov 21, 2017 at 2:13, Denis Magda wrote:

> Dmitriy,
>
> That’s about TTL and eviction support for Ignite persistence. Presently if
> you set an expiration or eviction policy for a cache it will be applied for
> data stored in memory. The policy never affects the persistence layer.
>
> —
> Denis
>
> > On Nov 20, 2017, at 9:29 AM, Dmitry Pavlov 
> wrote:
> >
> > Hi Denis,
> >
> > Is this need covered by PDS + TTL?
> >
> > For the very first TTL test, I found some delay after applying TTL with
> the
> > repository enabled: https://issues.apache.org/jira/browse/IGNITE-6964
> >
> > And I'm wondering if the user's needs are covered by
> > https://apacheignite.readme.io/docs/expiry-policies plus
> > https://apacheignite.readme.io/docs/distributed-persistent-store
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > On Sat, Nov 18, 2017 at 12:12, Dmitry Pavlov wrote:
> >
> >> Hi Denis,
> >>
> >> What is the difference of required by users functionality with TTL cache
> >> expiration?
> >>
> >> By some posts I can suppose TTL cache is compatible with native
> >> persistence.
> >>
> >> Sincerely,
> >> Dmitriy Pavlov
> >>
> >> On Sat, Nov 18, 2017 at 0:41, Denis Magda wrote:
> >>
> >>> Igniters,
> >>>
> >>> I’ve been talking to many Ignite users here and there who are already
> on
> >>> Ignite persistence or consider to turn it on. The majority of them are
> more
> >>> than satisfied with its current state and provided capabilities.
> That’s is
> >>> really good news for us.
> >>>
> >>> However, I tend to come across the people who ask about
> >>> eviction/expiration policies for the persistence itself. Had around 6
> >>> conversation about the topic this month only.
> >>>
> >>> Usually the requirement is connected with a streaming use case. When an
> >>> application streams a lot of data (IoT, metrics, etc.) to the cluster
> but
> >>> the data becomes stale in some period of time (day, couple of days,
> etc.).
> >>> The user doesn’t want to waste the disk space and needs to simple
> purge the
> >>> data from there.
> >>>
> >>> My suggestion here is to create a timer task that will remove the stale
> >>> data from the cluster. However, since the demand is growing probably
> it’s a
> >>> good time to discuss a feasibility of this feature.
> >>>
> >>> Alex G, as the main architect of the persistence, could you share your
> >>> thoughts on this? What will it cost to us to support
> eviction/expiration
> >>> for the persistence?
> >>>
> >>> —
> >>> Denis
> >>
> >>
>
>


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Anton Vinogradov
Andrey,

>  JVM provides sufficient means of detecting a struggling process out of
the box.

Could you point to some articles describing how to detect an STW exceeding
some duration using only the JVM API?

On Tue, Nov 21, 2017 at 7:17 PM, Andrey Kornev 
wrote:

> My 2 cents. Don’t do it. JVM provides sufficient means of detecting a
> struggling process out of the box. SRE/Operations teams usually know how to
> monitor JVMs and can handle killing of such processes themselves.
>
> The feature adds no value, just complexity (and more configuration
> parameters (!) — as if Ignite didn’t have enough of them already).
>
> Regards,
> Andrey
> _
> From: Denis Magda 
> Sent: Monday, November 20, 2017 3:10 PM
> Subject: Re: Facility to detect long STW pauses and other system response
> degradations
> To: 
>
>
> My 2 cents.
>
> 1. Totally for a separate native process that will handle the monitoring
> of an Ignite process. The watchdog process can simply start a JVM tool like
> jstat and parse its GC logs: https://dzone.com/articles/how-monitor-java-garbage
>
> 2. As for the STW handling, I would make a possible reaction more generic.
> Let’s define a policy (enumeration) that will define how to deal with an
> unstable node. The events might be as follows - kill a node, restart a
> node, trigger a custom script using Runtime.exec or other methods.
>
> What’d you think? Specifically on point 2.
>
> —
> Denis
>
> > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov 
> wrote:
> >
> > Yakov,
> >
> > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> >
> > We split issue to
> > #1 STW duration metrics
> > #2 External monitoring allows to stop node during STW
> >
> >> Testing GC pause with java thread is
> >> a bit strange and can give info only after GC pause finishes.
> >
> > That's ok since it's #1
> >
> > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> sbt.sorokin@gmail.com>
> > wrote:
> >
> >> I have tested solution with java-thread and GC logs had contain same
> pause
> >> values of thread stopping which was detected by java-thread.
> >>
> >>
> >> My log (contains pauses > 100ms):
> >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long STW
> >> pause: 507 milliseconds.
> >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long STW
> >> pause: 5595 milliseconds.
> >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long STW
> >> pause: 3262 milliseconds.
> >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long STW
> >> pause: 1737 milliseconds.
> >>
> >> GC log:
> >> gridgain@dell-5580-92zc8h2:~$ cat
> >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
> >> threads were stopped: 0,845 seconds, Stopping threads took:
> 0,246
> >> seconds
> >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
> >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> 0,252
> >> seconds
> >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
> >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> 0,178
> >> seconds // GOT!
> >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
> >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> 0,229
> >> seconds // GOT!
> >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which application
> >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> 0,223
> >> seconds // GOT!
> >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which application
> >> threads were stopped: 1,7337123 seconds, Stopping threads took:
> 0,121
> >> seconds // GOT!
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >>
>
>
>
>


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Andrey Kornev
My 2 cents. Don’t do it. JVM provides sufficient means of detecting a 
struggling process out of the box. SRE/Operations teams usually know how to 
monitor JVMs and can handle killing of such processes themselves.

The feature adds no value, just complexity (and more configuration parameters 
(!) — as if Ignite didn’t have enough of them already).

Regards,
Andrey
_
From: Denis Magda 
Sent: Monday, November 20, 2017 3:10 PM
Subject: Re: Facility to detect long STW pauses and other system response 
degradations
To: 


My 2 cents.

1. Totally for a separate native process that will handle the monitoring of an 
Ignite process. The watchdog process can simply start a JVM tool like jstat and 
parse its GC logs: https://dzone.com/articles/how-monitor-java-garbage 


2. As for the STW handling, I would make a possible reaction more generic. 
Let’s define a policy (enumeration) that will define how to deal with an 
unstable node. The events might be as follows - kill a node, restart a node, 
trigger a custom script using Runtime.exec or other methods.

What’d you think? Specifically on point 2.

—
Denis

> On Nov 20, 2017, at 6:47 AM, Anton Vinogradov  
> wrote:
>
> Yakov,
>
> Issue is https://issues.apache.org/jira/browse/IGNITE-6171
>
> We split issue to
> #1 STW duration metrics
> #2 External monitoring allows to stop node during STW
>
>> Testing GC pause with java thread is
>> a bit strange and can give info only after GC pause finishes.
>
> That's ok since it's #1
>
> On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin 
> wrote:
>
>> I have tested the solution with a Java thread, and the GC logs contained the
>> same pause values for thread stopping as those detected by the Java thread.
>>
>>
>> My log (contains pauses > 100ms):
>> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long STW
>> pause: 507 milliseconds.
>> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long STW
>> pause: 5595 milliseconds.
>> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long STW
>> pause: 3262 milliseconds.
>> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long STW
>> pause: 1737 milliseconds.
>>
>> GC log:
>> gridgain@dell-5580-92zc8h2:~$ cat
>> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
>> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
>> threads were stopped: 0,845 seconds, Stopping threads took: 0,246
>> seconds
>> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
>> threads were stopped: 0,0001072 seconds, Stopping threads took: 0,252
>> seconds
>> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
>> threads were stopped: 0,5001082 seconds, Stopping threads took: 0,178
>> seconds // GOT!
>> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
>> threads were stopped: 5,5856603 seconds, Stopping threads took: 0,229
>> seconds // GOT!
>> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which application
>> threads were stopped: 3,2595700 seconds, Stopping threads took: 0,223
>> seconds // GOT!
>> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which application
>> threads were stopped: 1,7337123 seconds, Stopping threads took: 0,121
>> seconds // GOT!
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>>
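
Denis's policy suggestion quoted above (kill a node, restart a node, or run a custom
script when a node is deemed unstable) could be modeled as a simple enumeration.
A purely hypothetical sketch, not an existing Ignite API:

/** Hypothetical reaction policy for a node detected as unstable (e.g. long STW). */
public enum UnstableNodeAction {
    /** Stop the Ignite process immediately. */
    KILL_NODE,

    /** Stop and start the Ignite process again. */
    RESTART_NODE,

    /** Invoke a user-provided script, e.g. via Runtime.getRuntime().exec(...). */
    RUN_CUSTOM_SCRIPT,

    /** Only log and expose the event via metrics; take no action. */
    REPORT_ONLY
}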





Re: SQL warning for partitioned caches with setLocal

2017-11-21 Thread luqmanahmad
Thanks dsetrakyan,

I would like to add a few more things over here which should be applicable
to partitioned caches.

This context variable which is set through affinityCall or affinityRun
should be available through either a helper class or cache configuration.
There could be other advantages as well for example:

1. We can check the context variable in all the partitioned cache
operations. In department and employee example if an employee is accessed
without an affinityRun or affinityCall computation it should also log a
WARNING message or through an exception based on the cache configuration.

2. The user would be able to implement their own custom checks using it. For
example, if we want to have some abstract level checks to restrict
developers to use specific functionality related to partitioned caches. 

Luqman



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


[jira] [Created] (IGNITE-6978) Failed to move temp file to a regular WAL segment file

2017-11-21 Thread Oleg Ostanin (JIRA)
Oleg Ostanin created IGNITE-6978:


 Summary: Failed to move temp file to a regular WAL segment file
 Key: IGNITE-6978
 URL: https://issues.apache.org/jira/browse/IGNITE-6978
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
 Environment: CentOS
Reporter: Oleg Ostanin


1. I've started 2 server nodes on 2 hosts.
2. Started 2 client nodes, one on each host
3. Started DataStreamer from the first client with a key range 0-100 and 
from the second client with a key range 100-200

Then got this exception: 
Failed to set initial value for cache entry: DataStreamerEntry 
[key=KeyCacheObjectImpl [part=157, val=190623, hasValBytes=true], 
val=o.a.i.scenario.internal.model.SampleObject [idHash=2108267055, 
hash=2007824273, salary=1000, fields=HashMap {field19=aupygsskxq, 
field17=vghghwpdkk, field18=wapmsogviv, field22=yhsgrgxjvt, field23=bkuzgwohlp, 
field20=mzcimhrkwl, field21=bkdlrjeosd, field26=wvlypybaop, field27=wqmetfzsdm, 
field24=vpmmxinygq, field25=idbcqlchvq, field11=zkuxmemury, field12=otkuigrzqj, 
field10=uvghlcvwlx, field15=gaootfgcis, field16=abwxazoyoa, field13=flgmuzijzh, 
field14=vzsmgclizh, field39=iqhielhnon, field44=joadulhoxf, field45=bwqqkumjgf, 
field42=epwlotiwbv, field43=cvlehgeyar, field48=gnjawjgbrp, field49=ptfzndiiqm, 
field46=dkmbtdsrcr, field47=smugvczqkk, field40=kgozmlfenp, field41=bxtvofscdp, 
field28=enfjjtysvt, field29=kbzlsguqcb, field33=mbixfddhsq, field34=rygvisgdbi, 
field1=qriiuymvwe, field31=hdqfmkyofe, field0=comhcshciq, field32=lwroifzwfa, 
field37=gnooplphem, field38=zembqqqnzm, field35=pbpgfjvmhs, field36=eqbvpwenrd, 
field7=ymzwgutylc, field6=slnusxjggw, field9=psfzikqbyg, field8=exrrvedqvo, 
field3=dcilcjrprt, field2=yozawivutp, field30=ptreqavpui, field5=kxacmhbusc, 
field4=yxymejnvos}]]
org.apache.ignite.IgniteCheckedException: Failed to move temp file to a regular 
WAL segment file: 
/storage/ssd/oostanin/poc-tester/work/db/wal/poc_tester_server1/0005.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.createFile(FileWriteAheadLogManager.java:987)
 ~[ignite-core-2.1.6.jar:2.1.6]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkFiles(FileWriteAheadLogManager.java:1432)
 ~[ignite-core-2.1.6.jar:2.1.6]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$4900(FileWriteAheadLogManager.java:89)
 ~[ignite-core-2.1.6.jar:2.1.6]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.allocateRemainingFiles(FileWriteAheadLogManager.java:1402)
 ~[ignite-core-2.1.6.jar:2.1.6]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.run(FileWriteAheadLogManager.java:1187)
 ~[ignite-core-2.1.6.jar:2.1.6]
Caused by: java.nio.file.FileAlreadyExistsException: 
/storage/ssd/oostanin/poc-tester/work/db/wal/poc_tester_server1/0005.wal
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:429) ~[?:1.8.0_151]
at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) 
~[?:1.8.0_151]
at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_151]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.createFile(FileWriteAheadLogManager.java:984)
 ~[ignite-core-2.1.6.jar:2.1.6]
... 4 more

 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] ignite pull request #3077: IGNITE-2766 Ensure that cache is available after ...

2017-11-21 Thread alamar
GitHub user alamar opened a pull request:

https://github.com/apache/ignite/pull/3077

IGNITE-2766 Ensure that cache is available after client ID changes.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-2766test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3077.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3077






---


Re: Losing data during restarting cluster with persistence enabled

2017-11-21 Thread Evgeniy Ignatiev

Hello.
As far as I know, in the case of a TRANSACTIONAL cache data is always passed 
to the cache store on the same node where it is being written, to make the 
cache store transaction-aware, unless write-behind mode is enabled (which 
makes the cache store effectively not participate in the actual transactions 
that wrote the data); in that mode data is always passed to the cache store 
on the primary nodes. ATOMIC caches also write data to the cache store on 
primary nodes.


In the case of a transactional cache, writing all data inside affinity 
calls may solve the problem of writing data to the cache store only on 
primary or backup nodes.


On 11/21/2017 4:37 PM, Vyacheslav Daradur wrote:

Valentin,


Why don't you use Ignite persistence [1]?

I have a use case in one of my projects that needs RAM-to-disk
replication only. None of the other PDS features are needed.
In a first assessment, persisting to RocksDB works faster.


CacheStore design assumes that the underlying storage is shared by all the 
nodes in topology.

This is a very important note.
I'm a bit confused because I thought that each node in the cluster
persists the partitions for which the node is either primary or backup,
like in PDS.

My RocksDB implementation supports working with one DB instance shared by
all the nodes in the topology, but that would defeat the purpose of using
an embedded fast storage.

Is there any link to a detailed description of CacheStorage design or
any other advice?
Thanks in advance.



On Fri, Nov 17, 2017 at 9:07 PM, Valentin Kulichenko
 wrote:

Vyacheslav,

CacheStore design assumes that the underlying storage is shared by all the
nodes in topology. Even if you delay rebalancing on node stop (which is
possible via CacheConfiguration#rebalanceDelay), I doubt it will solve all
your consistency issues.

Why don't you use Ignite persistence [1]?

[1] https://apacheignite.readme.io/docs/distributed-persistent-store

-Val

On Fri, Nov 17, 2017 at 4:24 AM, Vyacheslav Daradur 
wrote:


Hi Andrey! Thank you for answering.


Key-to-partition mapping shouldn't depend on topology, and shouldn't
change on an unstable topology.
Key-to-partition mapping doesn't depend on topology in my test
affinity function. It only depends on the number of partitions.
But partition-to-node mapping does depend on topology, and at cluster stop,
when one node leaves the topology, some partitions may be moved to other
nodes.


Does all nodes share same RockDB database or each node has its own copy?

Each Ignite node has own RocksDB instance.


Would you please share configuration?

It's pretty simple:
 IgniteConfiguration cfg = new IgniteConfiguration();
 cfg.setIgniteInstanceName(instanceName);

 CacheConfiguration cacheCfg = new
CacheConfiguration<>();
 cacheCfg.setName(TEST_CACHE_NAME);
 cacheCfg.setCacheMode(CacheMode.PARTITIONED);
 cacheCfg.setWriteSynchronizationMode(
CacheWriteSynchronizationMode.PRIMARY_SYNC);
 cacheCfg.setBackups(1);
 cacheCfg.setAffinity(new
TestAffinityFunction(partitionsNumber, backupsNumber));
 cacheCfg.setWriteThrough(true);
 cacheCfg.setReadThrough(true);
 cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC);
 cacheCfg.setCacheStoreFactory(new
RocksDBCacheStoreFactory<>("/test/path/to/persistence",
TEST_CACHE_NAME, cfg));

 cfg.setCacheConfiguration(cacheCfg);

Could you give me advice on places which I need to pay attention?


On Wed, Nov 15, 2017 at 3:02 PM, Andrey Mashenkov
 wrote:

Hi Vyacheslav,

Key-to-partition mapping shouldn't depend on topology, and shouldn't
change on an unstable topology.
Looks like you've missed smth.

Would you please share configuration?
Does all nodes share same RockDB database or each node has its own copy?



On Wed, Nov 15, 2017 at 12:22 AM, Vyacheslav Daradur <

daradu...@gmail.com>

wrote:


Hi, Igniters!

I’m using partitioned Ignite cache with RocksDB as 3rd party persistence
store.
I've got an issue: if cache rebalancing is switched on, then it’s
possible to lose some data.

Basic scenario:
1) Start Ignite cluster and fill a cache with RocksDB persistence;
2) Stop all nodes
3) Start Ignite cluster and validate data

This works fine while rebalancing is switched off.

If rebalancing is switched on: when I call Ignition#stopAll, some nodes
go down sequentially, and while one node is going down another starts
rebalancing. When the nodes are started, the affinity function works with a full
set of nodes and may define a wrong partition for a key, because the
previous state was changed during rebalancing.

Maybe I'm doing something wrong. How can I avoid rebalancing while
stopping all nodes in the cluster?

Could you give me any advice, please?

--
Best Regards, Vyacheslav D.




--
Best regards,
Andrey V. Mashenkov



--
Best Regards, Vyacheslav D.
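
For readers following this thread, a per-node RocksDB-backed CacheStore of the kind
being described could be sketched roughly as follows. String keys/values and the
storage path are illustrative; error handling and serialization of arbitrary types
are omitted:

import javax.cache.Cache;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

/** Minimal write-through store keeping each node's data in a local RocksDB instance. */
public class RocksDbCacheStore extends CacheStoreAdapter<String, String> {
    private final RocksDB db;

    public RocksDbCacheStore(String path) {
        RocksDB.loadLibrary();
        try {
            db = RocksDB.open(new Options().setCreateIfMissing(true), path);
        }
        catch (RocksDBException e) {
            throw new IllegalStateException("Failed to open RocksDB at " + path, e);
        }
    }

    /** Read-through: called on a cache miss. */
    @Override public String load(String key) {
        try {
            byte[] val = db.get(key.getBytes());
            return val == null ? null : new String(val);
        }
        catch (RocksDBException e) {
            throw new javax.cache.integration.CacheLoaderException(e);
        }
    }

    /** Write-through: called on cache puts. */
    @Override public void write(Cache.Entry<? extends String, ? extends String> e) {
        try {
            db.put(e.getKey().getBytes(), e.getValue().getBytes());
        }
        catch (RocksDBException ex) {
            throw new javax.cache.integration.CacheWriterException(ex);
        }
    }

    /** Called on cache removals. */
    @Override public void delete(Object key) {
        try {
            db.delete(((String)key).getBytes());
        }
        catch (RocksDBException ex) {
            throw new javax.cache.integration.CacheWriterException(ex);
        }
    }
}

As Valentin notes above, because each node owns its own RocksDB instance, such a store
only stays consistent on a stable topology.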








[jira] [Created] (IGNITE-6977) Wrong initial BitSet size in GridPartitionStateMap

2017-11-21 Thread Alexander Belyak (JIRA)
Alexander Belyak created IGNITE-6977:


 Summary: Wrong initial BitSet size in GridPartitionStateMap
 Key: IGNITE-6977
 URL: https://issues.apache.org/jira/browse/IGNITE-6977
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.1
Reporter: Alexander Belyak


In the constructor of org.apache.ignite.internal.util.GridPartitionStateMap(int 
parts) {
states = new BitSet(parts);
}
we initialize the BitSet with parts bits, but use private static final int BITS bits 
for each partition state. As a result, the long[] in the BitSet gets a hard-to-predict 
size (depending on the access order it can be exactly as needed, or almost twice 
bigger, with at least one additional array copy).
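
A small illustration of the sizing mismatch (the value of BITS here is just an
assumption for the example; the real constant may differ):

{noformat}
import java.util.BitSet;

public class BitSetSizing {
    /** Bits used per partition state; illustrative value only. */
    private static final int BITS = 3;

    public static void main(String[] args) {
        int parts = 1024;

        BitSet undersized = new BitSet(parts);        // current: sized for 'parts' bits
        BitSet rightSized = new BitSet(parts * BITS); // sized for the bits actually used

        System.out.println("undersized capacity: " + undersized.size() + " bits");  // 1024
        System.out.println("right-sized capacity: " + rightSized.size() + " bits"); // 3072

        // Setting the highest used bit forces the undersized set to reallocate
        // and copy its backing long[]; the right-sized one does not.
        undersized.set(parts * BITS - 1);
        rightSized.set(parts * BITS - 1);
    }
}
{noformat}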



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6976) Visor CMD: Add ability to put/get/remove data to caches via command line Visor.

2017-11-21 Thread Alexey Kuznetsov (JIRA)
Alexey Kuznetsov created IGNITE-6976:


 Summary: Visor CMD: Add ability to put/get/remove data to caches 
via command line Visor.
 Key: IGNITE-6976
 URL: https://issues.apache.org/jira/browse/IGNITE-6976
 Project: Ignite
  Issue Type: Improvement
  Components: wizards
Reporter: Alexey Kuznetsov
Assignee: Alexey Kuznetsov
 Fix For: 2.4






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6975) .NET: Create cross-platform examples on .NET Core

2017-11-21 Thread Pavel Tupitsyn (JIRA)
Pavel Tupitsyn created IGNITE-6975:
--

 Summary: .NET: Create cross-platform examples on .NET Core
 Key: IGNITE-6975
 URL: https://issues.apache.org/jira/browse/IGNITE-6975
 Project: Ignite
  Issue Type: Improvement
  Components: examples, platforms
Affects Versions: 2.4
Reporter: Pavel Tupitsyn


IGNITE-2662 brings .NET Core based cross-platform support. We should provide 
examples that can be run on any platform (Windows / Linux / Mac).

Existing examples should be kept for .NET 4.0 / VS 2010 support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] ignite pull request #3076: IGNITE-6171: + LongJVMPauseDetector, + longJVMPau...

2017-11-21 Thread x-kreator
GitHub user x-kreator opened a pull request:

https://github.com/apache/ignite/pull/3076

IGNITE-6171: + LongJVMPauseDetector, + longJVMPausesCount and longJVM…

…PausesTotalDuration properties of IgniteMXBean.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/x-kreator/ignite ignite-6171

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3076


commit 37283985ff4a52d28691762fec1e99cc828ec762
Author: Unknown 
Date:   2017-11-21T13:38:29Z

IGNITE-6171: + LongJVMPauseDetector, + longJVMPausesCount and 
longJVMPausesTotalDuration properties of IgniteMXBean.




---


[GitHub] ignite pull request #3075: no BaselineTopology for in-memory-only grid

2017-11-21 Thread sergey-chugunov-1985
GitHub user sergey-chugunov-1985 opened a pull request:

https://github.com/apache/ignite/pull/3075

no BaselineTopology for in-memory-only grid



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite 
ignite-2.4.1-no_blt_for_inmemory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3075


commit e7ca9b65a68de7752195c8f4d2b5180f3c77d19f
Author: Dmitriy Govorukhin 
Date:   2017-11-13T18:52:47Z

ignite-blt-merge -> ignite-2.4.1

commit cc8168fc184bb7f5e3cc3bbb0743397097f78bfb
Author: Dmitriy Govorukhin 
Date:   2017-11-13T19:13:01Z

merge ignite-pitr-rc1 -> ignite-2.4.1

commit 87e6d74cf6a251c7984f9e68c391f790feccc281
Author: Dmitriy Govorukhin 
Date:   2017-11-14T12:49:33Z

ignite-gg-12877 Compact consistent ID in WAL

commit 9f5a22711baea05bd37ab07c8f928a4837dd83a4
Author: Ilya Lantukh 
Date:   2017-11-14T14:12:28Z

Fixed javadoc.

commit d5af2d78dd8eef8eca8ac5391d31d8c779649bb0
Author: Alexey Kuznetsov 
Date:   2017-11-15T08:09:00Z

IGNITE-6913 Baseline: Added new options to controls.sh for baseline 
manipulations.

commit 713924ce865752b6e99b03bd624136541cea5f9f
Author: Sergey Chugunov 
Date:   2017-11-15T09:03:12Z

IGNITE-5850 failover tests for cache operations during BaselineTopology 
changes

commit b65fd134e748d496f732ec2aa0953a0531f544b8
Author: Ilya Lantukh 
Date:   2017-11-15T12:54:35Z

TX read logging if PITR is enabled.

commit 9b2a567c0e04dc33116b51f88bee75f76e9107d1
Author: Ilya Lantukh 
Date:   2017-11-15T13:45:16Z

TX read logging if PITR is enabled.

commit 993058ccf0b2b8d9e80750c3e45a9ffa31d85dfa
Author: Dmitriy Govorukhin 
Date:   2017-11-15T13:51:54Z

ignite-2.4.1 optimization for store full set node more compacted

commit 1eba521f608d39967aec376b397b7fc800234e54
Author: Dmitriy Govorukhin 
Date:   2017-11-15T13:52:22Z

Merge remote-tracking branch 'professional/ignite-2.4.1' into ignite-2.4.1

commit 564b3fd51f8a7d1d81cb6874df66d0270623049c
Author: Sergey Chugunov 
Date:   2017-11-15T14:00:51Z

IGNITE-5850 fixed issue with initialization of data regions on node 
activation, fixed issue with auto-activation when random node joins inactive 
cluster with existing BLT

commit c6d1fa4da7adfadc80abdc7eaf6452b86a4f6aa4
Author: Sergey Chugunov 
Date:   2017-11-15T16:23:08Z

IGNITE-5850 transitionResult is set earlier when request for changing 
BaselineTopology is sent

commit d65674363163e38a4c5fdd73d1c8d8e1c7610797
Author: Sergey Chugunov 
Date:   2017-11-16T11:59:07Z

IGNITE-5850 new failover tests for changing BaselineTopology up (new node 
added to topology)

commit 20552f3851fe8825191b144179be032965e0b5c6
Author: Sergey Chugunov 
Date:   2017-11-16T12:53:43Z

IGNITE-5850 improved error message when online node is removed from baseline

commit 108bbcae4505ac904a6db774643ad600bfb42c21
Author: Sergey Chugunov 
Date:   2017-11-16T13:45:52Z

IGNITE-5850 BaselineTopology should not change on cluster deactivation

commit deb641ad3bdbf260fa60ad6bf607629652e324bd
Author: Dmitriy Govorukhin 
Date:   2017-11-17T09:45:44Z

ignite-2.4.1 truncate wal and checkpoint history on move/delete snapshot

commit 3c8b06f3659af30d1fd148ccc0f40e216a56c998
Author: Alexey Goncharuk 
Date:   2017-11-17T12:48:12Z

IGNITE-6947 Abandon remap after single map if future is done (fixes NPE)

commit ba2047e5ae7d271a677e0c418375d82d78c4023e
Author: devozerov 
Date:   2017-11-14T12:26:31Z

IGNITE-6901: Fixed assertion during 
IgniteH2Indexing.rebuildIndexesFromHash. This closes #3027.

commit abfc0466d6d61d87255d0fe38cbdf11ad46d4f89
Author: Sergey Chugunov 
Date:   2017-11-17T13:40:57Z

IGNITE-5850 tests for queries in presence of BaselineTopology

commit f4eabaf2a905abacc4c60c01d3ca04f6ca9ec188
Author: Sergey Chugunov 
Date:   2017-11-17T17:23:02Z

IGNITE-5850 implementation for setBaselineTopology(long topVer) migrated 
from wc-251

commit 4edeccd3e0b671aa277f58995df9ff9935baa95a
Author: EdShangGG 
Date:   2017-11-17T18:21:17Z

GG-13074 Multiple snapshot test failures after baseline topology is 
introduced
-adding baseline test to suite
-fixing issues with baseline

commit edae228c8f55990c15ef3044be987dcb00d6c81a
Author: EdShangGG 
Date:   2017-11-18T10:36:41Z

hack with sleep

commit b5bffc7580a4a8ffbcc06f60c282e73979179578
Author: Ilya Lantukh 
Date:   2017-11-18T12:39:19Z

Fixed Ignite.active(true) returning control too early.

commit 1bcdd76aae78665e2bbd49034fb46a1b91ef8389
Author: Ilya Lantukh 
Date:   2017-11-18T13:33:01Z

Fixed baseline topology changes from client/daemon nodes.

commit e3bbecd9f133251818a4b43afa44f46e66dd0325
Author: Alexey Goncharuk 
Date:   2017-11-18T14:16:39Z

Fixed licenses

commit b0

Re: Losing data during restarting cluster with persistence enabled

2017-11-21 Thread Vyacheslav Daradur
Valentin,

>> Why don't you use Ignite persistence [1]?
I have a use case for one of the projects that needs RAM-to-disk
replication only. None of the other PDS features are needed.
During the first assessment, persisting to RocksDB worked faster.

>> CacheStore design assumes that the underlying storage is shared by all the 
>> nodes in topology.
This is the very important note.
I'm a bit confused because I thought that each node in the cluster
persists the partitions for which the node is either primary or backup,
like in PDS.

My RocksDB implementation supports working with one DB instance which is
shared by all the nodes in the topology, but that would defeat the purpose
of using an embedded fast storage.

Is there any link to a detailed description of the CacheStore design, or
any other advice?
Thanks in advance.
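
For reference, a minimal sketch of the rebalanceDelay option Valentin mentions
below, in case delaying partition movement during a full cluster shutdown is
enough for this scenario (the cache name and values are illustrative, not a
recommendation):

CacheConfiguration<Integer, byte[]> cacheCfg = new CacheConfiguration<>("testCache");

// A negative delay means rebalancing starts only when triggered explicitly
// via IgniteCache#rebalance(), so a node leaving during Ignition#stopAll
// does not cause partitions to be reassigned on the remaining nodes.
cacheCfg.setRebalanceDelay(-1);

cacheCfg.setCacheMode(CacheMode.PARTITIONED);
cacheCfg.setBackups(1);

As Valentin notes below, this alone may not solve all consistency issues with a
per-node store.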



On Fri, Nov 17, 2017 at 9:07 PM, Valentin Kulichenko
 wrote:
> Vyacheslav,
>
> CacheStore design assumes that the underlying storage is shared by all the
> nodes in topology. Even if you delay rebalancing on node stop (which is
> possible via CacheConfiguration#rebalanceDelay), I doubt it will solve all
> your consistency issues.
>
> Why don't you use Ignite persistence [1]?
>
> [1] https://apacheignite.readme.io/docs/distributed-persistent-store
>
> -Val
>
> On Fri, Nov 17, 2017 at 4:24 AM, Vyacheslav Daradur 
> wrote:
>
>> Hi Andrey! Thank you for answering.
>>
>> >> Key to partition mapping shouldn't depends on topology, and shouldn't
>> changed unstable topology.
>> Key to partition mapping doesn't depend on topology in my test
>> affinity function. It only depends on partitions number.
>> But partition to node mapping depends on topology and at cluster stop,
>> when one node left topology, some partitions may be moved to other
>> nodes.
>>
>> >> Does all nodes share same RockDB database or each node has its own copy?
>> Each Ignite node has own RocksDB instance.
>>
>> >> Would you please share configuration?
>> It's pretty simple:
>> IgniteConfiguration cfg = new IgniteConfiguration();
>> cfg.setIgniteInstanceName(instanceName);
>>
>> CacheConfiguration cacheCfg = new
>> CacheConfiguration<>();
>> cacheCfg.setName(TEST_CACHE_NAME);
>> cacheCfg.setCacheMode(CacheMode.PARTITIONED);
>> cacheCfg.setWriteSynchronizationMode(
>> CacheWriteSynchronizationMode.PRIMARY_SYNC);
>> cacheCfg.setBackups(1);
>> cacheCfg.setAffinity(new
>> TestAffinityFunction(partitionsNumber, backupsNumber));
>> cacheCfg.setWriteThrough(true);
>> cacheCfg.setReadThrough(true);
>> cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC);
>> cacheCfg.setCacheStoreFactory(new
>> RocksDBCacheStoreFactory<>("/test/path/to/persistence",
>> TEST_CACHE_NAME, cfg));
>>
>> cfg.setCacheConfiguration(cacheCfg);
>>
>> Could you give me advice on places which I need to pay attention?
>>
>>
>> On Wed, Nov 15, 2017 at 3:02 PM, Andrey Mashenkov
>>  wrote:
>> > Hi Vyacheslav,
>> >
>> > Key to partition mapping shouldn't depends on topology, and shouldn't
>> > changed unstable topology.
>> > Looks like you've missed smth.
>> >
>> > Would you please share configuration?
>> > Does all nodes share same RockDB database or each node has its own copy?
>> >
>> >
>> >
>> > On Wed, Nov 15, 2017 at 12:22 AM, Vyacheslav Daradur <
>> daradu...@gmail.com>
>> > wrote:
>> >
>> >> Hi, Igniters!
>> >>
>> >> I’m using partitioned Ignite cache with RocksDB as 3rd party persistence
>> >> store.
>> >> I've got an issue: if cache rebalancing is switched on, then it’s
>> >> possible to lose some data.
>> >>
>> >> Basic scenario:
>> >> 1) Start Ignite cluster and fill a cache with RocksDB persistence;
>> >> 2) Stop all nodes
>> >> 3) Start Ignite cluster and validate data
>> >>
>> >> This works fine while rebalancing is switched off.
>> >>
>> >> If rebalancing switched on: when I call Ignition#stopAll, some nodes
>> >> go down sequentially and while one node having gone down another start
>> >> rebalancing. When nodes started affinity function works with a full
>> >> set of nodes and may define a wrong partition for a key because the
>> >> previous state was changed at rebalancing.
>> >>
>> >> Maybe I'm doing something wrong. How can I avoid rebalancing while
>> >> stopping all nodes in the cluster?
>> >>
>> >> Could you give me any advice, please?
>> >>
>> >> --
>> >> Best Regards, Vyacheslav D.
>> >>
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Andrey V. Mashenkov
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>



-- 
Best Regards, Vyacheslav D.


[jira] [Created] (IGNITE-6974) .NET: consoleWrite error during application shutdown

2017-11-21 Thread Alexey Popov (JIRA)
Alexey Popov created IGNITE-6974:


 Summary: .NET: consoleWrite error during application shutdown
 Key: IGNITE-6974
 URL: https://issues.apache.org/jira/browse/IGNITE-6974
 Project: Ignite
  Issue Type: Bug
  Components: platforms
Reporter: Alexey Popov
Priority: Minor


from Gitter:

Hey all (again xD)! Using Apache Ignite .NET 2.3 I (sometimes) get an error 
when shutting down my application. The error is only observable on server nodes 
and not on every shutdown. Seems like a kind of race condition.
The application runs as a Windows service. The Windows application event log 
shows the following error (see above) and I get a hs_err_pid[PID].log like 
this (snip):
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackUtils.consoleWrite(Ljava/lang/String;Z)V+0
j  
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackGateway.consoleWrite(Ljava/lang/String;Z)V+2
j  
org.apache.ignite.internal.processors.platform.dotnet.PlatformDotNetConsoleStream.write([BII)V+18
j  java.io.PrintStream.write([BII)V+16
j  sun.nio.cs.StreamEncoder.writeBytes()V+120
j  sun.nio.cs.StreamEncoder.implFlushBuffer()V+11
j  sun.nio.cs.StreamEncoder.flushBuffer()V+15
j  java.io.OutputStreamWriter.flushBuffer()V+4
j  java.io.PrintStream.write(Ljava/lang/String;)V+27
j  java.io.PrintStream.print(Ljava/lang/String;)V+9
j  org.apache.ignite.internal.util.IgniteUtils.quiet(Z[Ljava/lang/Object;)V+126
j  org.apache.ignite.internal.IgniteKernal.stop0(Z)V+943
j  org.apache.ignite.internal.IgniteKernal.stop(Z)V+6
j  org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(Z)V+162
j  org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(Z)V+26
j  org.apache.ignite.internal.IgnitionEx.stop(Ljava/lang/String;ZZ)Z+72
j  org.apache.ignite.Ignition.stop(Ljava/lang/String;Z)Z+3
j  
org.apache.ignite.internal.processors.platform.PlatformIgnition.stop(Ljava/lang/String;Z)Z+2
v  ~StubRoutines::call_stub
To me it seems that the Java side wants to write something to the (.NET) 
console using a callback while the underlying memory is already freed - therefore 
we get an AccessViolation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Ignite ML dense distributed matrices

2017-11-21 Thread Yury Babak
Alexey,

I don't think that mixing distributed/local is a good idea, unlike
dense/sparse or blocks/rows/cols.

Regards,
Yury



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Vladimir Ozerov
Honestly, I do not understand why we need a separate process to monitor the node's
state. Is it possible to list the advantages of this approach compared to
in-process monitoring threads?
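
For reference, a minimal sketch of the in-process approach discussed in this
thread: a plain watchdog thread that infers STW pauses from oversleeping. The
interval and threshold values are illustrative and this is not the IGNITE-6171
patch, just an assumption of how such a thread could look.

public class StwWatchdog extends Thread {
    /** How often the watchdog wakes up. */
    private static final long INTERVAL_MS = 50;

    /** Extra delay that is reported as a possible STW pause. */
    private static final long THRESHOLD_MS = 100;

    public StwWatchdog() {
        setDaemon(true);
        setName("stw-watchdog");
    }

    @Override public void run() {
        long prev = System.currentTimeMillis();

        while (!isInterrupted()) {
            try {
                Thread.sleep(INTERVAL_MS);
            }
            catch (InterruptedException ignored) {
                return;
            }

            long now = System.currentTimeMillis();
            long pause = now - prev - INTERVAL_MS;

            // If the whole JVM was frozen (GC STW, swapping, etc.), the sleep overshoots.
            if (pause > THRESHOLD_MS)
                System.out.println("Possible too long STW pause: " + pause + " milliseconds.");

            prev = now;
        }
    }
}

The obvious limitation, already noted in this thread, is that such a thread can
only report a pause after it has ended; it cannot act while the JVM is frozen,
which is the argument for an external watchdog process.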

On Tue, Nov 21, 2017 at 1:16 PM, Дмитрий Сорокин 
wrote:

> Don't forget that the high utilization of CPU can occur for reasons other
> than GC STW, and GC log parsing will not help us in that case.
>
>
> Tue, 21 Nov 2017 at 13:06, Anton Vinogradov [via Apache Ignite
> Developers]:
>
> > Denis,
> >
> > > 1. Totally for a separate native process that will handle the
> monitoring
> > of an Ignite process. The watchdog process can simply start a JVM tool
> > like
> > jstat and parse its GC logs: https://dzone.com/articles/
> > how-monitor-java-garbage  > how-monitor-java-garbage>
> > Different GC and even same GC at different OS/JVM produce different logs.
> > That's not easy to parse them. But, since http://gceasy.io can do that,
> > it
> > looks to be possible, somehow :) .
> > Do you know any libs or solutions allows to do this at realtime?
> >
> > > 2. As for the STW handling, I would make a possible reaction more
> > generic. Let’s define a policy (enumeration) that will define how to deal
> > with an unstable node. The events might be as follows - kill a node,
> > restart a node, trigger a custom script using Runtime.exec or other
> > methods.
> > Yes, it should be similar to segmentation policy + custom script
> > execution.
> >
> >
> > On Tue, Nov 21, 2017 at 2:10 AM, Denis Magda <[hidden email]
> > > wrote:
> >
> > > My 2 cents.
> > >
> > > 1. Totally for a separate native process that will handle the
> monitoring
> > > of an Ignite process. The watchdog process can simply start a JVM tool
> > like
> > > jstat and parse its GC logs: https://dzone.com/articles/
> > > how-monitor-java-garbage  > > how-monitor-java-garbage>
> > >
> > > 2. As for the STW handling, I would make a possible reaction more
> > generic.
> > > Let’s define a policy (enumeration) that will define how to deal with
> an
> > > unstable node. The events might be as follows - kill a node, restart a
> > > node, trigger a custom script using Runtime.exec or other methods.
> > >
> > > What’d you think? Specifically on point 2.
> > >
> > > —
> > > Denis
> > >
> > > > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov <[hidden email]
> > >
> > > wrote:
> > > >
> > > > Yakov,
> > > >
> > > > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> > > >
> > > > We split issue to
> > > > #1 STW duration metrics
> > > > #2 External monitoring allows to stop node during STW
> > > >
> > > >> Testing GC pause with java thread is
> > > >> a bit strange and can give info only after GC pause finishes.
> > > >
> > > > That's ok since it's #1
> > > >
> > > > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> > > [hidden email] >
> > > > wrote:
> > > >
> > > >> I have tested solution with java-thread and GC logs had contain same
> > > pause
> > > >> values of thread stopping which was detected by java-thread.
> > > >>
> > > >>
> > > >> My log (contains pauses > 100ms):
> > > >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long
> > STW
> > > >> pause: 507 milliseconds.
> > > >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long
> > STW
> > > >> pause: 5595 milliseconds.
> > > >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long
> > STW
> > > >> pause: 3262 milliseconds.
> > > >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long
> > STW
> > > >> pause: 1737 milliseconds.
> > > >>
> > > >> GC log:
> > > >> gridgain@dell-5580-92zc8h2:~$ cat
> > > >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> > > >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which
> application
> > > >> threads were stopped: 0,845 seconds, Stopping threads took:
> > > 0,246
> > > >> seconds
> > > >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which
> application
> > > >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> > > 0,252
> > > >> seconds
> > > >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which
> application
> > > >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> > > 0,178
> > > >> seconds// GOT!
> > > >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which
> application
> > > >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> > > 0,229
> > > >> seconds// GOT!
> > > >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which
> > application
> > > >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> > > 0,223
> > > >> seconds// GOT!
> > > >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which
> > application
> > > >> threads were stopped: 1,7337123 seconds, Stoppi

[jira] [Created] (IGNITE-6973) Node restarts with enabled persistence lead to affinity assignment mismatch on different nodes.

2017-11-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6973:
-

 Summary: Node restarts with enabled persistence lead to affinity 
assignment mismatch on different nodes.
 Key: IGNITE-6973
 URL: https://issues.apache.org/jira/browse/IGNITE-6973
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


Most probably this is caused by deploymentId reassignment after grid restart.

All nodes must have same deploymentId in such case.

Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.persistence;

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.configuration.MemoryPolicyConfiguration;
import org.apache.ignite.configuration.PersistentStoreConfiguration;
import org.apache.ignite.configuration.WALMode;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.IgniteKernal;
import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
import org.apache.ignite.internal.processors.cache.IgniteInternalCache;
import org.apache.ignite.internal.util.typedef.G;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.lang.IgniteUuid;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.GridTestUtils;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheMode.PARTITIONED;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
import static 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.IGNITE_PDS_CHECKPOINT_TEST_SKIP_SYNC;

/**
 * The test validates assignment after nodes restart with enabled persistence.
 */
public class IgnitePdsCacheAssignmentNodeRestartsTest extends 
GridCommonAbstractTest {
/** */
private static TcpDiscoveryIpFinder ipFinder = new 
TcpDiscoveryVmIpFinder(true);

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

cfg.setMemoryConfiguration(new 
MemoryConfiguration().setDefaultMemoryPolicyName("d").
setPageSize(1024).setMemoryPolicies(new 
MemoryPolicyConfiguration().setName("d").
setInitialSize(50 * 1024 * 1024L).setMaxSize(50 * 1024 * 
1024)));

cfg.setPersistentStoreConfiguration(new 
PersistentStoreConfiguration().setWalMode(WALMode.LOG_ONLY));

((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(ipFinder);

return cfg;
}

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
super.beforeTest();

deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), 
"db", false));
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
stopAllGrids();

deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), 
"db", false));

super.afterTest();
}


Right MXBean for new metrics

2017-11-21 Thread Alex Plehanov
Hello, Igniters!



I would like to discuss the implementation of ticket IGNITE-6871.



In our Ignite instance there are more than 1000 caches and about 10 cache
groups. To minimize the probability of data loss we need an alert when a
critical level of redundancy in the cluster is reached. So, we need a
metric which reports the minimum partition redundancy level for a cache
group.



Now there are no MXBeans for cache groups. And since cache groups were
introduced, some metrics from CacheMetricsMXBean actually show information
about the cache group, but not about the cache.



I can implement the new metric (minimum partition redundancy level for a
cache group) in CacheMetricsMXBean, the same way it was done before. In that
case we’ll either need to monitor this metric for all caches, or somehow get
information about the cache-to-cache-group relation and monitor this
metric for only one cache per cache group. But it’s not transparent to an
administrator which cache groups exist and which caches belong to
which cache group.



Alternatively, I can implement a new type of MXBean for cache groups and
add a new metric to this MXBean. Maybe it will be useful later to add to
this MXBean some other cache group related metrics, which now are
implemented in CacheMetricsMXBean.



So, should I extend existing CacheMetricsMXBean or create a new type of
MXBeans for cache groups?
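
To make the second option more concrete, here is a minimal sketch of what a
dedicated cache group bean could look like (the interface and method names are
only an illustration for discussion, not an agreed API):

public interface CacheGroupMetricsMXBean {
    /** Cache group name. */
    String getGroupName();

    /**
     * Minimum number of available copies (primary + backups) across all
     * partitions of the group; alerting can trigger when this drops to 1.
     */
    int getMinimumNumberOfPartitionCopies();
}

Other cache-group-wide metrics that currently live in CacheMetricsMXBean could
later be moved to the same bean.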


[jira] [Created] (IGNITE-6972) BinaryObjectBuilderImpl.build is stuck when server cluster is restarted

2017-11-21 Thread Jason Man (JIRA)
Jason Man created IGNITE-6972:
-

 Summary: BinaryObjectBuilderImpl.build is stuck when server 
cluster is restarted
 Key: IGNITE-6972
 URL: https://issues.apache.org/jira/browse/IGNITE-6972
 Project: Ignite
  Issue Type: Bug
  Components: binary
Affects Versions: 2.3
Reporter: Jason Man


When a client node is using a BinaryObjectBuilder to build a BinaryObject, the 
build() method could get stuck if the cluster is being restarted.
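
For reference, a minimal sketch of the call path that hangs (MyType, the field name 
and MyPojo are illustrative; MyPojo stands for any user class not yet registered in 
binary metadata):

{code}
// Runs on a client node while the server cluster is being restarted.
BinaryObjectBuilder builder = ignite.binary().builder("MyType");

builder.setField("payload", new MyPojo());

// build() has to propagate binary metadata to the servers
// (CacheObjectBinaryProcessorImpl.addMeta in the thread dump below);
// while the cluster is restarting, that future never completes and the call hangs.
BinaryObject obj = builder.build();
{code}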

Thread dump of the stack is
{code}
"main" #1 prio=5 os_prio=0 tid=0x004d9000 nid=0x62ac waiting on condition 
[0x023bf000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:441)
at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:182)
at 
org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:793)
at 
org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:752)
at 
org.apache.ignite.internal.binary.BinaryContext.descriptorForClass(BinaryContext.java:623)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:164)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.builder.BinaryBuilderSerializer.writeValue(BinaryBuilderSerializer.java:207)
at 
org.apache.ignite.internal.binary.builder.BinaryValueWithType.writeTo(BinaryValueWithType.java:48)
at 
org.apache.ignite.internal.binary.builder.BinaryBuilderSerializer.writeValue(BinaryBuilderSerializer.java:73)
at 
org.apache.ignite.internal.binary.builder.BinaryBuilderSerializer.writeValue(BinaryBuilderSerializer.java:54)
at 
org.apache.ignite.internal.binary.builder.BinaryObjectBuilderImpl.serializeTo(BinaryObjectBuilderImpl.java:313)
at 
org.apache.ignite.internal.binary.builder.BinaryObjectBuilderImpl.build(BinaryObjectBuilderImpl.java:183)
...
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] ignite pull request #3074: IGNITE-6904 SQL: partition reservations are relea...

2017-11-21 Thread dolphin1414
GitHub user dolphin1414 opened a pull request:

https://github.com/apache/ignite/pull/3074

IGNITE-6904 SQL: partition reservations are released too early in lazy mode

In lazy mode, partition reservations are released only after the last page has 
been sent.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-6904

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3074


commit c28dda18e185692c93f722996a03c70daab19bcd
Author: rkondakov 
Date:   2017-11-21T10:09:12Z

IGNITE-6904: in lazy mode partitions reservations are released only after 
last page has been sent.

commit 7dda28980eb792392454583f26970e31eb1727eb
Author: rkondakov 
Date:   2017-11-21T10:11:29Z

Merge remote-tracking branch 'apache/master' into ignite-6904

commit 153de4033fe53407ac7db9615891798e47b1dc19
Author: rkondakov 
Date:   2017-11-21T10:12:13Z

Merge remote-tracking branch 'origin/master' into ignite-6904




---


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Дмитрий Сорокин
Don't forget that the high utilization of CPU can occur for reasons other
than GC STW, and GC log parsing will not help us in that case.


Tue, 21 Nov 2017 at 13:06, Anton Vinogradov [via Apache Ignite
Developers]:

> Denis,
>
> > 1. Totally for a separate native process that will handle the monitoring
> of an Ignite process. The watchdog process can simply start a JVM tool
> like
> jstat and parse its GC logs: https://dzone.com/articles/
> how-monitor-java-garbage  how-monitor-java-garbage>
> Different GC and even same GC at different OS/JVM produce different logs.
> That's not easy to parse them. But, since http://gceasy.io can do that,
> it
> looks to be possible, somehow :) .
> Do you know any libs or solutions allows to do this at realtime?
>
> > 2. As for the STW handling, I would make a possible reaction more
> generic. Let’s define a policy (enumeration) that will define how to deal
> with an unstable node. The events might be as follows - kill a node,
> restart a node, trigger a custom script using Runtime.exec or other
> methods.
> Yes, it should be similar to segmentation policy + custom script
> execution.
>
>
> On Tue, Nov 21, 2017 at 2:10 AM, Denis Magda <[hidden email]
> > wrote:
>
> > My 2 cents.
> >
> > 1. Totally for a separate native process that will handle the monitoring
> > of an Ignite process. The watchdog process can simply start a JVM tool
> like
> > jstat and parse its GC logs: https://dzone.com/articles/
> > how-monitor-java-garbage  > how-monitor-java-garbage>
> >
> > 2. As for the STW handling, I would make a possible reaction more
> generic.
> > Let’s define a policy (enumeration) that will define how to deal with an
> > unstable node. The events might be as follows - kill a node, restart a
> > node, trigger a custom script using Runtime.exec or other methods.
> >
> > What’d you think? Specifically on point 2.
> >
> > —
> > Denis
> >
> > > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov <[hidden email]
> >
> > wrote:
> > >
> > > Yakov,
> > >
> > > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> > >
> > > We split issue to
> > > #1 STW duration metrics
> > > #2 External monitoring allows to stop node during STW
> > >
> > >> Testing GC pause with java thread is
> > >> a bit strange and can give info only after GC pause finishes.
> > >
> > > That's ok since it's #1
> > >
> > > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> > [hidden email] >
> > > wrote:
> > >
> > >> I have tested solution with java-thread and GC logs had contain same
> > pause
> > >> values of thread stopping which was detected by java-thread.
> > >>
> > >>
> > >> My log (contains pauses > 100ms):
> > >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 507 milliseconds.
> > >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 5595 milliseconds.
> > >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 3262 milliseconds.
> > >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 1737 milliseconds.
> > >>
> > >> GC log:
> > >> gridgain@dell-5580-92zc8h2:~$ cat
> > >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> > >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
> > >> threads were stopped: 0,845 seconds, Stopping threads took:
> > 0,246
> > >> seconds
> > >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
> > >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> > 0,252
> > >> seconds
> > >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
> > >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> > 0,178
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
> > >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> > 0,229
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which
> application
> > >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> > 0,223
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which
> application
> > >> threads were stopped: 1,7337123 seconds, Stopping threads took:
> > 0,121
> > >> seconds// GOT!
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >>
> >
> >

Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Anton Vinogradov
Denis,

> 1. Totally for a separate native process that will handle the monitoring
of an Ignite process. The watchdog process can simply start a JVM tool like
jstat and parse its GC logs: https://dzone.com/articles/
how-monitor-java-garbage 
Different GCs, and even the same GC on different OS/JVM combinations, produce
different logs. It's not easy to parse them. But since http://gceasy.io can do
that, it looks to be possible somehow :) .
Do you know any libs or solutions that allow doing this in realtime?

> 2. As for the STW handling, I would make a possible reaction more
generic. Let’s define a policy (enumeration) that will define how to deal
with an unstable node. The events might be as follows - kill a node,
restart a node, trigger a custom script using Runtime.exec or other methods.
Yes, it should be similar to segmentation policy + custom script execution.


On Tue, Nov 21, 2017 at 2:10 AM, Denis Magda  wrote:

> My 2 cents.
>
> 1. Totally for a separate native process that will handle the monitoring
> of an Ignite process. The watchdog process can simply start a JVM tool like
> jstat and parse its GC logs: https://dzone.com/articles/
> how-monitor-java-garbage  how-monitor-java-garbage>
>
> 2. As for the STW handling, I would make a possible reaction more generic.
> Let’s define a policy (enumeration) that will define how to deal with an
> unstable node. The events might be as follows - kill a node, restart a
> node, trigger a custom script using Runtime.exec or other methods.
>
> What’d you think? Specifically on point 2.
>
> —
> Denis
>
> > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov 
> wrote:
> >
> > Yakov,
> >
> > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> >
> > We split issue to
> > #1 STW duration metrics
> > #2 External monitoring allows to stop node during STW
> >
> >> Testing GC pause with java thread is
> >> a bit strange and can give info only after GC pause finishes.
> >
> > That's ok since it's #1
> >
> > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> sbt.sorokin@gmail.com>
> > wrote:
> >
> >> I have tested solution with java-thread and GC logs had contain same
> pause
> >> values of thread stopping which was detected by java-thread.
> >>
> >>
> >> My log (contains pauses > 100ms):
> >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long STW
> >> pause: 507 milliseconds.
> >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long STW
> >> pause: 5595 milliseconds.
> >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long STW
> >> pause: 3262 milliseconds.
> >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long STW
> >> pause: 1737 milliseconds.
> >>
> >> GC log:
> >> gridgain@dell-5580-92zc8h2:~$ cat
> >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
> >> threads were stopped: 0,845 seconds, Stopping threads took:
> 0,246
> >> seconds
> >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
> >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> 0,252
> >> seconds
> >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
> >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> 0,178
> >> seconds// GOT!
> >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
> >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> 0,229
> >> seconds// GOT!
> >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which application
> >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> 0,223
> >> seconds// GOT!
> >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which application
> >> threads were stopped: 1,7337123 seconds, Stopping threads took:
> 0,121
> >> seconds// GOT!
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >>
>
>


Re: IGNITE-6745. Status

2017-11-21 Thread Anton Vinogradov
Cergey,

Some Ignite users use extended Java 7 support, so it's still alive.

Yes, all code should be Java 7-compatible.

On Tue, Nov 21, 2017 at 2:28 AM, Cergey  wrote:

> My username is cossack5. Please grant me contributor permissions.
> As for java 7 version, until moment it is discontinued, all the code
> should be java7-compatible ?
>
> -Original Message-
> From: Denis Magda [mailto:dma...@apache.org]
> Sent: Tuesday, November 21, 2017 3:15 AM
> To: dev@ignite.apache.org
> Subject: Re: IGNITE-6745. Status
>
> Cergey,
>
> What’s you JIRA account? You need to be among Ignite contributors in JIRA
> to assign tickets on yourself.
>
> As for Java 7, yes, we had that discussion many times. Hopefully it will
> be discontinued the next year.
>
> However, as for Java 8 the community is willing to support it by the end
> of the year.
>
> —
> Denis
>
> > On Nov 20, 2017, at 1:45 PM, Cergey  wrote:
> >
> > Hi,
> > I can't assign the ticket to myself - seems I have no rights.
> > Also, I see we still support java 7. Maybe it's time to cease it
> (especially when we have java 9 to worry about) ?
> >
> > -Original Message-
> > From: Anton Vinogradov [mailto:avinogra...@gridgain.com]
> > Sent: Monday, November 20, 2017 2:01 PM
> > To: dev@ignite.apache.org
> > Subject: Re: IGNITE-6745. Status
> >
> > Cergey,
> >
> > Please assign https://issues.apache.org/jira/browse/IGNITE-6745 to
> yourself and change status to Patch Available.
> > Also, before asking review, please check that TeamCity status is ok,
> > see
> > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#H
> > owtoContribute-SubmittingforReview
> > for details.
> >
> >
> > On Sat, Nov 18, 2017 at 12:25 AM, Denis Magda  wrote:
> >
> >> Igniters,
> >>
> >> Who is going to take a lead of Java 9 support and can do thorough
> >> review of all the related changes? Here is a set of the tickets and
> >> Cergey solved one of them:
> >> https://issues.apache.org/jira/browse/IGNITE-6728
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 16, 2017, at 3:12 PM, Cergey  wrote:
> >>>
> >>> Hi, igniters
> >>>
> >>>
> >>>
> >>> Why no one commented on the patch and pull request
> >>> (https://github.com/apache/ignite/pull/2970) ?  What should I do ?
> >>>
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Cergey Chaulin
> >>>
> >>>
> >>>
> >>
> >>
> >
>
>
>


Re: getAverageGetTime/getAveragePutTime APIs of CacheMetrics always return 0

2017-11-21 Thread Pavel Pereslegin
Hello all.

In addition, I found a similar closed issue about cache metrics and
client mode [1].
The ticket resolution seems strange, because the problem can still be reproduced.

[1] https://issues.apache.org/jira/browse/IGNITE-3605
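
For context, a minimal sketch of how these metrics are typically read
(illustrative only; it assumes statistics are enabled on the cache, which is
required for the averages to be non-zero at all):

Ignite ignite = Ignition.start();

CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");
ccfg.setStatisticsEnabled(true); // without this the averages are 0 by design

IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);

cache.put(1, "a");
cache.get(1);

// The problem discussed in this thread: when this runs on a client node,
// the averages may still be reported as 0.
System.out.println("avg get: " + cache.metrics().getAverageGetTime());
System.out.println("avg put: " + cache.metrics().getAveragePutTime());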

2017-11-21 2:47 GMT+03:00 Denis Magda :
> Ignite dev community,
>
> Bring this weird bug to your attention. Could you confirm it’s not a 
> “feature” of us? Put to the IEP-6 (metrics) list.
>
> —
> Denis
>
>> Begin forwarded message:
>>
>> From: Denis Magda 
>> Subject: Re: getAverageGetTime/getAveragePutTime APIs of CacheMetrics always 
>> return 0
>> Date: November 20, 2017 at 3:46:27 PM PST
>> To: u...@ignite.apache.org
>> Reply-To: u...@ignite.apache.org
>>
>> Eventually I could reproduce your issue:
>> https://issues.apache.org/jira/browse/IGNITE-6966 
>> 
>>
>> It will be fixed as a part of this endeavor:
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-6%3A+Metrics+improvements
>>  
>> 
>>
>> Thanks for your patience and support in reproducing the bug.
>>
>> —
>> Denis
>>
>>
>>> On Nov 20, 2017, at 4:23 AM, headstar >> > wrote:
>>>
>>> Thanks for the example! Works fine when running IgniteMetricsExample#main
>>> with the provided conf.
>>>
>>> One difference to "my" configuration was that I was running the node
>>> accessing the cache in client mode.
>>>
>>> If I start a node with the configuration provided in your example and then
>>> run IgniteMetricsExample#main in client mode the statistics are 0.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ 
>>> 
>>
>


[GitHub] ignite pull request #3052: IGNITE-6931

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3052


---


[GitHub] ignite pull request #3073: Ignite 2.1.8

2017-11-21 Thread apopovgg
GitHub user apopovgg opened a pull request:

https://github.com/apache/ignite/pull/3073

Ignite 2.1.8



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-2.1.8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3073


commit 6c45edeaf83f94a9c0539d9768dd8be0e25dc36e
Author: Pavel Tupitsyn 
Date:   2017-08-29T15:35:49Z

IGNITE-6178 .NET: Fix PersistentStoreConfiguration.CheckpointingThreads 
default value

This closes #2511

(cherry picked from commit 08a831f)

commit 3eed5e6f49dfcbc0b16b7fa71f71fe8fd2a7463f
Author: vsisko 
Date:   2017-08-30T09:43:17Z

IGNITE-6223 Fixed NPE when data is null on job fail.

commit b26f5f1027c1b9c13a6106ad3162551d666ef8a0
Author: Ilya Kasnacheev 
Date:   2017-08-30T13:20:45Z

IGNITE-6125: JDBC thick: several fixes to JdbcDatabaseMetadata. This closes 
#2506.

commit 01c627a0a36692a045a646d4d165820bb3b7cf04
Author: Ilya Lantukh 
Date:   2017-08-30T13:58:32Z

gg-12686 : Fixed null CacheMapHolder if node is restarted with partition in 
RENTING state.

Signed-off-by: Andrey Gura 

commit 2ef22f523f5162574a31d3c7eb45e8edb5e3f2b8
Author: tledkov-gridgain 
Date:   2017-08-31T08:03:18Z

IGNITE-5409: JDBC thin: added schema to connection URL. This closes #2179.

commit 67d5c6e45ba00a8b59289436c83f2d428b502d74
Author: Andrey Gura 
Date:   2017-08-31T12:04:30Z

Merge branch 'ignite-2.1.4' into ignite-2.1.5

commit e1cbe2b7358691e5f082b457723285c61d36c15b
Author: tledkov-gridgain 
Date:   2017-08-31T14:03:44Z

IGNITE-5425 JDBC thin driver: don't throw unsupported exception on 
ClientInfo methods. This closes #2184. This closes #2525.

commit 3a2eebe35e24391f4d61068af99246516f759d21
Author: devozerov 
Date:   2017-08-31T14:05:56Z

Merge remote-tracking branch 'upstream/ignite-2.1.5' into ignite-2.1.5

commit 6e69a92f979da89970bcdfaf4727569b28329764
Author: tledkov-gridgain 
Date:   2017-08-31T14:23:31Z

IGNITE-5462: JDBC thin driver: additional test for DDL/DML commands. This 
closes #2122.

commit ca13d0db21125c40d2f728ec6823c539b16a8344
Author: Denis Mekhanikov 
Date:   2017-08-31T07:32:25Z

ignite-6214 resolve problem with concurrent metadata updates

commit 748a6d21a0460c6a3208d218fd43478472379ca7
Author: Pavel Tupitsyn 
Date:   2017-09-01T08:15:23Z

IGNITE-6081 .NET: Fix PutAll for dependent objects

commit 5427e72f58c565031ea9b37f398309346819badb
Author: vsisko 
Date:   2017-08-22T10:51:55Z

IGNITE-6120 Lazy query execution from Web console.

commit 20a2a4f741cf7a70bfa8cfbe7a5297f5972f4c77
Author: Igor Sapego 
Date:   2017-09-01T13:24:04Z

IGNITE-6211: ODBC: SQLBindParameter now does not unbind parameter if the
ParameterValuePtr is NULL

(cherry picked from commit 72608a5)

commit 1958f3790363c88211d1b846f9831b695f541659
Author: Evgeny Stanilovskiy 
Date:   2017-09-01T12:46:57Z

IGNITE-5869 Client hangs in case of binary configuration compact footer of 
client node differs from server node

Signed-off-by: Andrey Gura 

commit eb14f224ce0facea3865f278ae79cddca8a58b91
Author: Andrey Gura 
Date:   2017-09-01T14:25:26Z

IGNITE-5869 Code review

commit f971066951d19b818607f2de879fd4ad4b5a7320
Author: tledkov-gridgain 
Date:   2017-09-04T10:19:50Z

IGNITE-6197: SQL: QueryIndex.setInlineSize now returns this instead of 
void. This closes #2567.

commit 7df8c826f6870a5eef3098fb706464f613b31e5a
Author: Roman Shtykh 
Date:   2017-09-04T10:35:34Z

IGNITE-6170: JDBC: now all drivers return "Apache Ignite" as product name. 
This closes #2514.

commit bdcbef0a611933ba2ba6f670eab429e7aa4ad490
Author: tledkov-gridgain 
Date:   2017-09-04T10:47:17Z

IGNITE-6220: JDBC thin driver: fixed long[] data type propagation. This 
closes #2563.

commit 96cdf426bf53f777a1214a6b618f8304d86d031a
Author: Ilya Kasnacheev 
Date:   2017-09-04T10:56:25Z

IGNITE-6139: JDBC thick driver: fixed driver versions. This closes #2552.

commit 6e608b881f05ab3503304ab73506163c840dc55d
Author: Igor Sapego 
Date:   2017-09-04T12:34:59Z

IGNITE-6188: ODBC: Fix for SQLFreeStmt(SQL_CLOSE).

(cherry picked from commit dc6b73b)

commit 7d0a7cc89c6e2920d3cc8f4e219aa980e59b071f
Author: Igor Sapego 
Date:   2017-09-04T13:20:20Z

IGNITE-3987: ODBC: Improved error output when query parsing failed.

(cherry picked from commit 29ce1fa)

commit a5f5a9932f598062e45f903c2f170301157ac00a
Author: Alexey Kukushkin 
Date:   2017-08-31T14:44:11Z

ignite-5966 IgniteCache#get() fails with "Requesting mapping from grid 
failed" when deserialising binary object loaded from CacheJdbcPojoStoreFactory

commit 5bb79244a3049efa2a455f8e0eb8b76fb916ce02
Author: Igor Seliverstov 
Date:   2017-09-01T09:56

[jira] [Created] (IGNITE-6971) Ignite Logger type & logging file config indication

2017-11-21 Thread Alexey Popov (JIRA)
Alexey Popov created IGNITE-6971:


 Summary: Ignite Logger type & logging file config indication
 Key: IGNITE-6971
 URL: https://issues.apache.org/jira/browse/IGNITE-6971
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.1
Reporter: Alexey Popov
Priority: Minor


Please see 
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-Logger-amp-logging-file-config-output-td24435.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Suggestion to improve deadlock detection

2017-11-21 Thread Dmitriy Setrakyan
On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov 
wrote:

> It doesn’t need all txes. Instead, other nodes will send info about
> suspicious txes to it from time to time.
>

I see your point, I think it might work.


[jira] [Created] (IGNITE-6970) Error thrown from CacheStore cause cache operation hanging.

2017-11-21 Thread Andrew Mashenkov (JIRA)
Andrew Mashenkov created IGNITE-6970:


 Summary: Error thrown from CacheStore cause cache operation 
hanging.
 Key: IGNITE-6970
 URL: https://issues.apache.org/jira/browse/IGNITE-6970
 Project: Ignite
  Issue Type: Bug
  Components: cache
Reporter: Andrew Mashenkov


If some error (e.g. NoSuchMethodError) is thrown from a CacheStore 
implementation during a simple cache.get(),
then the operation hangs: the server fails, never sends a NearAtomicGetResponse 
to the client,
and never completes the async future.

The GridCacheAdapter.getAllAsync0 method fails on GridEmbeddedFuture creation.
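
A minimal reproducer sketch of the scenario above (the store class and the thrown 
Error are illustrative; the point is that an Error, not an Exception, escapes the 
store during a read-through get):

{code}
public class FailingStore extends CacheStoreAdapter<Integer, Integer> {
    /** Simulates e.g. a NoSuchMethodError coming from a broken store build. */
    @Override public Integer load(Integer key) {
        throw new NoSuchMethodError("simulated store failure");
    }

    /** {@inheritDoc} */
    @Override public void write(Cache.Entry<? extends Integer, ? extends Integer> entry) {
        // No-op.
    }

    /** {@inheritDoc} */
    @Override public void delete(Object key) {
        // No-op.
    }
}

CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>("c");
ccfg.setReadThrough(true);
ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(FailingStore.class));

// With the bug described above, this get() never returns on the caller side.
ignite.getOrCreateCache(ccfg).get(1);
{code}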



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)