Re: [ANNOUNCE] Welcome Pavel Pereslegin as a new committer

2022-02-10 Thread Дмитрий Сорокин
Great job Pavel, my congrats!


Thu, Feb 10, 2022 at 18:29, Nikolay Izhikov:

> Great news! Congratulations!
>
> > On Feb 10, 2022, at 18:26, Maksim Timonin wrote:
> >
> > Hi Pavel,
> >
> > Congratulations!
> >
> > On Thu, Feb 10, 2022 at 6:16 PM Maxim Muzafarov 
> wrote:
> >
> >> The Project Management Committee (PMC) for Apache Ignite has invited
> >> Pavel Pereslegin to become a committer and we are pleased to announce
> >> that he has accepted.
> >>
> >> He has made many major contributions to the Apache Ignite codebase,
> >> such as the snapshot restore procedure, the batch update operation for
> >> PageMemory, the TDE cache key rotation procedure, Service context
> >> injection, etc.
> >>
> >> Being a committer enables easier contribution to the project since
> >> there is no need to go via the patch submission process. This should
> >> enable better productivity.
> >>
> >> Please join me in welcoming Pavel, and congratulating him on the new
> >> role in the Apache Ignite Community.
> >>
> >>
> >> Best Regards,
> >> Maxim Muzafarov
> >>
>
>


Re: [ANNOUNCE] New PMC member: Maxim Muzafarov

2020-05-07 Thread Дмитрий Сорокин
Maxim, congrats!
Great job, respect!

Thu, May 7, 2020 at 23:04, Vyacheslav Daradur:

> Great job! My congratulations!
>
> Thu, May 7, 2020 at 16:09, Alex Plehanov:
>
> > Maxim, congratulations!
> >
> >
> > Thu, May 7, 2020 at 15:12, Denis Garus:
> >
> > > Maxim, Congrats!
> > > Great job!
> > >
> > > Thu, May 7, 2020 at 15:02, Nikita Amelchev:
> > >
> > > > Maxim, congrats!
> > > >
> > > > Thu, May 7, 2020 at 14:55, Nikolay Izhikov:
> > > > >
> > > > > Congrats.
> > > > >
> > > > > > > On May 7, 2020, at 14:54, Ivan Pavlukhin wrote:
> > > > > >
> > > > > > Maxim,
> > > > > >
> > > > > > My congratulations! Well deserved!
> > > > > >
> > > > > > P.S. We should mention snapshots for persistent caches in the
> > > > > > achievement list. Great job!
> > > > > >
> > > > > > Best regards,
> > > > > > Ivan Pavlukhin
> > > > > >
> > > > > > > Thu, May 7, 2020 at 14:47, Dmitriy Pavlov:
> > > > > >>
> > > > > > >> The Project Management Committee (PMC) for Apache Ignite has invited
> > > > > > >> Maxim Muzafarov to become a new PMC member and we are pleased to
> > > > > > >> announce that he has accepted.
> > > > > > >>
> > > > > > >> Maxim is an active dev list participant and a speaker at meetups; he
> > > > > > >> contributes to additional checks of the product using Travis and
> > > > > > >> started the new file rebalance. Maxim did a fantastic job to make
> > > > > > >> release 2.8 possible.
> > > > > > >>
> > > > > > >> Being a PMC member enables him to assist with the management and to
> > > > > > >> guide the direction of the project.
> > > > > > >>
> > > > > > >> Maxim, congrats on the new role, and keep up the pace!
> > > > > > >>
> > > > > > >> Best Regards,
> > > > > > >> Dmitriy Pavlov
> > > > > > >> on behalf of Apache Ignite PMC
> > > > >
> > > >
> > > >
> > > > --
> > > > Best wishes,
> > > > Amelchev Nikita
> > > >
> > >
> >
> --
> Best Regards,
> Vyacheslav D.
>


Clearing of injected resources on node stop

2019-10-25 Thread Дмитрий Сорокин
Hi folks!

At the moment, every resource injected by GridResourceProcessor on node
start is also cleared on node stop, but before the executors are stopped.
During node stop, this behavior sometimes causes NPEs in calls performed in
StripedExecutor threads; for example, see [1].
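
To make the ordering problem concrete, here is a minimal sketch of it. All class and method names are hypothetical stand-ins (this is not Ignite code): a field plays the role of an injected logger, and a single-thread executor plays the role of StripedExecutor. It only assumes that a task may still be in flight when the stop begins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical stand-in for a component with an injected logger; not Ignite code. */
final class InjectedResourceDemo {
    /** Stands in for a field filled by a resource processor on node start. */
    private volatile Logger log = Logger.getLogger("demo");

    /** Stands in for cleaning (nullifying) injected resources on node stop. */
    void cleanup() {
        log = null;
    }

    /** Work that runs in an executor thread and dereferences the injected field. */
    void doWork() {
        log.fine("working");
    }

    /**
     * Runs one task and one "node stop" in the given order.
     *
     * @return true if the task failed with NullPointerException.
     */
    static boolean taskFailsWithNpe(boolean cleanBeforeExecutorStop) throws Exception {
        InjectedResourceDemo cmp = new InjectedResourceDemo();
        ExecutorService exec = Executors.newSingleThreadExecutor();
        CountDownLatch stopStarted = new CountDownLatch(1);

        Future<?> fut = exec.submit(() -> {
            try {
                stopStarted.await(); // The task is still in flight when the stop begins.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            cmp.doWork();
        });

        if (cleanBeforeExecutorStop) {
            cmp.cleanup();           // Resources are cleared first...
            stopStarted.countDown();
            exec.shutdown();         // ...and the executor is stopped afterwards.
            exec.awaitTermination(10, TimeUnit.SECONDS);
        } else {
            stopStarted.countDown();
            exec.shutdown();         // The executor drains its tasks first...
            exec.awaitTermination(10, TimeUnit.SECONDS);
            cmp.cleanup();           // ...and only then are resources cleared.
        }

        try {
            fut.get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof NullPointerException;
        }
    }
}
```

In this sketch, taskFailsWithNpe(true) models the current order (clean first, stop executors later), and taskFailsWithNpe(false) models stopping the executors before cleaning.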

Does anybody remember the motivation for cleaning injected resources?

Also, can anybody tell what bad things can happen if we don't clean
(nullify) injected logger resources?

[1] https://issues.apache.org/jira/browse/IGNITE-1606

—
Dmitriy Sorokin


Re: TC build queue stall

2019-04-26 Thread Дмитрий Сорокин
Hi Ivan!

Thanks for the investigation!
I decided to try your second proposal first and have just force-pushed the
squashed commits. Let's see whether it helped.
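
As a side check of the hypothesis about Russian characters, the escaped bytes in the MySQL error quoted below can be decoded as UTF-8. This is only a small sketch with a hypothetical class name; it shows what the bytes are and says nothing about how the TC database is configured.

```java
import java.nio.charset.StandardCharsets;

/** Decodes the escaped bytes from the quoted "Incorrect string value" error. */
final class IncorrectStringValueBytes {
    /** Bytes \xD0\xB4\xD0\xBC\xD0\xB8 as they appear in the error message. */
    static final byte[] REPORTED = {
        (byte) 0xD0, (byte) 0xB4, (byte) 0xD0, (byte) 0xBC, (byte) 0xD0, (byte) 0xB8
    };

    /** Interprets the bytes as UTF-8 text. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns true if the UTF-8 encoding of the text starts with the given bytes. */
    static boolean startsWithBytes(String text, byte[] prefix) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        if (encoded.length < prefix.length)
            return false;
        for (int i = 0; i < prefix.length; i++) {
            if (encoded[i] != prefix[i])
                return false;
        }
        return true;
    }
}
```

The six bytes decode to the first three Cyrillic letters of the user_name value shown in the PARAMETERS part of the error, which is consistent with the author name being the value that was rejected.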

Fri, Apr 26, 2019 at 8:57, Павлухин Иван:

> I dug a little deeper into the causes of the VCS problem reported
> by TC. It looks like Russian characters in commit details (an author
> name here) drive TC crazy. Still not sure that it is the root cause
> (quite weird if so).
>
> I found a PR with problematic commits [1]. Dmitrii Ryabov, Dmitriy
> Sorokin and Mikhail Petrov committed to that PR. Guys, could you please
> close the PR in order to check whether it is what makes TC mad? It seems
> that closing the PR does no harm because it can easily be reopened
> afterwards. Or, for example, squash the commits and force push to the PR,
> avoiding Russian characters.
>
> Need your assistance here.
>
> [1] https://github.com/apache/ignite/pull/6223/commits
>
> Fri, Apr 26, 2019 at 06:10, Павлухин Иван:
> >
> > Hi Igniters,
> >
> > Right now I am observing that many builds have not been able to make
> > progress for many hours. TC shows that the majority of agents are free [1],
> > but new builds seem to fail to start.
> >
> > Does anyone know what it is and how to resolve it? Perhaps a simple
> > reboot can help here. Also, the new codestyle check is a recent change and
> > might be somehow related.
> >
> > Also I see problem with git checkout in some logs:
> > [13:45:12] Collecting changes in 1 VCS root (7s)
> > [13:45:12] [Collecting changes in 1 VCS root] VCS Root details
> > [13:45:18] [Collecting changes in 1 VCS root] Compute revision for
> > 'GitHub [apache/ignite]'
> > [14:36:57] The build is removed from the queue to be prepared for the start
> > [14:36:57] Starting the build on the agent aitc-lin12:06
> > [14:36:58] Clearing temporary directory: /opt/buildagent/temp/buildTmp
> > [14:36:58] Publishing internal artifacts
> > [14:36:58] Clean build enabled: removing old files from
> > /opt/buildagent/work/69588afcb2ab3382
> > [14:36:58] Checkout directory: /opt/buildagent/work/69588afcb2ab3382
> > [14:36:58] Updating sources: server side checkout (running for 15h:17m:37s)
> > [14:36:58] [Updating sources] Building clean patch for VCS root:
> > GitHub [apache/ignite]
> > [14:39:20] [Updating sources] Failed to build patch for build #11255
> > {build id=3698721, buildTypeId=IgniteTests24Java8_CacheFailover2}, VCS
> > root: "GitHub [apache/ignite]" {instance id=296, parent internal
> > id=77, parent id=GitHubApacheIgnite, description:
> > "https://github.com/apache/ignite.git#refs/heads/master"}, due to
> > error: Patch building failed:
> > org.apache.catalina.connector.ClientAbortException:
> > java.net.SocketTimeoutException
> > [14:39:20] [Updating sources] Transferring repository sources: 36.93
> > MB so far...
> > [02:40:02] [Updating sources] Transferring repository sources: 36.95
> > MB so far...
> >
> > And a reported problem in "Overview" tab:
> > Error collecting changes for VCS repository '"GitHub [apache/ignite]"
> > {instance id=296, parent internal id=77, parent id=GitHubApacheIgnite,
> > description: "https://github.com/apache/ignite.git#refs/heads/master"}'
> >
> > jetbrains.buildServer.serverSide.db.MySQL.MySqlIncorrectStringValueException:
> > Incorrect string value: '\xD0\xB4\xD0\xBC\xD0\xB8...' for column
> > 'user_name' at row 1 while performing SQL query: SQL DML: insert into
> > vcs_history (modification_id, user_name, description, change_date,
> > vcs_root_id, version, display_version, changes_count, register_date)
> > values (?, ?, ?, ?, ?, ?, ?, ?, ?) | PARAMETERS: 882649, "дмитрий
> > рябов ", "fix warming up throttle test\n",
> > 1551457143000, 296, "06127b70b2060dade9bd652abc71f2b949508245",
> > "06127b70b2060dade9bd652abc71f2b949508245", 1, 1556247859429:
> > java.sql.SQLException: Incorrect string value:
> > '\xD0\xB4\xD0\xBC\xD0\xB8...' for column 'user_name' at row 1
> >
> > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll=buildTypeBranches
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


Prewarming of page memory after node restart

2019-03-04 Thread Дмитрий Сорокин
Hi Igniters!



I want to propose a new feature for Ignite: prewarming of page memory.



The essence of the prewarming feature is that, after a node restart, the
pages that were loaded before the last shutdown are loaded into memory
again. I have implemented a PoC of page memory prewarming that dumps the IDs
of loaded pages before node shutdown and, optionally, at runtime. When the
node restarts, a special worker loads the pages with the saved IDs into
memory; depending on configuration, the restore phase may or may not wait
for the prewarming process to finish. The percentage of the hottest loaded
pages to dump can be configured if dumping all of them is not needed. The
prewarming process is optionally multithreaded and has a heuristic
throttling algorithm.
PrewarmingConfiguration is the only public API change; it was added as a
property of DataRegionConfiguration, and a non-null value enables the
prewarming feature.
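
To illustrate the concept only, here is a minimal single-threaded sketch of the dump and reload cycle. It is not the PoC code: the class, the access-count notion of "heat", the one-ID-per-line file format and the pageLoader callback are all assumptions made for this example, and throttling is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

/** Hypothetical sketch of page ID dumping and prewarming; not the PoC code. */
final class PrewarmingSketch {
    /** Page ID -> access count; a simple stand-in for page "heat". */
    private final Map<Long, Long> heat = new ConcurrentHashMap<>();

    /** Records one access to a loaded page. */
    void onPageAccess(long pageId) {
        heat.merge(pageId, 1L, Long::sum);
    }

    /** Returns IDs of the hottest pages, limited to the given percentage of loaded pages. */
    List<Long> hottestPageIds(int percent) {
        List<Map.Entry<Long, Long>> entries = new ArrayList<>(heat.entrySet());

        // Hotter pages first; ties are broken by page ID to keep the order stable.
        entries.sort((a, b) -> {
            int cmp = Long.compare(b.getValue(), a.getValue());
            return cmp != 0 ? cmp : Long.compare(a.getKey(), b.getKey());
        });

        int limit = (int) Math.ceil(entries.size() * percent / 100.0);
        limit = Math.max(0, Math.min(entries.size(), limit));

        List<Long> ids = new ArrayList<>(limit);
        for (int i = 0; i < limit; i++)
            ids.add(entries.get(i).getKey());

        return ids;
    }

    /** Dumps the hottest page IDs to a file, one ID per line (before shutdown or at runtime). */
    void dump(Path file, int percent) throws IOException {
        List<String> lines = new ArrayList<>();
        for (long id : hottestPageIds(percent))
            lines.add(Long.toString(id));
        Files.write(file, lines);
    }

    /** After restart: reads saved IDs and asks the loader to bring each page into memory. */
    static int prewarm(Path file, LongConsumer pageLoader) throws IOException {
        int loaded = 0;
        for (String line : Files.readAllLines(file)) {
            pageLoader.accept(Long.parseLong(line));
            loaded++;
        }
        return loaded;
    }
}
```

A caller would invoke dump(file, percent) on shutdown and prewarm(file, loader) from a background worker after restart; whether the restore phase waits for that worker is a separate configuration decision, as described above.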

Link to the Jira ticket: https://issues.apache.org/jira/browse/IGNITE-11457

Link to the PR: https://github.com/apache/ignite/pull/6223.



I would like to receive feedback on this prewarming feature concept.


Internal problems requiring graceful node shutdown, reboot, etc.

2017-11-30 Thread Дмитрий Сорокин
Hi, Igniters!

We have a set of internal problems that require a graceful node shutdown or
another configured reaction (see the discussion thread
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-Enhancement-Proposal-7-Internal-problems-detection-td24460.html
):
- IgniteOutOfMemoryException -
https://issues.apache.org/jira/browse/IGNITE-6892
- Persistence errors - https://issues.apache.org/jira/browse/IGNITE-6891
- ExchangeWorker exits with error -
https://issues.apache.org/jira/browse/IGNITE-6890

First, I propose to reconsider the 3rd problem as "A system worker exits
while the node is still running (the node stop process has not been
started)", because we have at least 5 worker classes whose running is
critical for the node to work.

These workers are:
- partition-exchanger (ExchangeWorker)
- disco-event-worker
- nio-acceptor
- grid-nio-worker-tcp-comm-*
- grid-timeout-worker
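
A minimal sketch of how such an exit could be detected, assuming a wrapper around the worker body. The class name, the nodeStopping flag and the onFailure callback are hypothetical and only illustrate the "worker exited while the node stop has not been started" condition:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;

/** Hypothetical sketch: reports a critical worker that exits while the node is still running. */
final class CriticalWorkerGuard implements Runnable {
    private final String name;
    private final Runnable body;
    private final AtomicBoolean nodeStopping;
    private final BiConsumer<String, Throwable> onFailure;

    CriticalWorkerGuard(String name, Runnable body, AtomicBoolean nodeStopping,
        BiConsumer<String, Throwable> onFailure) {
        this.name = name;
        this.body = body;
        this.nodeStopping = nodeStopping;
        this.onFailure = onFailure;
    }

    @Override public void run() {
        Throwable err = null;

        try {
            body.run();
        } catch (Throwable t) {
            err = t; // The worker died with an error.
        } finally {
            // Any exit, with or without an error, is a failure
            // unless the node stop process has already been started.
            if (!nodeStopping.get())
                onFailure.accept(name, err);
        }
    }
}
```

The callback receives the worker name and the error (null for a silent exit), so one reaction can be applied to all of the workers listed above.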

Second, I propose to use FailureProcessingPolicy (already implemented in
the scope of task IGNITE-6890) to define the reaction to the 1st and 2nd
detected problems too. This policy can be configured similarly to
SegmentationPolicy in IgniteConfiguration.

Opinions?


Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-29 Thread Дмитрий Сорокин
Vladimir,

At the moment, the policy looks like this:

/**
 * Policy that defines how node will process the failures. Note that default
 * failure processing policy is defined by
 * {@link IgniteConfiguration#DFLT_FLR_PLC} property.
 */
public enum FailureProcessingPolicy {
    /** Restart jvm. */
    RESTART_JVM,

    /** Stop. */
    STOP,

    /** Noop. */
    NOOP;
}

Can you give an example where different event (failure) types need
different reactions?
We expect that all failures in which some Ignite system worker (or other
critical component) breaks need the same policy on the same node.
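
To show what "the same policy for the same node" could mean in code, here is a minimal sketch. The processor class and the injected stop and restart actions are hypothetical; the nested enum only mirrors the constants of the enum quoted above so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one node-wide policy applied to every reported failure. */
final class FailureProcessorSketch {
    /** Mirrors the constants of the FailureProcessingPolicy enum quoted above. */
    enum Policy { RESTART_JVM, STOP, NOOP }

    private final Policy plc;
    private final Runnable stopNode;
    private final Runnable restartJvm;
    private final List<String> reported = new ArrayList<>();

    FailureProcessorSketch(Policy plc, Runnable stopNode, Runnable restartJvm) {
        this.plc = plc;
        this.stopNode = stopNode;
        this.restartJvm = restartJvm;
    }

    /** Called for any failure type (OOM, persistence error, system worker exit). */
    void onFailure(String description) {
        reported.add(description); // Every failure is reported regardless of the policy.

        switch (plc) {
            case STOP:
                stopNode.run();
                break;

            case RESTART_JVM:
                restartJvm.run();
                break;

            case NOOP:
            default:
                break; // Report only; the node is not touched.
        }
    }

    /** Returns descriptions of all failures seen so far. */
    List<String> reported() {
        return reported;
    }
}
```

The failure type does not take part in the dispatch here; per-type reactions would need an additional mapping on top of this.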


2017-11-29 13:56 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Dmitry,
>
> Thank you, but how FailureProcessingPolicy looks like? It is not clear how
> can I configure different reactions to different event types.
>
> On Wed, Nov 29, 2017 at 1:47 PM, Дмитрий Сорокин <
> sbt.sorokin@gmail.com>
> wrote:
>
> > Vladimir,
> >
> > These policies (policy, in fact) can be configured in IgniteConfiguration
> > by calling setFailureProcessingPolicy(FailureProcessingPolicy flrPlc)
> > method.
> >
> > 2017-11-29 10:35 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >
> > > Denis,
> > >
> > > Yes, but can we look at proposed API before we dig into implementation?
> > >
> > > On Tue, Nov 28, 2017 at 9:43 PM, Denis Magda <dma...@apache.org>
> wrote:
> > >
> > > > I think the failure processing policy should be configured via
> > > > IgniteConfiguration in a way similar to the segmentation policies.
> > > >
> > > > —
> > > > Denis
> > > >
> > > > > On Nov 27, 2017, at 11:28 PM, Vladimir Ozerov <
> voze...@gridgain.com>
> > > > wrote:
> > > > >
> > > > > Dmitry,
> > > > >
> > > > > How these policies will be configured? Do you have any API in mind?
> > > > >
> > > > > On Thu, Nov 23, 2017 at 6:26 PM, Denis Magda <dma...@apache.org>
> > > wrote:
> > > > >
> > > > >> No objections here. Additional policies like EXEC might be added
> > later
> > > > >> depending on user needs.
> > > > >>
> > > > >> —
> > > > >> Denis
> > > > >>
> > > > >>> On Nov 23, 2017, at 2:26 AM, Дмитрий Сорокин <
> > > > sbt.sorokin@gmail.com>
> > > > >> wrote:
> > > > >>>
> > > > >>> Denis,
> > > > >>> I propose start with first three policies (it's already
> > implemented,
> > > > just
> > > > >>> await some code combing, commit & review).
> > > > >>> About of fourth policy (EXEC) I think that it's rather additional
> > > > >> property
> > > > >>> (some script path) than policy.
> > > > >>>
> > > > >>> 2017-11-23 0:43 GMT+03:00 Denis Magda <dma...@apache.org>:
> > > > >>>
> > > > >>>> Just provide FailureProcessingPolicy with possible reactions:
> > > > >>>> - NOOP - exceptions will be reported, metrics will be triggered
> > but
> > > an
> > > > >>>> affected Ignite process won’t be touched.
> > > > >>>> - HAULT (or STOP or KILL) - all the actions of the of NOOP +
> > Ignite
> > > > >>>> process termination.
> > > > >>>> - RESTART - NOOP actions + process restart.
> > > > >>>> - EXEC - execute a custom script provided by the user.
> > > > >>>>
> > > > >>>> If needed the policy can be set per know failure such is OOM,
> > > > >> Persistence
> > > > >>>> errors so that the user can act accordingly basing on a context.
> > > > >>>>
> > > > >>>> —
> > > > >>>> Denis
> > > > >>>>
> > > > >>>>> On Nov 21, 2017, at 11:43 PM, Vladimir Ozerov <
> > > voze...@gridgain.com>
> > > > >>>> wrote:
> > > > >>>>>
> > > > >>>>> In the first iteration I would focus only on reporting
> > facilities,
> > > to
> > > > >> let
> > > > >>>>> administrator spot dangerous situation. And in the second
> phase,
> > > when
> > > > >

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-29 Thread Дмитрий Сорокин
Vladimir,

These policies (policy, in fact) can be configured in IgniteConfiguration
by calling the setFailureProcessingPolicy(FailureProcessingPolicy flrPlc)
method.

2017-11-29 10:35 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Denis,
>
> Yes, but can we look at proposed API before we dig into implementation?
>
> On Tue, Nov 28, 2017 at 9:43 PM, Denis Magda <dma...@apache.org> wrote:
>
> > I think the failure processing policy should be configured via
> > IgniteConfiguration in a way similar to the segmentation policies.
> >
> > —
> > Denis
> >
> > > On Nov 27, 2017, at 11:28 PM, Vladimir Ozerov <voze...@gridgain.com>
> > wrote:
> > >
> > > Dmitry,
> > >
> > > How these policies will be configured? Do you have any API in mind?
> > >
> > > On Thu, Nov 23, 2017 at 6:26 PM, Denis Magda <dma...@apache.org>
> wrote:
> > >
> > >> No objections here. Additional policies like EXEC might be added later
> > >> depending on user needs.
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Nov 23, 2017, at 2:26 AM, Дмитрий Сорокин <
> > sbt.sorokin@gmail.com>
> > >> wrote:
> > >>>
> > >>> Denis,
> > >>> I propose start with first three policies (it's already implemented,
> > just
> > >>> await some code combing, commit & review).
> > >>> About of fourth policy (EXEC) I think that it's rather additional
> > >> property
> > >>> (some script path) than policy.
> > >>>
> > >>> 2017-11-23 0:43 GMT+03:00 Denis Magda <dma...@apache.org>:
> > >>>
> > >>>> Just provide FailureProcessingPolicy with possible reactions:
> > >>>> - NOOP - exceptions will be reported, metrics will be triggered but
> an
> > >>>> affected Ignite process won’t be touched.
> > >>>> - HAULT (or STOP or KILL) - all the actions of the of NOOP + Ignite
> > >>>> process termination.
> > >>>> - RESTART - NOOP actions + process restart.
> > >>>> - EXEC - execute a custom script provided by the user.
> > >>>>
> > >>>> If needed the policy can be set per know failure such is OOM,
> > >> Persistence
> > >>>> errors so that the user can act accordingly basing on a context.
> > >>>>
> > >>>> —
> > >>>> Denis
> > >>>>
> > >>>>> On Nov 21, 2017, at 11:43 PM, Vladimir Ozerov <
> voze...@gridgain.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> In the first iteration I would focus only on reporting facilities,
> to
> > >> let
> > >>>>> administrator spot dangerous situation. And in the second phase,
> when
> > >> all
> > >>>>> reporting and metrics are ready, we can think on some automatic
> > >> actions.
> > >>>>>
> > >>>>> On Wed, Nov 22, 2017 at 10:39 AM, Mikhail Cherkasov <
> > >>>> mcherka...@gridgain.com
> > >>>>>> wrote:
> > >>>>>
> > >>>>>> Hi Anton,
> > >>>>>>
> > >>>>>> I don't think that we should shutdown node in case of
> > >>>> IgniteOOMException,
> > >>>>>> if one node has no space, then other probably  don't have it too,
> so
> > >> re
> > >>>>>> -balancing will cause IgniteOOM on all other nodes and will kill
> the
> > >>>> whole
> > >>>>>> cluster. I think for some configurations cluster should survive
> and
> > >>>> allow
> > >>>>>> to user clean cache or/and add more nodes.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Mikhail.
> > >>>>>>
> > >>>>>> On Nov 20, 2017 at 6:53 PM, "Anton Vinogradov" <
> > >>>>>> avinogra...@gridgain.com> wrote:
> > >>>>>>
> > >>>>>>> Igniters,
> > >>>>>>>
> > >>>>>>> Internal problems may and, unfortunately, cause unexpected
> cluster
> > >>>>>>> behavior.
> > >>>>>>> We should determine behavior i

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-23 Thread Дмитрий Сорокин
Denis,
I propose to start with the first three policies (they are already
implemented and just await some code cleanup, commit & review).
As for the fourth policy (EXEC), I think it is an additional property
(some script path) rather than a policy.

2017-11-23 0:43 GMT+03:00 Denis Magda :

> Just provide FailureProcessingPolicy with possible reactions:
> - NOOP - exceptions will be reported, metrics will be triggered but an
> affected Ignite process won’t be touched.
> - HAULT (or STOP or KILL) - all the actions of the of NOOP + Ignite
> process termination.
> - RESTART - NOOP actions + process restart.
> - EXEC - execute a custom script provided by the user.
>
> If needed the policy can be set per know failure such is OOM, Persistence
> errors so that the user can act accordingly basing on a context.
>
> —
> Denis
>
> > On Nov 21, 2017, at 11:43 PM, Vladimir Ozerov 
> wrote:
> >
> > In the first iteration I would focus only on reporting facilities, to let
> > administrator spot dangerous situation. And in the second phase, when all
> > reporting and metrics are ready, we can think on some automatic actions.
> >
> > On Wed, Nov 22, 2017 at 10:39 AM, Mikhail Cherkasov <
> mcherka...@gridgain.com
> >> wrote:
> >
> >> Hi Anton,
> >>
> >> I don't think that we should shutdown node in case of
> IgniteOOMException,
> >> if one node has no space, then other probably  don't have it too, so re
> >> -balancing will cause IgniteOOM on all other nodes and will kill the
> whole
> >> cluster. I think for some configurations cluster should survive and
> allow
> >> to user clean cache or/and add more nodes.
> >>
> >> Thanks,
> >> Mikhail.
> >>
> >> On Nov 20, 2017 at 6:53 PM, "Anton Vinogradov" <
> >> avinogra...@gridgain.com> wrote:
> >>
> >>> Igniters,
> >>>
> >>> Internal problems may and, unfortunately, cause unexpected cluster
> >>> behavior.
> >>> We should determine behavior in case any of internal problem happened.
> >>>
> >>> Well known internal problems can be split to:
> >>> 1) OOM or any other reason cause node crash
> >>>
> >>> 2) Situations required graceful node shutdown with custom notification
> >>> - IgniteOutOfMemoryException
> >>> - Persistence errors
> >>> - ExchangeWorker exits with error
> >>>
> >>> 3) Prefomance issues should be covered by metrics
> >>> - GC STW duration
> >>> - Timed out tasks and jobs
> >>> - TX deadlock
> >>> - Hanged Tx (waits for some service)
> >>> - Java Deadlocks
> >>>
> >>> I created special issue [1] to make sure all these metrics will be
> >>> presented at WebConsole or VisorConsole (what's preferred?)
> >>>
> >>> 4) Situations required external monitoring implementation
> >>> - GC STW duration exceed maximum possible length (node should be
> stopped
> >>> before STW finished)
> >>>
> >>> All this problems were reported by different persons different time
> ago,
> >>> So, we should reanalyze each of them and, possible, find better ways to
> >>> solve them than it described at issues.
> >>>
> >>> P.s. IEP-7 [2] already contains 9 issues, feel free to mention
> something
> >>> else :)
> >>>
> >>> [1] https://issues.apache.org/jira/browse/IGNITE-6961
> >>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
> >>>
> >>
>
>


Re: Facility to detect long STW pauses and other system response degradations

2017-11-21 Thread Дмитрий Сорокин
Don't forget that high CPU utilization can occur for reasons other than
GC STW, and GC log parsing will not help us in that case.
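
For comparison, here is a minimal sketch of the java-thread approach discussed in the quoted messages below. It is not the code that was actually tested; the class name, parameters and callback are assumptions. The thread sleeps for a fixed interval and measures how much later than expected it wakes up, so it notices any stall of the JVM process, whatever the cause, which is why such a detector can only report a "possible" STW pause.

```java
import java.util.function.LongConsumer;

/** Hypothetical sketch of a thread that detects long pauses by oversleeping. */
final class PauseDetectorSketch {
    /**
     * Returns how many milliseconds the measured interval exceeded the requested sleep,
     * or 0 if the excess does not exceed the threshold.
     */
    static long excessPauseMillis(long beforeNanos, long afterNanos, long sleepMillis, long thresholdMillis) {
        long pause = (afterNanos - beforeNanos) / 1_000_000L - sleepMillis;

        return pause > thresholdMillis ? pause : 0L;
    }

    /** Starts a daemon thread that reports pauses longer than the threshold. */
    static Thread start(long sleepMillis, long thresholdMillis, LongConsumer onPause) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long before = System.nanoTime();

                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    return; // An interrupt stops the detector.
                }

                long pause = excessPauseMillis(before, System.nanoTime(), sleepMillis, thresholdMillis);

                if (pause > 0)
                    onPause.accept(pause);
            }
        }, "pause-detector");

        t.setDaemon(true);
        t.start();

        return t;
    }
}
```

Like any in-process check, it can report a pause only after the pause has ended, so it covers the metrics part and not the external monitoring part.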


Tue, Nov 21, 2017 at 13:06, Anton Vinogradov [via Apache Ignite
Developers]:

> Denis,
>
> > 1. Totally for a separate native process that will handle the monitoring
> > of an Ignite process. The watchdog process can simply start a JVM tool
> > like jstat and parse its GC logs:
> > https://dzone.com/articles/how-monitor-java-garbage
> Different GC and even same GC at different OS/JVM produce different logs.
> That's not easy to parse them. But, since http://gceasy.io can do that,
> it
> looks to be possible, somehow :) .
> Do you know any libs or solutions allows to do this at realtime?
>
> > 2. As for the STW handling, I would make a possible reaction more
> generic. Let’s define a policy (enumeration) that will define how to deal
> with an unstable node. The events might be as follows - kill a node,
> restart a node, trigger a custom script using Runtime.exec or other
> methods.
> Yes, it should be similar to segmentation policy + custom script
> execution.
>
>
> On Tue, Nov 21, 2017 at 2:10 AM, Denis Magda <[hidden email]
> > wrote:
>
> > My 2 cents.
> >
> > 1. Totally for a separate native process that will handle the monitoring
> > of an Ignite process. The watchdog process can simply start a JVM tool
> > like jstat and parse its GC logs:
> > https://dzone.com/articles/how-monitor-java-garbage
> >
> > 2. As for the STW handling, I would make a possible reaction more
> generic.
> > Let’s define a policy (enumeration) that will define how to deal with an
> > unstable node. The events might be as follows - kill a node, restart a
> > node, trigger a custom script using Runtime.exec or other methods.
> >
> > What’d you think? Specifically on point 2.
> >
> > —
> > Denis
> >
> > > On Nov 20, 2017, at 6:47 AM, Anton Vinogradov <[hidden email]
> >
> > wrote:
> > >
> > > Yakov,
> > >
> > > Issue is https://issues.apache.org/jira/browse/IGNITE-6171
> > >
> > > We split issue to
> > > #1 STW duration metrics
> > > #2 External monitoring allows to stop node during STW
> > >
> > >> Testing GC pause with java thread is
> > >> a bit strange and can give info only after GC pause finishes.
> > >
> > > That's ok since it's #1
> > >
> > > On Mon, Nov 20, 2017 at 5:45 PM, Dmitriy_Sorokin <
> > [hidden email] >
> > > wrote:
> > >
> > >> I have tested solution with java-thread and GC logs had contain same
> > pause
> > >> values of thread stopping which was detected by java-thread.
> > >>
> > >>
> > >> My log (contains pauses > 100ms):
> > >> [2017-11-20 17:33:28,822][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 507 milliseconds.
> > >> [2017-11-20 17:33:34,522][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 5595 milliseconds.
> > >> [2017-11-20 17:33:37,896][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 3262 milliseconds.
> > >> [2017-11-20 17:33:39,714][WARN ][Thread-1][root] Possible too long
> STW
> > >> pause: 1737 milliseconds.
> > >>
> > >> GC log:
> > >> gridgain@dell-5580-92zc8h2:~$ cat
> > >> ./dev/ignite-logs/gc-2017-11-20_17-33-27.log | grep Total
> > >> 2017-11-20T17:33:27.608+0300: 0,116: Total time for which application
> > >> threads were stopped: 0,845 seconds, Stopping threads took:
> > 0,246
> > >> seconds
> > >> 2017-11-20T17:33:27.667+0300: 0,175: Total time for which application
> > >> threads were stopped: 0,0001072 seconds, Stopping threads took:
> > 0,252
> > >> seconds
> > >> 2017-11-20T17:33:28.822+0300: 1,330: Total time for which application
> > >> threads were stopped: 0,5001082 seconds, Stopping threads took:
> > 0,178
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:34.521+0300: 7,030: Total time for which application
> > >> threads were stopped: 5,5856603 seconds, Stopping threads took:
> > 0,229
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:37.896+0300: 10,405: Total time for which
> application
> > >> threads were stopped: 3,2595700 seconds, Stopping threads took:
> > 0,223
> > >> seconds// GOT!
> > >> 2017-11-20T17:33:39.714+0300: 12,222: Total time for which
> application
> > >> threads were stopped: 1,7337123 seconds, Stopping threads took:
> > 0,121
> > >> seconds// GOT!
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >>
> >
> >