S3 discovery and docker bridge networks

2018-12-21 Thread David Harvey
The general problem is that when a node pushes its IP addresses to S3, it has
no way to tell whether each of those addresses will be usable by other
nodes.   On the consumer side, we are unable to determine at runtime which
of the addresses will work.   So we end up doing discovery against worthless
IP addresses, and then have to suffer timeouts to sort this out, slowing
down startup.

Our fix is to allow configuration of a set of exclusion patterns (regex) on
TcpCommunicationSpi, so we could exclude "192.168.*", for example.
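
A rough sketch of what that configuration could look like in Java (the
setAddressExclusionPatterns setter below is hypothetical; it illustrates what
the ticket proposes and is not an existing Ignite API):

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class ExclusionSketch {
    static IgniteConfiguration configWithExclusions() {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Hypothetical setter: the kind of knob IGNITE-10791 asks for;
        // it does NOT exist in released Ignite.
        commSpi.setAddressExclusionPatterns(new String[] {"192\\.168\\..*", "172\\.17\\..*"});

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);
        return cfg;
    }
}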

https://jira.apache.org/jira/browse/IGNITE-10791


[jira] [Created] (IGNITE-10791) Avoid unusable network during discovery

2018-12-21 Thread David Harvey (JIRA)
David Harvey created IGNITE-10791:
-

 Summary: Avoid unusable  network  during discovery
 Key: IGNITE-10791
 URL: https://issues.apache.org/jira/browse/IGNITE-10791
 Project: Ignite
  Issue Type: Improvement
Reporter: David Harvey


Problem:  In some deployments there are multiple IP addresses, and S3 
discovery tries them in some random order and times out on the ones that don't 
work, slowing down discovery unnecessarily.   In many such cases, the set of 
unusable addresses is known to humans, but is not discoverable at runtime.   For 
example, some IP addresses may be blocked by firewalls.   On ECS, the Docker 
bridge network IPs are visible, but are unusable across nodes.

 

http://apache-ignite-users.70518.x6.nabble.com/Avoiding-Docker-Bridge-network-when-using-S3-discovery-td24778.html





Re: TcpCommunicationSpi extension to ignore docker bridge network

2018-11-20 Thread David Harvey
What we prototyped was configuring via Spring the list of IPs to ignore,
because a given installation seemed to have a constant address for the
bridge network, and this approach was reliable once you know the bridge
IPs.   It is also a more general solution.

When the container starts, you get a list of IP addresses from the kernel.
 At that point it is impossible to know from inside the container which of
those addresses can be used by other Ignite nodes, at least without
external information.   For example, if I have Ignite running on an AWS
instance that has an internal and an external address, it is impossible to
know which address will be able to reach the other nodes, unless you are
told.   So perhaps we should have used a list of ranges rather than a list
of individual addresses in our prototype.

For the Docker sub-case where all the nodes seem to get the same useless
address, I would think we can ignore IP address/port pairs that the
current node is also advertising.    That does not generalize to other
cases where the kernel provides unusable addresses.    I didn't quite
understand why, if we try to connect to a port we are advertising, this would
need to time out rather than get immediately rejected, unless Ignite
has explicit code to detect and ignore a self-message.   But if there
is an IP:port pair that the current node is claiming as an endpoint, it
should not try to use that IP:port to connect to other nodes.
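
For reference, a minimal, plain-JDK sketch of the enumeration a node can do
from inside the container. It only shows what the kernel reports; nothing in
it can say which of these addresses other Ignite nodes can actually reach,
which is exactly the gap described above.

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalAddressDump {
    public static void main(String[] args) throws Exception {
        List<String> addrs = new ArrayList<>();
        for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                if (!addr.isLoopbackAddress())
                    addrs.add(ni.getName() + " -> " + addr.getHostAddress());
            }
        }
        // On a Docker/ECS host this list typically includes the bridge network
        // address alongside the "real" host addresses; nothing here says which
        // of them other Ignite nodes can actually reach.
        addrs.forEach(System.out::println);
    }
}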

On Tue, Nov 20, 2018 at 2:27 PM David Harvey  wrote:

> What we prototyped was configuring via spring the list of IPs to ignore,
> because a given installation seemed to have a constant address for the
> bridge network, and this approach was reliable, once you know the bridge
> IPs.
>
> When the container starts, you get a list of IP addresses from the
> kernel.   At that point it is impossible to know from inside the container
> which of those addresses can be used by other ignite nodes, at least
> without external information.   Similarly, if I have an AWS instance
>
> I am wondering
>
>
>
> On Tue, Nov 20, 2018 at 2:08 PM Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
>> Hi David,
>>
>> This is something we have also encountered recently and I was wondering
>> how
>> this can be mitigated in a general case. Do you know if an application can
>> detect that it is being run in a docker container and add the
>> corresponding
>> list of bridge IPs automatically on start? If so, I think we can add this
>> to Ignite so that it works out of the box.
>>
>> --AG
>>
>>
>> вт, 20 нояб. 2018 г. в 19:58, David Harvey :
>>
>> > We see some annoying behavior with S3 discovery because Ignite will
>> push to
>> > the discovery S3 bucket the IP address of the local docker bridge
>> network
>> > (172.17.0.1) in our case.   Basically, each node when coming online
>> tries
>> > that address first, and has to go through a network timeout to recover.
>> >
>> > To address this, have prototyped a simple extension to
>> TcpCommunicationSpi
>> > to allow configuration of a list of IP addresses that should be
>> completely
>> > ignored, and will create a ticket and generate a pull request for it.
>> >
>> > If there is a better approach, please let us know.
>> >
>> > Thanks
>> > Dave Harvey
>> >
>>
>


Re: TcpCommunicationSpi extension to ignore docker bridge network

2018-11-20 Thread David Harvey
What we prototyped was configuring via Spring the list of IPs to ignore,
because a given installation seemed to have a constant address for the
bridge network, and this approach was reliable, once you know the bridge
IPs.

When the container starts, you get a list of IP addresses from the kernel.
 At that point it is impossible to know from inside the container which of
those addresses can be used by other Ignite nodes, at least without
external information.   Similarly, if I have an AWS instance

I am wondering



On Tue, Nov 20, 2018 at 2:08 PM Alexey Goncharuk 
wrote:

> Hi David,
>
> This is something we have also encountered recently and I was wondering how
> this can be mitigated in a general case. Do you know if an application can
> detect that it is being run in a docker container and add the corresponding
> list of bridge IPs automatically on start? If so, I think we can add this to
> Ignite so that it works out of the box.
>
> --AG
>
>
> вт, 20 нояб. 2018 г. в 19:58, David Harvey :
>
> > We see some annoying behavior with S3 discovery because Ignite will push
> to
> > the discovery S3 bucket the IP address of the local docker bridge network
> > (172.17.0.1) in our case.   Basically, each node when coming online tries
> > that address first, and has to go through a network timeout to recover.
> >
> > To address this, have prototyped a simple extension to
> TcpCommunicationSpi
> > to allow configuration of a list of IP addresses that should be
> completely
> > ignored, and will create a ticket and generate a pull request for it.
> >
> > If there is a better approach, please let us know.
> >
> > Thanks
> > Dave Harvey
> >
>


TcpCommunicationSpi extension to ignore docker bridge network

2018-11-20 Thread David Harvey
We see some annoying behavior with S3 discovery because Ignite will push to
the discovery S3 bucket the IP address of the local Docker bridge network
(172.17.0.1 in our case).   Basically, each node, when coming online, tries
that address first and has to go through a network timeout to recover.

To address this, we have prototyped a simple extension to TcpCommunicationSpi
to allow configuration of a list of IP addresses that should be completely
ignored, and will create a ticket and generate a pull request for it.

If there is a better approach, please let us know.

Thanks
Dave Harvey


[jira] [Created] (IGNITE-10135) Documentation link to ClusterNodeAttributeAffinityBackupFilter

2018-11-02 Thread David Harvey (JIRA)
David Harvey created IGNITE-10135:
-

 Summary: Documentation link to 
ClusterNodeAttributeAffinityBackupFilter
 Key: IGNITE-10135
 URL: https://issues.apache.org/jira/browse/IGNITE-10135
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Reporter: David Harvey
 Fix For: 2.7


IGNITE-9365 adds ClusterNodeAttributeAffinityBackupFilter to allow "Crash-safe 
Affinity" (https://apacheignite.readme.io/docs/affinity-collocation) to be 
configured from Spring.

The class should have an adequate description of how to set this up, but the 
above section should link to, or otherwise flag, that such a procedure exists.

[https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.java]

 

Note: the implementation is generic, allowing any node attribute (or 
environment variable) to be configured so that primaries and backups are always 
forced onto nodes with different values of that attribute.
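
A minimal Java sketch of the intended setup, assuming each node sets an
AVAILABILITY_ZONE user attribute (the attribute name, cache name, and backup
count are just examples; the Spring XML equivalent is analogous):

import java.util.Collections;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CrashSafeAffinityExample {
    public static IgniteConfiguration config(String availabilityZone) {
        // Each node advertises which availability zone it runs in.
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setUserAttributes(Collections.singletonMap("AVAILABILITY_ZONE", availabilityZone));

        // Backups are only placed on nodes whose AVAILABILITY_ZONE differs
        // from the nodes already chosen for that partition.
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        aff.setAffinityBackupFilter(new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setBackups(1);
        cacheCfg.setAffinity(aff);

        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }
}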





Re: IGNITE-2.7. New Features

2018-11-02 Thread David Harvey
IGNITE-9365 Force backups to different AWS availability zones using only
Spring XML

This deserves documentation enhancements:
https://apacheignite.readme.io/docs/affinity-collocation

What is the mechanism for proposing an update for 2.7?


On Fri, Nov 2, 2018 at 6:31 AM Andrey Kuznetsov  wrote:

> Great news!
>
> Future release is about to contain mission critical Ignite workers liveness
> monitoring, introduced in IGNITE-6587.
>
>
> пт, 2 нояб. 2018 г. в 13:23, Nikolay Izhikov :
>
> > Hello, Guys.
> >
> > Good news! We have 2 final tickets for 2.7.
> > So release date is very near!
> >
> > Let's collect new features and improvements of Ignite 2.7 and includes it
> > to release notes and other documents.
> >
> > Can you answer and describe your contributions?
> >
>
>
> --
> Best regards,
>   Andrey Kuznetsov.
>


Re: Pre-touch for Ignite off-heap memory

2018-10-24 Thread David Harvey
Denis,

We run most of our production DB systems without any swap space,
because the 10-100x drop in throughput if such systems start paging makes
them worse than useless.  However, we don't get OOM on them until all the
pages are dirty, since Linux will page out read-only (code) pages or
memory-mapped files.

Dave Harvey



On Wed, Oct 24, 2018 at 12:12 AM Denis Magda  wrote:

> Alex,
>
> Correct me if I'm wrong, but even if an OS runs out of physical memory (X
> GB in total) an Ignite node process still can request the X GB from virtual
> memory. Yes, virtual memory can involve swapping and disk to satisfy your
> request but this shouldn't be a reason of the OOM. Shouldn't OOM happen if
> you're trying to allocate beyond the virtual memory capacity (beyond X GB)?
>
> Denis
>
> On Tue, Oct 23, 2018 at 12:08 PM Gerus  wrote:
>
> > Hi *Igniters*,
> > Some time ago I've raised a suggestion for product improvement
> > https://issues.apache.org/jira/browse/IGNITE-9112
> >   .  It's all about
> > off-heap memory allocation. Current implementation can have some
> > improvements for failure critical systems. Ignite can have OOM in
> runtime,
> > because RAM can be used by OS, if it will not be pre-booked by operation
> > system and this proposal is to address that. Common case is offheap and
> > thats why memory segment cannot be managed by JVM that has
> +AlwaysPreTouch
> > option
> > Obviously this implementation will make startup longer and thats why it
> is
> > proposed to use configuration flag to manage this feature
> > I think, it will be useful to have this option. Are you supporting this?
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


Re: Applicability of term 'cache' to Apache Ignite

2018-10-18 Thread David Harvey
We had a terminology agreement early on where we agreed to call them
caches, but we still call them tables anyway.

When I finally understood how you could have multiple tables in a single
cache,  I tried to find example use cases, but couldn't.  Is there even a
test with multiple queryEntities?
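
For reference, a minimal sketch of what a cache with two QueryEntities (two
SQL tables in one cache) would look like; the types and table names are made
up:

import java.util.Arrays;
import java.util.LinkedHashMap;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.configuration.CacheConfiguration;

public class MultiTableCacheSketch {
    public static CacheConfiguration<Object, Object> cacheWithTwoTables() {
        LinkedHashMap<String, String> personFields = new LinkedHashMap<>();
        personFields.put("name", "java.lang.String");
        QueryEntity person = new QueryEntity("java.lang.Long", "demo.Person");
        person.setTableName("PERSON");
        person.setFields(personFields);

        LinkedHashMap<String, String> companyFields = new LinkedHashMap<>();
        companyFields.put("title", "java.lang.String");
        QueryEntity company = new QueryEntity("java.lang.Long", "demo.Company");
        company.setTableName("COMPANY");
        company.setFields(companyFields);

        // One cache, two SQL tables: each QueryEntity becomes its own table.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("multiTableCache");
        cacheCfg.setQueryEntities(Arrays.asList(person, company));
        return cacheCfg;
    }
}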

On Thu, Oct 18, 2018, 8:10 AM Alexey Zinoviev 
wrote:

> From my perspective (ML module), it will be very easy to talk about Ignite
> in SQL terms like table (with additional information about ability to make
> key-value CRUD operations, not only SELECT * FROM Table)
> Also we could look on PostgreSQL with different plugins for SQL extension
> like PostGIS or support of JSON-B and ability to store not only planar data
> with strict schema (I agrre here with Vladimir).
>
> чт, 18 окт. 2018 г. в 14:33, Ilya Lantukh :
>
> > I thought that current "caches" and "tables" have 1-to-N relation. If
> > that's not a problem, than I also think that "table" is the best term.
> >
> > On Thu, Oct 18, 2018 at 9:29 AM Vladimir Ozerov 
> > wrote:
> >
> > > Well, I never thought about term “table” as a replacement for “cache”,
> > but
> > > it appears to be good candidate.
> > >
> > > This is used by many some major vendors whose underlying storage is
> > indeed
> > > a kind of key-value data structure. Most well-known example is MySQL
> with
> > > its MyISAM engine. Table can be used for both fixed and flexible (e.g.
> > > JSON) schemas, as well as key-value access (hash map -> hash table,
> both
> > > are good).
> > >
> > > Another important thing - we already use term “table”, and it is always
> > > hard to explain our users how it relates to “cache”. If “cache” is
> > dropped,
> > > then a single term “table” will be used everywhere.
> > >
> > > Last, but not least - “table” works well for both in-memory and
> > persistent
> > > modes.
> > >
> > > So if we are really aim to rename “cache”, then “table” is the best
> > > candidate I’ve heard so far.
> > >
> > > чт, 18 окт. 2018 г. в 8:40, Alexey Zinoviev :
> > >
> > > > Or we could extend our SQL commands by "GET BY KEY = X" and "PUT (x1,
> > x2,
> > > > x3) BY KEY = X" and the IgniteTable could be correct.
> > > > Agree with Denis that each table in the 3rd normal form is like
> > key-value
> > > > store. Key-value operations are only subset of rich SQL commands.
> > > >
> > > > The problem with IgniteData that it's too common. Also, it's
> difficult
> > to
> > > > understand is it a plural or single object? For instance, the bunch
> of
> > > > IgniteTables could be IgniteData. But the set of IgniteData?
> > IgniteDatum?
> > > >
> > > >
> > > >
> > > > чт, 18 окт. 2018 г. в 4:18, Denis Magda :
> > > >
> > > > > Key-value calls are just primary key based calls. From a user
> > > > perspective,
> > > > > it's the same as "SELECT * FROM table WHERE primary_idx = X", just
> > > > > different API.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Wed, Oct 17, 2018 at 5:04 PM Dmitriy Setrakyan <
> > > dsetrak...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > On Wed, Oct 17, 2018 at 4:58 PM Denis Magda 
> > > wrote:
> > > > > >
> > > > > > > I've been calling everything "tables" instead of "caches" for a
> > > > while.
> > > > > > The
> > > > > > > main reason is the maturity of our SQL engine - seeing more SQL
> > > users
> > > > > and
> > > > > > > deployments which talk "tables" language.
> > > > > > >
> > > > > > >
> > > > > > I think "IgniteTable" only implies SQL, not key-value. We need
> > both.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Ilya
> >
>


Need Review IGNITE-7616 Mxbeans thread display.

2018-09-28 Thread David Harvey
   1. This is my second newbie submission; it could use a review, and I keep
   getting snapshot dependency errors in TeamCity that seem like they cannot
   be related to my changes, even after rebasing twice.  I couldn't find a
   crisp definition of what a snapshot dependency is.

   It is a low-regression-risk fix to the MXBean display of thread pools,
   where one type of pool's MXBean registration was simply miscoded, and
   another type of pool was completely missing any code to display it.  It
   provides insight at runtime into two thread pools that are currently
   opaque, and it makes only one very modest change to actual production code.

   2. IGNITE-7616: GridDataStreamExecutor and GridCallbackExecutor JMX beans
   return incorrect values due to invalid interface registration.


https://github.com/apache/ignite/pull/4732
-DH

PS: I have seen this movie before, and the future got much brighter
when we completed the equivalent of making TeamCity whole again.


Re: affinityBackupFilter for AWS Availability Zones

2018-09-24 Thread David Harvey
Yes, thanks Val!

On Mon, Sep 24, 2018 at 11:35 AM Dmitriy Pavlov 
wrote:

> Hi Val, many thanks for the review.
>
> ср, 12 сент. 2018 г. в 20:35, Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
>
> > Yes, will try to review this week.
> >
> > -Val
> >
> > On Wed, Sep 12, 2018 at 10:24 AM Dmitriy Pavlov 
> > wrote:
> >
> > > Hi Val,
> > >
> > > I'm not an expert in AWS, so could you please pick up the review?
> > >
> > > Thank you in advance!
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > вт, 11 сент. 2018 г. в 1:28, Dave Harvey :
> > >
> > > > Submitted a patch for this
> > > > https://issues.apache.org/jira/browse/IGNITE-9365
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > >
> > >
> >
>


First pull requests

2018-09-17 Thread David Harvey
I'm new to this process, and I've created three pull requests, and I'm
trying to figure out how I can get some eyes to look at them.

   - IGNITE-7616, which adds some missing MXBeans for thread pools.  I've
   identified some contributors that worked on the files before, but when I
   type [ ~ name ] in Jira using parts of names found in git or here
   https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#HowtoContribute-ReviewProcessandMaintainers,
   Jira generally doesn't find those people.
   - IGNITE-9026, which fixes some bugs with CONTINUOUS peer class loading
   when such a loaded class makes a call to another node.   These addressed
   the problems we had, but I still have to write a couple of new tests.
   However, it would be useful to get feedback on the approach, because it
   fundamentally changes the search process.
   - IGNITE-9365, which adds an optional affinityBackupFilter that can be
   configured in Spring to separate primary and backup partitions based on
   some node attribute/environment variable, with an example of how to use
   this to force primary and backup to different AWS Availability Zones.  Val
   said he would look at this one.


Thanks,
Dave Harvey


Re: Critical worker threads liveness checking drawbacks

2018-09-10 Thread David Harvey
When I've done this before, I've needed to find the oldest thread and kill
the node running it.   From a language standpoint, Maxim's "without
progress" is better than "heartbeat".   For example, what I'm most interested
in on a distributed system is which thread started the work it has not
completed the earliest, and when that thread last made forward
progress. You don't want to kill a node because a thread is waiting on a
lock held by a thread that went off-node and has not gotten a response.
If you don't understand the dependency relationships, you will make
incorrect recovery decisions.
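
A toy sketch of the "oldest work without progress" idea (not Ignite code; all
names are made up): each worker records when it picked up its current work
item and when it last made forward progress, and a monitor picks the worker
whose in-flight work is oldest and stalled.  It deliberately ignores the
cross-node lock dependency problem described above, which is exactly why a
naive version of this can pick the wrong node to kill.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; not Ignite's WorkersRegistry.
public class ProgressWatchdog {
    static final class WorkerState {
        volatile long workStartedMs;   // when the current (uncompleted) work began
        volatile long lastProgressMs;  // when the worker last made forward progress
    }

    private final Map<String, WorkerState> workers = new ConcurrentHashMap<>();

    public void onWorkStarted(String worker) {
        WorkerState s = workers.computeIfAbsent(worker, k -> new WorkerState());
        s.workStartedMs = System.currentTimeMillis();
        s.lastProgressMs = s.workStartedMs;
    }

    public void onProgress(String worker) {
        WorkerState s = workers.get(worker);
        if (s != null)
            s.lastProgressMs = System.currentTimeMillis();
    }

    /** Returns the worker with the oldest uncompleted work that is also stalled, or null. */
    public String oldestStalledWorker(long stallThresholdMs) {
        long now = System.currentTimeMillis();
        String worst = null;
        long oldestStart = Long.MAX_VALUE;
        for (Map.Entry<String, WorkerState> e : workers.entrySet()) {
            WorkerState s = e.getValue();
            boolean stalled = now - s.lastProgressMs > stallThresholdMs;
            if (stalled && s.workStartedMs < oldestStart) {
                oldestStart = s.workStartedMs;
                worst = e.getKey();
            }
        }
        return worst;
    }
}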

On Mon, Sep 10, 2018 at 4:08 AM Maxim Muzafarov  wrote:

> I think we should find exact answers to these questions:
>  1. What `critical` issue exactly is?
>  2. How can we find critical issues?
>  3. How can we handle critical issues?
>
> First,
>  - Ignore uninterruptable actions (e.g. worker\service shutdown)
>  - Long I/O operations (should be a configurable timeout for each type of
> usage)
>  - Infinite loops
>  - Stalled\deadlocked threads (and\or too many parked threads, exclude I/O)
>
> Second,
>  - The working queue is without progress (e.g. disco, exchange queues)
>  - Work hasn't been completed since the last heartbeat (checking
> milestones)
>  - Too many system resources used by a thread for the long period of time
> (allocated memory, CPU)
>  - Timing fields associated with each thread status exceeded a maximum time
> limit.
>
> Third (not too many options here),
>  - `log everything` should be the default behaviour in all these cases,
> since it may be difficult to find the cause after the restart.
>  - Wait some interval of time and kill the hanging node (cluster should be
> configured stable enough)
>
> Questions,
>  - Not sure, but can workers miss their heartbeat deadlines if CPU loads up
> to 80%-90%? Bursts of momentary overloads can be
> expected behaviour as a normal part of system operations.
>  - Why do we decide that critical thread should monitor each other? For
> instance, if all the tasks were blocked and unable to run,
> node reset would never occur. As for me, a better solution is to use a
> separate monitor thread or pool (maybe both with software
> and hardware checks) that not only checks heartbeats but monitors the
> other system as well.
>
> On Mon, 10 Sep 2018 at 00:07 David Harvey  wrote:
>
> > It would be safer to restart the entire cluster than to remove the last
> > node for a cache that should be redundant.
> >
> > On Sun, Sep 9, 2018, 4:00 PM Andrey Gura  wrote:
> >
> > > Hi,
> > >
> > > I agree with Yakov that we can provide some option that manage worker
> > > liveness checker behavior in case of observing that some worker is
> > > blocked too long.
> > > At least it will  some workaround for cases when node fails is too
> > > annoying.
> > >
> > > Backups count threshold sounds good but I don't understand how it will
> > > help in case of cluster hanging.
> > >
> > > The simplest solution here is alert in cases of blocking of some
> > > critical worker (we can improve WorkersRegistry for this purpose and
> > > expose list of blocked workers) and optionally call system configured
> > > failure processor. BTW, failure processor can be extended in order to
> > > perform any checks (e.g. backup count) and decide whether it should
> > > stop node or not.
> > > On Sat, Sep 8, 2018 at 3:42 PM Andrey Kuznetsov 
> > wrote:
> > > >
> > > > David, Yakov, I understand your fears. But liveness checks deal with
> > > > _critical_ conditions, i.e. when such a condition is met we conclude
> > the
> > > > node as totally broken, and there is no sense to keep it alive
> > regardless
> > > > the data it contains. If we want to give it a chance, then the
> > condition
> > > > (long fsync etc.) should not considered as critical at all.
> > > >
> > > > сб, 8 сент. 2018 г. в 15:18, Yakov Zhdanov :
> > > >
> > > > > Agree with David. We need to have an opporunity set backups count
> > > threshold
> > > > > (at runtime also!) that will not allow any automatic stop if there
> > > will be
> > > > > a data loss. Andrey, what do you think?
> > > > >
> > > > > --Yakov
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > >
> >
> --
> --
> Maxim Muzafarov
>


Re: Critical worker threads liveness checking drawbacks

2018-09-09 Thread David Harvey
It would be safer to restart the entire cluster than to remove the last
node for a cache that should be redundant.

On Sun, Sep 9, 2018, 4:00 PM Andrey Gura  wrote:

> Hi,
>
> I agree with Yakov that we can provide some option that manage worker
> liveness checker behavior in case of observing that some worker is
> blocked too long.
> At least it will  some workaround for cases when node fails is too
> annoying.
>
> Backups count threshold sounds good but I don't understand how it will
> help in case of cluster hanging.
>
> The simplest solution here is alert in cases of blocking of some
> critical worker (we can improve WorkersRegistry for this purpose and
> expose list of blocked workers) and optionally call system configured
> failure processor. BTW, failure processor can be extended in order to
> perform any checks (e.g. backup count) and decide whether it should
> stop node or not.
> On Sat, Sep 8, 2018 at 3:42 PM Andrey Kuznetsov  wrote:
> >
> > David, Yakov, I understand your fears. But liveness checks deal with
> > _critical_ conditions, i.e. when such a condition is met we conclude the
> > node as totally broken, and there is no sense to keep it alive regardless
> > the data it contains. If we want to give it a chance, then the condition
> > (long fsync etc.) should not considered as critical at all.
> >
> > сб, 8 сент. 2018 г. в 15:18, Yakov Zhdanov :
> >
> > > Agree with David. We need to have an opporunity set backups count
> threshold
> > > (at runtime also!) that will not allow any automatic stop if there
> will be
> > > a data loss. Andrey, what do you think?
> > >
> > > --Yakov
> > >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>


Re: Critical worker threads liveness checking drawbacks

2018-09-07 Thread David Harvey
There are at least two production cases that need to be distinguished:
The first is where a single node restart will repair the problem (and you
get the right node).
The other cases are those where stopping the node will invalidate its
backups, leaving only one copy of the data, and the problem is not
resolved.  Lots of opportunities to destroy all copies.  Automated
decisions should take into account whether the node in question is the last
source of truth.

Killing off a single bad actor using automation is safer than having humans
attempt it with the CEO screaming at them.
-DH


PS:  I'm just finalizing an extension which allows cache templates created
in Spring to force primaries and backups to different failure
domains (availability zones) with no need for custom Java code, and I have
been fretting over all the ways to lose data.

On Thu, Sep 6, 2018, 10:03 AM Andrey Kuznetsov  wrote:

> Igniters,
>
> Currently, we have a nearly completed implementation for system-critical
> threads liveness checking [1], in terms of IEP-14 [2] and IEP-5 [3]. In a
> nutshell, system-critical threads monitor each other and checks for two
> aspects:
> - whether a thread is alive;
> - whether a thread is active, i.e. it updates its heartbeat timestamp
> periodically.
> When either check fails, critical failure handler is called, this in fact
> means node stop.
>
> The implementation of activity checks has a flaw now: some blocking actions
> are parts of normal operation and should not lead to node stop, e.g.
> - WAL writer thread can call {{fsync()}};
> - any cache write that occurs in system striped executor can lead to
> {{fsync()}} call again.
> The former example can be fixed by disabling heartbeat checks temporarily
> for known long-running actions, but it won't work with for the latter one.
>
> I see a few options to address the issue:
> - Just log any long-running action instead of calling critical failure
> handler.
> - Introduce several severity levels for long-running actions handling. Each
> level will have its own failure handler. Depending on the level,
> long-running action can lead to node stop, error logging or no-op reaction.
>
> I encourage you to suggest other options. Any idea is appreciated.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6587
> [2]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
> [3]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74683878
>
> --
> Best regards,
>   Andrey Kuznetsov.
>


Minor version changes and server/client compatibility

2018-09-05 Thread David Harvey
We have needed to do a couple of simple bug fixes to Ignite proper, where
there is no change to interfaces or internode communications.   When we do
this, we end up with these choices:

   - Coordinate client and server code bases so that they are in lock
   step.   Tedious with multiple clusters and test/dev versions.
   - Force the prior version number on the new builds, making it more
   tedious to understand what versions we are running.

A standard practice would be to ignore the last field in the version when
doing a compatibility test; e.g., 2.5.0 and 2.5.foobar would be considered
compatible.    Is there some reason Ignite requires an exact match?
 How do other Ignite users handle this problem?
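
A sketch of the comparison being suggested (an illustration of the convention,
not how Ignite actually compares versions): treat two versions as compatible
when the major and minor fields match, ignoring the last field.

// Illustrative helper, not Ignite code: compares "2.5.0" and "2.5.foobar" as compatible.
public final class LooseVersionCheck {
    private LooseVersionCheck() {}

    public static boolean compatible(String a, String b) {
        return majorMinor(a).equals(majorMinor(b));
    }

    private static String majorMinor(String ver) {
        String[] parts = ver.split("\\.", 3);          // ["2", "5", "0" or "foobar"]
        if (parts.length < 2)
            throw new IllegalArgumentException("Unexpected version: " + ver);
        return parts[0] + "." + parts[1];              // keep "2.5", drop the last field
    }

    public static void main(String[] args) {
        System.out.println(compatible("2.5.0", "2.5.foobar")); // true
        System.out.println(compatible("2.5.0", "2.6.0"));      // false
    }
}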

Thanks,

-DH


Re: GridClosureProcessor.affinityRun() semantics

2018-08-31 Thread David Harvey
Val,

A fundamental problem with sending computes to data is that the computes
can be buggy.   We have a set of constantly changing read-only
analytics we want to run, and caches where readFromBackup is set.

If a bug is introduced where the closure goes into an infinite loop, or
consumes all the memory on all nodes it is sent to, we would like that to
be constrained so that if we lose all those nodes, we still have a full
copy of the data on the remaining nodes. We have implemented the simple
affinityBackupFilter which forces the primary and backups to different
sets of nodes (availability zones) on a partition basis, so that no
partition has all of its replicas in the same group of nodes.   If I
use IgniteCompute.broadcast(), then the execution will be localized to the
cluster group.

However, if I have use cases that want to send many closures to execute
near a local copy of the data, I'd like to constrain them in the same
way. I can use the Affinity interface to determine the node that
currently has the key, and send a closure there, but the semantics of
affinityRun() are what I really would like:    "The data of the partition
where affKey is stored will not be migrated from the target node while the
job is executed."
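
To make the use case concrete, a small sketch of the two options being
compared; the cache name, cluster-group attribute, and closures are just
examples. The Affinity-interface route gives no pinning guarantee, while
affinityRun() scoped to a cluster group carries the guarantee quoted above
but leaves the primary-outside-the-group case undefined:

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.ClusterGroup;
import org.apache.ignite.cluster.ClusterNode;

public class ScopedAffinityRunSketch {
    public static void run(Ignite ignite, Object key) {
        // Nodes tagged as safe to run possibly-buggy analytics (example attribute).
        ClusterGroup analyticsNodes = ignite.cluster().forAttribute("ROLE", "analytics");

        // Option 1: pick the node by hand via the Affinity interface.
        // No guarantee the partition stays put while the job runs.
        ClusterNode owner = ignite.affinity("myCache").mapKeyToNode(key);
        ignite.compute(ignite.cluster().forNode(owner))
              .run(() -> System.out.println("running near " + key));

        // Option 2: affinityRun scoped to the cluster group.
        // The partition is pinned for the duration of the job, but what happens
        // when the primary for `key` is NOT in analyticsNodes is exactly the
        // undefined case discussed here.
        ignite.compute(analyticsNodes)
              .affinityRun("myCache", key, () -> System.out.println("pinned run near " + key));
    }
}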

The semantics of this case should be clear, and they are not.   The code
should be changed to alter the behavior if the primary node is not in the
subgrid.  It should not depend on lower layers detecting and handling a
case that should not have been allowed through.  If the primary and backups
are not in the subgrid, then throw an exception. I would not consider
the case of code that depends on the current behavior important.

The interesting question is about the cost/benefit of defining this as "use a
backup in the subgrid if the primary is not in the subgrid". The
primary question on cost is: if we did choose a backup node because the
primary was not in the grid, would it just work, or would there be ripple
effects at lower layers?

My hope is that we can do a change of modest benefit with low cost, which
could end up being to change the documentation to say not to do this.

Thanks,
-DH





On Fri, Aug 31, 2018 at 2:15 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Dave,
>
> In case it's executed even if primary node is outside of the cluster group,
> then I think it's a bug - I would throw an exception in this case. However,
> is there any particular reason you're doing this? Is there a use case? I
> don't see much sense in combining affinityRun with a cluster group.
>
> Backup node is never used by affinityRun to my knowledge.
>
> -Val
>
> On Fri, Aug 31, 2018 at 7:07 AM David Harvey  wrote:
>
> > This function takes:
> >
> > int partId,
> >
> > ...
> >
> > @Nullable Collection nodes,
> >
> >
> > It uses partId to find the node with the primary partition, and proceeds
> > even if that node is not in the subgrid that was passed in.  This is
> either
> > a bug, or the semantics should be specified more clearly.
> >
> >
> > There are two sub-cases.
> >
> >
> >- one of nodes in the sub-grid is a backup for the partition
> >- the partition does not exist on any of the nodes in the sub-grid
> >
> > This case can be  exposed via IgnuteCompute.affinityRun... when the
> > IgniteCompute was created with a subgrid that did not include the primary
> > node.
> >
> > I got lost tracing the code below this, and could not tell if this would
> > throw an exception or execute on the primary node.   The later would seem
> > to just be a bug.  It would be simple to change this code to choose a
> node
> > in the subgrid or throw and exception.
> >
> > If it selected a backup node, then would the this part of the
> IgniteCompute
> > contract still hold w/o other changes: "The data of the partition where
> > affKey is stored will not be migrated from the target node while the job
> is
> > executed."  ?
> >
> > In any case, the IgniteCompute semantics around this case should be
> stated.
> >
> >
> > -DH
> >
>


GridClosureProcessor.affinityRun() semantics

2018-08-31 Thread David Harvey
This function takes:

int partId,

...

@Nullable Collection<ClusterNode> nodes,


It uses partId to find the node with the primary partition, and proceeds
even if that node is not in the subgrid that was passed in.  This is either
a bug, or the semantics should be specified more clearly.


There are two sub-cases.


   - one of nodes in the sub-grid is a backup for the partition
   - the partition does not exist on any of the nodes in the sub-grid

This case can be exposed via IgniteCompute.affinityRun... when the
IgniteCompute was created with a subgrid that did not include the primary
node.

I got lost tracing the code below this, and could not tell if this would
throw an exception or execute on the primary node.   The latter would seem
to just be a bug.  It would be simple to change this code to choose a node
in the subgrid or throw an exception.

If it selected a backup node, then would this part of the IgniteCompute
contract still hold w/o other changes: "The data of the partition where
affKey is stored will not be migrated from the target node while the job is
executed."  ?

In any case, the IgniteCompute semantics around this case should be stated.


-DH


Re: affinityBackupFilter for AWS Availability Zones

2018-08-23 Thread David Harvey
Added IGNITE-9365

On Thu, Aug 23, 2018 at 3:56 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi David,
>
> With the Docker image you can actually use additional libraries by
> providing URLs to JARs via EXTERNAL_LIBS property. Please refer to this
> page: https://apacheignite.readme.io/docs/docker-deployment
>
> But anyway, I believe that such contribution might be very valuable for
> Ignite. Feel free to create a ticket.
>
> -Val
>
> On Thu, Aug 23, 2018 at 11:58 AM David Harvey 
> wrote:
>
> > I need an affinityBackupFilter that will prevent backups from running in
> > the same AWS availability zone.  (A single availability zone has the
> > characteristic that some or all of the EC2 instances in that zone can
> fail
> > together due to a single fault.   You have no control over the hosts on
> > which the EC2 instance VMs run on in AWS, except by controlling the
> > availability zone) .
> >
> > I could write a few lines of custom code, but then I have to get it
> > deployed on all nodes in the cluster, and peer class loading will not
> > work.   So I cannot use an of the shelf docker image, for example.   So
> > that code should just be part of Ignite.
> >
> > I was thinking of adding new class along these lines, where the apply
> > function will return true only if none of the node's attributes match
> those
> > of any of the nodes in the list.   This would become part of the code
> base,
> > but would only be used if configured as the backupAffinityFunction
> >
> > ClusterNodeNoAttributesMatchBiPredicate implements
> > IgniteBiPredicate >
> > List> {
> >
> >
> > ClusterNodeNoAttributesMatchBiPredicate(String[] attributeNames)
> > {}
> >
> > For AvailabilityZones, there would be only one attribute examined, but we
> > have some potential use cases for distributing backups across two
> > sub-groups of an AZ.
> >
> > Alternately, we could enhance the RendezvousAffinityFunction to allow one
> > or more arbitrary attributes to be compared  to determine neighbors,
> > rather  than only org.apache.ignite.macs, and to add a setting that
> > controls whether backups should be placed on neighbors if they can't be
> > placed anywhere else.
> >
> > If I have 2 backups and three availability zones (AZ), I want one copy of
> > the data in each AZ.  If all nodes in one AZ fail, I want to be able to
> > decide to try to get to three copies anyway, increasing the per node
> > footprint by 50%, or to only run with one backup. This would also
> give
> > be a convoluted way to change  the number of backups of a cache
> > dynamically:Start the cache with a large number of backups, but don't
> > provide a location where the backup would be allowed to run initially.
> >
>


[jira] [Created] (IGNITE-9365) Force backups to different AWS availability zones using only Spring XML

2018-08-23 Thread David Harvey (JIRA)
David Harvey created IGNITE-9365:


 Summary: Force backups to different AWS availability zones using 
only Spring XML
 Key: IGNITE-9365
 URL: https://issues.apache.org/jira/browse/IGNITE-9365
 Project: Ignite
  Issue Type: Improvement
  Components: cache
 Environment:  
Reporter: David Harvey
Assignee: David Harvey
 Fix For: 2.7


As a developer, I want to be able to force cache backups each to a different 
"Availability Zone" when I'm running out-of-the-box Ignite, without additional 
jars installed.  "Availability zone" is an AWS feature; other cloud providers 
offer the same function under different names.  A single availability zone has 
the characteristic that some or all of the EC2 instances in that zone can fail 
together due to a single fault.   In AWS you have no control over the hosts on 
which the EC2 instance VMs run, except by controlling the availability zone.
 
I could write a few lines of a custom affinityBackupFilter and configure it on a 
RendezvousAffinityFunction, but then I have to get it deployed on all nodes in 
the cluster, and peer class loading will not work for this.   The code to do 
this should just be part of Ignite.
 





affinityBackupFilter for AWS Availability Zones

2018-08-23 Thread David Harvey
I need an affinityBackupFilter that will prevent backups from running in
the same AWS availability zone.  (A single availability zone has the
characteristic that some or all of the EC2 instances in that zone can fail
together due to a single fault.   You have no control over the hosts on
which the EC2 instance VMs run on in AWS, except by controlling the
availability zone) .

I could write a few lines of custom code, but then I have to get it
deployed on all nodes in the cluster, and peer class loading will not
work.   So I cannot use an off-the-shelf Docker image, for example.   So
that code should just be part of Ignite.

I was thinking of adding a new class along these lines, where the apply
function will return true only if none of the node's attributes match those
of any of the nodes in the list.   This would become part of the code base,
but would only be used if configured as the affinityBackupFilter:

ClusterNodeNoAttributesMatchBiPredicate implements
IgniteBiPredicate<ClusterNode, List<ClusterNode>> {

    ClusterNodeNoAttributesMatchBiPredicate(String[] attributeNames)
    {}
}

For AvailabilityZones, there would be only one attribute examined, but we
have some potential use cases for distributing backups across two
sub-groups of an AZ.

Alternately, we could enhance the RendezvousAffinityFunction to allow one
or more arbitrary attributes to be compared  to determine neighbors,
rather  than only org.apache.ignite.macs, and to add a setting that
controls whether backups should be placed on neighbors if they can't be
placed anywhere else.

If I have 2 backups and three availability zones (AZs), I want one copy of
the data in each AZ.  If all nodes in one AZ fail, I want to be able to
decide to try to get to three copies anyway, increasing the per-node
footprint by 50%, or to only run with one backup. This would also give
us a convoluted way to change the number of backups of a cache
dynamically: start the cache with a large number of backups, but don't
provide a location where the backup would be allowed to run initially.


New Contributor - IGNITE-7616

2018-08-08 Thread David Harvey
I've been working with Ignite for almost a year, but haven't contributed
anything back yet.
IGNITE-7616 is annoying me, so I might as well just fix it.  My Jira ID is
syssoftsol.

Thanks,
-DH


[jira] [Created] (IGNITE-9026) Two levels of Peer class loading fails in CONTINUOUS mode

2018-07-17 Thread David Harvey (JIRA)
David Harvey created IGNITE-9026:


 Summary: Two levels of Peer class loading fails in CONTINUOUS mode
 Key: IGNITE-9026
 URL: https://issues.apache.org/jira/browse/IGNITE-9026
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: David Harvey


We had a seemingly functional system in SHARED mode, where we have a custom 
StreamReceiver that sometimes sends closures from the peer-class-loaded code to 
other servers.  However, we ended up running out of Metaspace, because we had > 
6000 class loaders!  We suspected a regression in this change 
[https://github.com/apache/ignite/commit/d2050237ee2b760d1c9cbc906b281790fd0976b4#diff-3fae20691c16a617d0c6158b0f61df3c],
 so we switched to CONTINUOUS mode.    We then started getting failures to load 
some of the classes for the closures on the second server.   Through some 
testing and code inspection, there seem to be the following flaws between 
GridDeploymentCommunication.sendResourceRequest and its two callers.

The callers iterate through all the participant nodes until they find an online 
node that responds to the request (a timeout is treated as an offline node), with 
either success or failure, and then the loop terminates.  The assumption is 
that all nodes are equally capable of providing the resource, so if one fails, 
then the others would also fail.

The first flaw is that GridDeploymentCommunication.sendResourceRequest() has a 
check for a cycle, i.e., whether the destination node is one of the nodes that 
originated or forwarded this request, and in that case,  a failure response is 
faked.   However, that causes the caller's loop to terminate.  So depending on 
the order of the nodes in the participant list,  sendResourceRequest() may fail 
before trying any nodes because it has one of the calling nodes on this list.   
   It should instead be skipping any of the calling nodes.

Example with 1 client node and 2 server nodes:  C1 sends data to S1, which 
forwards a closure to S2.   C1 also sends to S2, which forwards to S1.  So now the 
node lists on S1 and S2 contain C1 and the other S node.   If the order of the 
node lists on S1 is (S2,C1) and on S2 (S1,C1), then when S1 tries to load a 
class, it will try S2, then S2 will try S1 but will get a fake failure 
generated, causing S2 not to try more nodes (i.e., C1), and causing S1 also not 
to try more nodes.
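
A self-contained sketch of the suggested fix (this is not the actual
GridDeploymentCommunication code; the types and shapes are simplified for
illustration): nodes already on the request path are skipped rather than
answered with a faked failure, so the caller's loop keeps going to the
remaining participants, such as C1 in the example above.

import java.util.Collection;
import java.util.List;
import java.util.UUID;

// Illustration of the suggested behavior only; real Ignite internals differ.
public class ResourceRequestSketch {
    interface Node { UUID id(); }
    interface Response { boolean success(); byte[] resource(); }
    interface Transport {
        /** Returns null on timeout (node treated as offline). */
        Response sendResourceRequest(Node dst, String rsrcName, Collection<UUID> requestPath);
    }

    static byte[] requestResource(Transport io, List<Node> participants,
                                  Collection<UUID> requestPath, String rsrcName) {
        for (Node node : participants) {
            // Skip nodes that originated or forwarded this request instead of
            // faking a failure response, which today ends the caller's loop
            // before untried nodes (C1 in the example above) are reached.
            if (requestPath.contains(node.id()))
                continue;

            Response res = io.sendResourceRequest(node, rsrcName, requestPath);

            if (res == null)      // timeout: try the next participant
                continue;

            return res.success() ? res.resource() : null;  // definitive answer ends the loop
        }
        return null;
    }
}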

The other flaw is the assumption that all participants have equal access to the 
resource.   Assume S1 knows about userVersion1 via S3 and S4, with S3 through C1 
and S4 through C2.   If C2 fails, then S4 is not capable of getting back to a 
master, but S1 has no way of knowing that.





[jira] [Created] (IGNITE-7905) Setting userVersion in ignite.xml causes ignite.active(true) to fail

2018-03-08 Thread David Harvey (JIRA)
David Harvey created IGNITE-7905:


 Summary: Setting userVersion in ignite.xml causes 
ignite.active(true) to fail
 Key: IGNITE-7905
 URL: https://issues.apache.org/jira/browse/IGNITE-7905
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
 Environment: [^ignite.xml]
Reporter: David Harvey
 Attachments: ignite.xml

Regarding priority, I can't tell if this is an issue solely with 
ignite.active(boolean) when the userVersion is changed, or something much more 
widespread.   If the former, then this is only a Minor bug.

The userVersion should invalidate peer-class-loaded classes loaded from the 
client to servers, but changing the userVersion causes parts of Ignite not to 
work at all, at least ignite.active(true).

I'm using the vanilla Docker image as the server, and running the 
StreamVisitorExample from Eclipse.

I create examples/targets/classes/META-INF/ignite.xml, with a userVersion of 3 
(attached).   I can still run StreamVisitorExample.

Then I added ignite.active(true) to StreamVisitorExample, and it fails:

            if (!ExamplesUtils.hasServerNodes(ignite))

                return;

            ignite.active(true);

 

Caused by: class org.apache.ignite.IgniteDeploymentException: Task was not 
deployed or was redeployed since task execution 
[taskName=org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 
taskClsName=org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 codeVer=3, clsLdrId=96512270261-22a69ca3-ead9-4db6-98fe-af95310e7d93, 
seqNum=1520538097001, depMode=SHARED, dep=null]

Then I change userVersion in ignite.xml to "0" and it succeeds.

The error on the server side is:

[19:50:33,183][WARNING][pub-#11635][GridDeploymentManager] Failed to deploy 
class in SHARED or CONTINUOUS mode for given user version (class is locally 
deployed for a different user version) 
[cls=o.a.i.i.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 localVer=0, otherVer=3]

[19:50:33,183][SEVERE][pub-#11635][GridJobProcessor] Task was not deployed or 
was redeployed since task execution 
[taskName=o.a.i.i.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 
taskClsName=o.a.i.i.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 codeVer=3, clsLdrId=25a0a270261-003c826e-2094-4fa9-9862-988583e6a18e, 
seqNum=1520538618450, depMode=SHARED, dep=null]

class org.apache.ignite.IgniteDeploymentException: Task was not deployed or was 
redeployed since task execution 
[taskName=org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 
taskClsName=org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest,
 codeVer=3, clsLdrId=25a0a270261-003c826e-2094-4fa9-9862-988583e6a18e, 
seqNum=1520538618450, depMode=SHARED, dep=null]

 at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1160)

 at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1913)

 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)

 at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)

 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)

 at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)

 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 at java.lang.Thread.run(Thread.java:748)

 

 





[jira] [Created] (IGNITE-6344) AWS AMI startup.sh gets spurious error on export statement

2017-09-11 Thread David Harvey (JIRA)
David Harvey created IGNITE-6344:


 Summary: AWS AMI startup.sh gets spurious error on export statement
 Key: IGNITE-6344
 URL: https://issues.apache.org/jira/browse/IGNITE-6344
 Project: Ignite
  Issue Type: Bug
  Components: aws
Reporter: David Harvey
Priority: Minor


The export statement in this script should look like
export "$p"
rather than
export $p
or be removed altogether.

If something with blanks is specified, like
JVM_OPTS=-Xg1 -Xg2
the script reports errors because it splits the line on blanks.  However, the 
result of the export statement is not used, so the basic AMI works.   But if 
you are debugging another issue, you get very confused by this.


