Re: rejecting communication connection & Failed to process selector key

2024-09-18 Thread Jeremy McMillan
I suspect your OpenShift networking is doing something wrong: NAT is
particularly suspicious.

Share your discovery configuration and OpenShift network layout.
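For context, a typical Kubernetes/OpenShift discovery setup looks roughly like
the minimal sketch below (the namespace and service name are placeholders, and
it needs the ignite-kubernetes module on the classpath); please include
whatever your equivalent is, along with the communication SPI settings.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.kubernetes.configuration.KubernetesConnectionConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

// Nodes discover each other through the Kubernetes API by resolving a headless service.
KubernetesConnectionConfiguration kcfg = new KubernetesConnectionConfiguration();
kcfg.setNamespace("ignite");            // placeholder namespace
kcfg.setServiceName("ignite-service");  // placeholder headless service

TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
discoSpi.setIpFinder(new TcpDiscoveryKubernetesIpFinder(kcfg));

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDiscoverySpi(discoSpi);

Ignite ignite = Ignition.start(cfg);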

On Mon, Sep 16, 2024 at 4:38 AM MJ <6733...@qq.com> wrote:

> Do not think so. As shown below, the remote IP 10.254.13.83
> is another server node.
> --- log
> Accepted incoming communication connection [locAddr=/10.254.32.162:47100,
> rmtAddr=/10.254.13.83:35160]
> super=GridNioSessionImpl [locAddr=/10.254.32.162:52542, rmtAddr=/
> 10.254.13.83:47100
> ---
>
> So the multiple connections that kept being rejected were between two server
> nodes. What scenarios could cause that? It appears that the original
> connection was shut down or interrupted quickly by one node, but the other
> node was not aware of (or was not informed about) the connection close event.
> Is there any configuration that can help with that?
>
>
> Thanks,
> -MJ
>
> Original Email
>
> From:"Pavel Tupitsyn"< ptupit...@apache.org >;
>
> Sent Time:2024/9/16 12:58
>
> To:"user"< user@ignite.apache.org >;
>
> Subject:Re: rejecting communication connection & Failed to process
> selector key
>
> Looks like some non-Ignite application connects to the Ignite server, then
> sends unexpected data or disconnects quickly.
>
> Could it be some kind of a security tool, port scanner, or a misconfigured
> service somewhere on the network?
>
> On Mon, Sep 16, 2024 at 3:59 AM MJ <6733...@qq.com> wrote:
>
>> Hi Igniters,
>>
>>
>>
>> I am experiencing the “Failed to process selector key” error once every
>> one or two days. Each time, the node appears to receive and reject multiple
>> communication connections and then throws the exception.
>>
>> The logging below shows the original “Broken pipe” exception, but it is not
>> limited to “Broken pipe”; occasionally the “Failed to process selector key”
>> error wraps “Connection reset” or “javax.net.ssl.SSLException: Failed to
>> encrypt data (SSL engine error) [status=CLOSED,
>> handshakeStatus=NOT_HANDSHAKING]”.
>>
>>
>>
>> Is there any solution to fix it, or can the configuration be improved?
>>
>>
>>
>> Ignite 2.16.0 / 4 data nodes, running in OpenShift 4
>>
>>
>>
>>  config of communicationSpi
>>
>> <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>     <property name="..." value="1024"/>
>>     <property name="..." value="25000"/>
>>     <property name="..." value="6"/>
>> </bean>
>>
>>
>>
>>
>>
>> 24-09-15 17:18:35.146 [INFO ]
>> grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication
>> connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35160]
>>
>> 24-09-15 17:18:35.147 [INFO ]
>> grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when
>> already connected to this node, rejecting
>> [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40,
>> rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713]
>>
>> 24-09-15 17:18:35.357 [INFO ]
>> grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication
>> connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35162]
>>
>> 24-09-15 17:18:35.358 [INFO ]
>> grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when
>> already connected to this node, rejecting
>> [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40,
>> rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713]
>>
>> 24-09-15 17:18:35.568 [INFO ]
>> grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Accepted incoming communication
>> connection [locAddr=/10.254.32.162:47100, rmtAddr=/10.254.13.83:35164]
>>
>> 24-09-15 17:18:35.569 [INFO ]
>> grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:117 - Received incoming connection when
>> already connected to this node, rejecting
>> [locNode=52437bc3-3dfe-4f76-bec6-d2f22f8a5d40,
>> rmtNode=7c28b6bc-8991-47a2-b69c-6adba0482713]
>>
>> 24-09-15 17:18:35.975 [ERROR]
>> grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%
>> o.a.i.s.c.t.TcpCommunicationSpi:137 - Failed to process selector key
>> [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
>> [super=AbstractNioClientWorker [idx=1, bytesRcvd=29406013584, bytesSent=0,
>> bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker
>> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi,
>> finished=false, heartbeatTs=1726435114873, hashCode=1144648384,
>> interrupted=false,
>> runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]],
>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>> inRecovery=GridNioRecoveryDescriptor [acked=20129536, resendCnt=0,
>> rcvCnt=19533551, sentCnt=20129879, reserv

Re: Failed to execute query because cache partition has been lost

2024-09-18 Thread Jeremy McMillan
If you want to do maintenance, and you want to block access during
maintenance, deactivate the cluster, then do the maintenance, then activate
the cluster again.

I recommend that you tell the community what you are trying to do, and then
ask with an open mind how the community would accomplish that goal.

Think carefully, like a database: if you had been tracking changes to data
on a remote node, and that node disappeared and reappeared, is it safe to
assume nothing bad has happened to that node or its data while it was not
available? Is it fair to others for you to assert that data is trustworthy?
Lost partitions must be recovered in the default case. If you want unsafe
behavior, configure the cluster to ignore lost partitions.
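For illustration only, a minimal sketch of both options (cache name and
key/value types are made up):

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

// Safe behavior: operations on lost partitions fail until the loss is acknowledged.
CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("myCache");
cfg.setBackups(1);
cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

// Unsafe opt-in: reads and writes proceed as if nothing had been lost.
// cfg.setPartitionLossPolicy(PartitionLossPolicy.IGNORE);

// Once the missing node (or its replacement) is back in the baseline,
// acknowledge the loss explicitly:
Ignite ignite = Ignition.ignite(); // assumes a node already started in this JVM
ignite.resetLostPartitions(Collections.singleton("myCache"));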

On Wed, Sep 18, 2024 at 4:56 AM  wrote:

> Ok, thanks, I understand.
>
> But in this case, if someone tries to modify the DB while a node is down,
> does Ignite offer any mechanism to prevent this, or should I implement it?
>
>
>
>
>
> *From:* Pavel Tupitsyn 
> *Sent:* miércoles, 18 de septiembre de 2024 11:30
> *To:* user@ignite.apache.org
> *Subject:* Re: Failed to execute query because cache partition has been
> lost
>
>
>
> > 2 servers and 1 client, and no backups
>
> > shut down one node
>
>
>
> There are no backups => any node shutdown leads to partition loss.
>
> If you want to ignore data loss, set partitionLossPolicy = IGNORE [1]
>
>
>
> [1]
> https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy
> 
>
>
>
> On Wed, Sep 18, 2024 at 12:04 PM  wrote:
>
> Hi.
>
>
>
> We are using Apache Ignite in our application, and currently, we are
> testing the behaviour of the system when there are system errors.
>
>
>
> One of our tests is not working as expected:
>
>- we have got an Ignite cluster with 2 servers and 1 client, and no
>backups
>- Ignite version 2.16
>- We shut down one node server for several minutes
>- During this time there is no read nor write to Ignite (we do not use
>the DB)
>- When we restart the server node, we expect to recover the system
>smoothly BUT we have exceptions when we query the data: “Failed to execute
>query because cache partition has been lost”
>
>
>
> We can resolve the problem by resetting the lost partitions, but is this
> normal behaviour for Ignite? I mean, it is a simple case, and the node
> should be able to rejoin the cluster without problems.
>
>
>
> Thank you.
>
>
>
>


Re: Rolling Update

2024-09-09 Thread Jeremy McMillan
An operator, as I understand it, is just a pod that interacts with your
application and the Kubernetes API server as necessary to do what you might
otherwise be doing manually.

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
https://kubernetes.io/docs/reference/using-api/client-libraries/

You might start by creating an admin-pod with Ignite control.sh,
sqlline.sh, thin client, etc. tools PLUS kubectl or some other Kubernetes
API client that you can exec into and manually perform all of the rolling
update steps. Once you know you have all the tools and steps complete, you
can try adding scripts to the pod to automate sequences of steps. Then once
the scripts are fairly robust and complete, you can use the admin-pod as a
basis for Kubernetes Job definitions. It's up to you whether you'd like to
continue integrating with Kubernetes further. Next steps would be to create
a CustomResourceDefinition instead of using Kubernetes Job, or
writing/adding a Kubernetes compatible API that does what your Job command
line startup does, but with more control over parameters.

Please share your results once you've got things working. Best of luck!

On Fri, Sep 6, 2024 at 10:15 AM Humphrey  wrote:

> Thanks for the explanation. Is there any operator ready for use? Is it
> hard to create your own operator if one doesn’t exist yet?
>
> Thanks
>
> On 5 Sep 2024, at 19:39, Jeremy McMillan  wrote:
>
> 
> It is correct for an operator, but not correct for readiness probe. It's
> not your understanding of Ignite metrics. It is your understanding of
> Kubernetes.
> Kubernetes rolling update logic assumes all of your service backend nodes
> are completely independent, but you have chosen a readiness probe which
> reflects how nodes are interacting and interdependent.
>
> Hypothetically:
>   We have bounced one node, and it has rejoined the cluster, and is
> rebalancing.
>   If Kubernetes probes this node for readiness, we fail because we are
> rebalancing. The scheduler will block progress of the rolling update.
>   If Kubernetes probes any other node for readiness, it will fail because
> we are rebalancing. The scheduler will remove this node from any services.
>   All the nodes will reflect the state of the cluster: rebalancing.
>   No nodes will remain in the service backend. If you are using the
> Kubernetes discovery SPI, the restarted node will find itself unable to
> discover any peers.
>
> The problem is that Kubernetes interprets the readiness probe as a NODE
> STATE. The cluster.rebalanced metric is a CLUSTER STATE.
>
> If you had a Kubernetes job that executes Kubectl commands from within the
> cluster, looping over the pods in a StatefulSet and restarting them, it
> would make perfect sense to check cluster.rebalanced and block until
> rebalancing finishes, but Kubernetes does something different with
> readiness probes based on some assumptions about clustering which do not
> apply to Ignite.
>
> On Thu, Sep 5, 2024 at 11:29 AM Humphrey Lopez  wrote:
>
>> Yes, I’m trying to read the cluster.rebalanced metric from the JMX MBean;
>> is that the correct one? I’ve built that into the readiness endpoint from
>> Actuator and let Kubernetes wait for the cluster to be ready before moving
>> to the next pod.
>>
>> Humphrey
>>
>> On 5 Sep 2024, at 17:34, Jeremy McMillan  wrote:
>>
>> 
>> I assume you have created your caches/tables with backups>=1.
>>
>> You should restart one node at a time, and wait until the restarted node
>> has rejoined the cluster, then wait for rebalancing to begin, then wait for
>> rebalancing to finish before restarting the next node. Kubernetes readiness
>> probes aren't sophisticated enough. "Node ready" state isn't the same thing
>> as "Cluster ready" state, but Kubernetes scheduler can't distinguish. This
>> should be handled by an operator, either human, or a Kubernetes automated
>> one.
>>
>> On Tue, Sep 3, 2024 at 1:13 PM Humphrey  wrote:
>>
>>> Thanks, I meant Rolling Update of the same version of Ignite (2.16). Not
>>> upgrade to a new version. We have our ignite embedded in Spring Boot
>>> application, and when changing code we need to deploy new version of the
>>> jar.
>>>
>>> Humphrey
>>>
>>> On 3 Sep 2024, at 19:24, Gianluca Bonetti 
>>> wrote:
>>>
>>> 
>>> Hello
>>>
>>> If you want to upgrade Apache Ignite version, this is not supported by
>>> Apache Ignite
>>>
>>> "Ignite cluster cannot have nodes that run on different Ignite versions.
>>> You need to stop the cluster and start it again on the new Ignite version."

Re: Rolling Update

2024-09-05 Thread Jeremy McMillan
It is correct for an operator, but not correct for readiness probe. It's
not your understanding of Ignite metrics. It is your understanding of
Kubernetes.
Kubernetes rolling update logic assumes all of your service backend nodes
are completely independent, but you have chosen a readiness probe which
reflects how nodes are interacting and interdependent.

Hypothetically:
  We have bounced one node, and it has rejoined the cluster, and is
rebalancing.
  If Kubernetes probes this node for readiness, we fail because we are
rebalancing. The scheduler will block progress of the rolling update.
  If Kubernetes probes any other node for readiness, it will fail because
we are rebalancing. The scheduler will remove this node from any services.
  All the nodes will reflect the state of the cluster: rebalancing.
  No nodes will remain in the service backend. If you are using the
Kubernetes discovery SPI, the restarted node will find itself unable to
discover any peers.

The problem is that Kubernetes interprets the readiness probe as a NODE
STATE. The cluster.rebalanced metric is a CLUSTER STATE.

If you had a Kubernetes job that executes Kubectl commands from within the
cluster, looping over the pods in a StatefulSet and restarting them, it
would make perfect sense to check cluster.rebalanced and block until
rebalancing finishes, but Kubernetes does something different with
readiness probes based on some assumptions about clustering which do not
apply to Ignite.
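To make that concrete, the check such a job would run could be as small as the
sketch below. The service host name and the exact metric name
"cluster.Rebalanced" are assumptions on my part, so verify them against
SELECT * FROM SYS.METRICS on your cluster first.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class WaitForRebalance {
    public static void main(String[] args) throws Exception {
        // JDBC thin connection to any node; the Kubernetes service name is a placeholder.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:ignite:thin://ignite-service.ignite.svc.cluster.local:10800")) {
            while (true) {
                try (ResultSet rs = conn.createStatement().executeQuery(
                        "SELECT * FROM SYS.METRICS WHERE NAME = 'cluster.Rebalanced'")) {
                    if (rs.next() && Boolean.parseBoolean(rs.getString("VALUE")))
                        break; // cluster-wide rebalance finished: safe to restart the next pod
                }
                Thread.sleep(5_000L);
            }
        }
    }
}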

On Thu, Sep 5, 2024 at 11:29 AM Humphrey Lopez  wrote:

> Yes, I’m trying to read the cluster.rebalanced metric from the JMX MBean;
> is that the correct one? I’ve built that into the readiness endpoint from
> Actuator and let Kubernetes wait for the cluster to be ready before moving
> to the next pod.
>
> Humphrey
>
> On 5 Sep 2024, at 17:34, Jeremy McMillan  wrote:
>
> 
> I assume you have created your caches/tables with backups>=1.
>
> You should restart one node at a time, and wait until the restarted node
> has rejoined the cluster, then wait for rebalancing to begin, then wait for
> rebalancing to finish before restarting the next node. Kubernetes readiness
> probes aren't sophisticated enough. "Node ready" state isn't the same thing
> as "Cluster ready" state, but Kubernetes scheduler can't distinguish. This
> should be handled by an operator, either human, or a Kubernetes automated
> one.
>
> On Tue, Sep 3, 2024 at 1:13 PM Humphrey  wrote:
>
>> Thanks, I meant Rolling Update of the same version of Ignite (2.16). Not
>> upgrade to a new version. We have our ignite embedded in Spring Boot
>> application, and when changing code we need to deploy new version of the
>> jar.
>>
>> Humphrey
>>
>> On 3 Sep 2024, at 19:24, Gianluca Bonetti 
>> wrote:
>>
>> 
>> Hello
>>
>> If you want to upgrade Apache Ignite version, this is not supported by
>> Apache Ignite
>>
>> "Ignite cluster cannot have nodes that run on different Ignite versions.
>> You need to stop the cluster and start it again on the new Ignite version."
>> https://ignite.apache.org/docs/latest/installation/upgrades
>>
>> If you need rolling upgrades you can upgrade to GridGain, which brings
>> rolling upgrades together with many other interesting features
>> "Rolling Upgrades is a feature of GridGain Enterprise and Ultimate
>> Edition that allows nodes with different GridGain versions to coexist in a
>> cluster while you roll out a new version. This prevents downtime when
>> performing software upgrades."
>> https://www.gridgain.com/docs/latest/installation-guide/rolling-upgrades
>>
>> Cheers
>> Gianluca Bonetti
>>
>> On Tue, 3 Sept 2024 at 18:15, Humphrey Lopez  wrote:
>>
>>> Hello, we have several pods with Ignite caches running in Kubernetes. We
>>> only use memory mode (not persistence) and want to perform a rolling update
>>> without losing data. What metric should we monitor to know when it’s
>>> safe to replace the next pod?
>>>
>>> We have tried the Cluster.Rebalanced (1) metric from JMX in a readiness
>>> probe but we still end up losing data from the caches.
>>>
>>> 1)
>>> https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#cluster
>>>
>>> Should we use another mechanism or metric for determining the readiness
>>> of the new started pod?
>>>
>>>
>>> Humphrey
>>>
>>


Re: Rolling Update

2024-09-05 Thread Jeremy McMillan
I assume you have created your caches/tables with backups>=1.

You should restart one node at a time, and wait until the restarted node
has rejoined the cluster, then wait for rebalancing to begin, then wait for
rebalancing to finish before restarting the next node. Kubernetes readiness
probes aren't sophisticated enough. "Node ready" state isn't the same thing
as "Cluster ready" state, but Kubernetes scheduler can't distinguish. This
should be handled by an operator, either human, or a Kubernetes automated
one.

On Tue, Sep 3, 2024 at 1:13 PM Humphrey  wrote:

> Thanks, I meant Rolling Update of the same version of Ignite (2.16). Not
> upgrade to a new version. We have our ignite embedded in Spring Boot
> application, and when changing code we need to deploy new version of the
> jar.
>
> Humphrey
>
> On 3 Sep 2024, at 19:24, Gianluca Bonetti 
> wrote:
>
> 
> Hello
>
> If you want to upgrade Apache Ignite version, this is not supported by
> Apache Ignite
>
> "Ignite cluster cannot have nodes that run on different Ignite versions.
> You need to stop the cluster and start it again on the new Ignite version."
> https://ignite.apache.org/docs/latest/installation/upgrades
>
> If you need rolling upgrades you can upgrade to GridGain, which brings
> rolling upgrades together with many other interesting features
> "Rolling Upgrades is a feature of GridGain Enterprise and Ultimate Edition
> that allows nodes with different GridGain versions to coexist in a cluster
> while you roll out a new version. This prevents downtime when performing
> software upgrades."
> https://www.gridgain.com/docs/latest/installation-guide/rolling-upgrades
>
> Cheers
> Gianluca Bonetti
>
> On Tue, 3 Sept 2024 at 18:15, Humphrey Lopez  wrote:
>
>> Hello, we have several pods with Ignite caches running in Kubernetes. We
>> only use memory mode (not persistence) and want to perform a rolling update
>> without losing data. What metric should we monitor to know when it’s
>> safe to replace the next pod?
>>
>> We have tried the Cluster.Rebalanced (1) metric from JMX in a readiness
>> probe but we still end up losing data from the caches.
>>
>> 1)
>> https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#cluster
>>
>> Should we use another mechanism or metric for determining the readiness
>> of the new started pod?
>>
>>
>> Humphrey
>>
>


Re: Does the JDBC thin driver support partition aware execution of INSERT statements?

2024-08-29 Thread Jeremy McMillan
Thanks Pavel!

That is really cool, but looks like it only works in very carefully managed
situations.

Do you happen to know off the top of your head

A) the JDBC thin driver client needs to have a complete list of node
addresses.
  Q: will it collect IPs from DNS hostnames that return multiple A or AAAA
records for each node?

B) Q: does it only work for single row INSERT statements, or will it also
work for batched INSERT containing many rows worth of VALUES?
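For anyone following along, a minimal sketch of what this looks like from the
JDBC side (the addresses and table are invented; the flag name comes from the
partition-awareness docs Pavel linked):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Partition awareness wants the server addresses listed up front in the URL.
try (Connection conn = DriverManager.getConnection(
        "jdbc:ignite:thin://10.0.0.1:10800,10.0.0.2:10800,10.0.0.3:10800;partitionAwareness=true");
     PreparedStatement ps = conn.prepareStatement("INSERT INTO city (id, name) VALUES (?, ?)")) {
    ps.setInt(1, 1);
    ps.setString(2, "Forest Hill");
    ps.executeUpdate(); // once the driver has learned the partition distribution,
                        // single-key statements like this can be routed to the owning node
}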

On Thu, Aug 29, 2024 at 12:27 AM Pavel Tupitsyn 
wrote:

> JDBC driver does support partition awareness [1]
> And it works for INSERT statements too, as I understand [2]
>
> > When a query is executed for the first time, the driver receives the
> partition distribution for the table
> > that is being queried and saves it for future use locally.
> > When you query this table next time, the driver uses the partition
> distribution
> > to determine where the data being queried is located to send the query
> to the right nodes.
>
> [1]
> https://ignite.apache.org/docs/latest/SQL/JDBC/jdbc-driver#partition-awareness
> [2]
> https://ignite.apache.org/docs/latest/SQL/JDBC/jdbc-driver#partitionAwarenessSQLCacheSize
>
> On Wed, Aug 28, 2024 at 11:24 PM Jeremy McMillan  wrote:
>
>> Probably not in the way you might expect from the question. From the
>> documentation:
>> "The driver connects to one of the cluster nodes and forwards all the
>> queries to it for final execution. The node handles the query distribution
>> and the result’s aggregations. Then the result is sent back to the client
>> application."
>>
>> The JDBC client has a persistent connection to one cluster node, to which
>> all queries are sent. The JDBC client does not connect to multiple nodes to
>> handle multiple INSERTs.
>>
>> On Wed, Aug 28, 2024 at 3:45 AM 38797715 <38797...@qq.com> wrote:
>>
>>> Does the JDBC thin driver support partition aware execution of INSERT
>>> statements?
>>>
>>


Re: Does the JDBC thin driver support partition aware execution of INSERT statements?

2024-08-28 Thread Jeremy McMillan
Probably not in the way you might expect from the question. From the
documentation:
"The driver connects to one of the cluster nodes and forwards all the
queries to it for final execution. The node handles the query distribution
and the result’s aggregations. Then the result is sent back to the client
application."

The JDBC client has a persistent connection to one cluster node, to which
all queries are sent. The JDBC client does not connect to multiple nodes to
handle multiple INSERTs.

On Wed, Aug 28, 2024 at 3:45 AM 38797715 <38797...@qq.com> wrote:

> Does the JDBC thin driver support partition aware execution of INSERT
> statements?
>


Re: Ignite Thick Client running Node Filters???

2024-08-23 Thread Jeremy McMillan
I think you might be looking for events.

https://ignite.apache.org/docs/latest/events/listening-to-events#enabling-events
https://ignite.apache.org/docs/latest/events/events#cluster-state-changed-events
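For instance, a sketch along these lines (assuming EVT_CLUSTER_STATE_CHANGED
has been added to setIncludeEventTypes() on the nodes, and that "ignite" is
your local node handle):

import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.events.ClusterStateChangeEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

// Local listener fired whenever the cluster state changes.
IgnitePredicate<Event> lsnr = evt -> {
    ClusterStateChangeEvent e = (ClusterStateChangeEvent)evt;
    if (e.state() == ClusterState.ACTIVE) {
        // The cluster has just been activated: open ports, start services, etc.
    }
    return true; // keep the listener registered
};

ignite.events().localListen(lsnr, EventType.EVT_CLUSTER_STATE_CHANGED);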

On Fri, Aug 23, 2024 at 11:59 AM Gregory Sylvain 
wrote:

> Hi,
>
> Thanks for the reply.
>
> I was looking down this road, however, everything is automated and the
> cluster is activated by a script when all ServerNodes are in the baseline.
>
> Is there a hook that can be called when the cluster is activated to do
> this work?
>
> Thanks.
> Greg
>
>
> On Fri, Aug 23, 2024 at 12:29 PM Jeremy McMillan  wrote:
>
>> The example in the documentation explaining nodeFilter uses node
>> attributes as a condition, but the logic might include dynamic node state
>> like performance metrics to decide whether to run a service or not.
>>
>> It seems like the behavior you want/expect might be implemented better
>> using clusterGroup
>> https://ignite.apache.org/docs/2.15.0/services/services#clustergroup
>>
>> You would need to do something like (
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cluster/ClusterGroup.html
>> )
>>
>> // Cluster group over all nodes that have the user attribute "group" set to 
>> the value "worker".
>>  ClusterGroup workerNodes = cluster.forAttribute("group", "worker");
>>
>> .. and then start services with this
>>
>> https://ignite.apache.org/releases/2.15.0/javadoc/org/apache/ignite/IgniteSpringBean.html#services-org.apache.ignite.cluster.ClusterGroup-
>>
>> On Fri, Aug 23, 2024 at 8:35 AM Gregory Sylvain 
>> wrote:
>>
>>> Hi Igniters,
>>>
>>> I'm running Ignite 2.15 cluster with native persistence enabled running
>>> on RHEL 8 VMs.
>>>
>>> I am running a Service cluster of 5 ServerNodes and 32 Thick Clients.
>>>
>>> Each Service has a User Attribute that indicates which service to run.
>>> Each ServerNode sets two User Attributes - indicating it should run two
>>> services.
>>>
>>> When the Cluster starts up from nothing, it sets the BLT and starts all
>>> services as expected.
>>>
>>> After the BLT is set, the cluster ports are opened (via firewalld) to
>>> allow the clients to connect to the cluster and start utilizing the
>>> services offered.
>>>
>>> If, after this point, a BLT cluster node restarts and drops out of the
>>> cluster and then re-joins, the Node Filter's apply() method is invoked on
>>> all ServerNodes *and *Thick Clients!
>>>
>>>
>>>- Q1: Why is a Node Filter running on a Thick Client and can I
>>>disable this?
>>>
>>>
>>> So, if a Node Filter is invoked on a Thick Client and it gets passed a
>>> ClusterNode representing a ServerNode that should run a specific service,
>>> the filter should return *true*, according to the API.  However, I do
>>> not want Clients to run services.
>>>
>>>
>>>
>>>- Q2: Can I limit the Node Filter invocations to only BLT nodes (or
>>>at least only Server Nodes) ?
>>>- Q3: If Node Filters are intended to run on Thick Clients as well,
>>>can I just return false from the apply method and how does that affect 
>>> the
>>>semantics of service balancing that I am trying to achieve?
>>>
>>>
>>>
>>>
>>> Thanks in advance,
>>> Greg
>>>
>>>
>>> --
>>>
>>> *Greg Sylvain*
>>>
>>> Software Architect/Lead Developer on XOComm
>>>
>>> Booz | Allen | Hamilton
>>>
>>>
>>>
>>> sylvain_greg...@bah.com
>>>
>>> cell: 571.236.8951
>>>
>>> ofc: 703.633.3195
>>>
>>> Chantilly, VA
>>>
>>>
>>>
>>


Re: Ignite Thick Client running Node Filters???

2024-08-23 Thread Jeremy McMillan
The example in the documentation explaining nodeFilter uses node attributes
as a condition, but the logic might include dynamic node state like
performance metrics to decide whether to run a service or not.

It seems like the behavior you want/expect might be implemented better
using clusterGroup
https://ignite.apache.org/docs/2.15.0/services/services#clustergroup

You would need to do something like (
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cluster/ClusterGroup.html
)

// Cluster group over all nodes that have the user attribute "group"
set to the value "worker".
 ClusterGroup workerNodes = cluster.forAttribute("group", "worker");

.. and then start services with this
https://ignite.apache.org/releases/2.15.0/javadoc/org/apache/ignite/IgniteSpringBean.html#services-org.apache.ignite.cluster.ClusterGroup-
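On Q1/Q3 specifically: a node filter is just a predicate, so a sketch along
these lines (the "group"/"worker" attribute is invented) simply evaluates to
false on thick clients and on servers without the attribute:

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgnitePredicate;

// Illustrative only: matches server nodes carrying the hypothetical "group=worker"
// attribute and never matches thick clients, so apply() can safely run anywhere.
public class WorkerNodeFilter implements IgnitePredicate<ClusterNode> {
    @Override public boolean apply(ClusterNode node) {
        return !node.isClient() && "worker".equals(node.attribute("group"));
    }
}

Such a filter can be passed to ServiceConfiguration#setNodeFilter(), or skipped
entirely in favor of deploying to the ClusterGroup shown above.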

On Fri, Aug 23, 2024 at 8:35 AM Gregory Sylvain 
wrote:

> Hi Igniters,
>
> I'm running Ignite 2.15 cluster with native persistence enabled running on
> RHEL 8 VMs.
>
> I am running a Service cluster of 5 ServerNodes and 32 Thick Clients.
>
> Each Service has a User Attribute that indicates which service to run.
> Each ServerNode sets two User Attributes - indicating it should run two
> services.
>
> When the Cluster starts up from nothing, it sets the BLT and starts all
> services as expected.
>
> After the BLT is set, the cluster ports are opened (via firewalld) to
> allow the clients to connect to the cluster and start utilizing the
> services offered.
>
> If, after this point, a BLT cluster node restarts and drops out of the
> cluster and then re-joins, the Node Filter's apply() method is invoked on
> all ServerNodes *and *Thick Clients!
>
>
>- Q1: Why is a Node Filter running on a Thick Client and can I disable
>this?
>
>
> So, if a Node Filter is invoked on a Thick Client and it gets passed a
> ClusterNode representing a ServerNode that should run a specific service,
> the filter should return *true*, according to the API.  However, I do not
> want Clients to run services.
>
>
>
>- Q2: Can I limit the Node Filter invocations to only BLT nodes (or at
>least only Server Nodes) ?
>- Q3: If Node Filters are intended to run on Thick Clients as well,
>can I just return false from the apply method and how does that affect the
>semantics of service balancing that I am trying to achieve?
>
>
>
>
> Thanks in advance,
> Greg
>
>
> --
>
> *Greg Sylvain*
>
> Software Architect/Lead Developer on XOComm
>
> Booz | Allen | Hamilton
>
>
>
> sylvain_greg...@bah.com
>
> cell: 571.236.8951
>
> ofc: 703.633.3195
>
> Chantilly, VA
>
>
>


Re: Query regarding Apache ignite open source

2024-08-21 Thread Jeremy McMillan
It isn't clear exactly what you're asking in any of these questions. If you
want a guided introduction to Apache Ignite, maybe you should try to attend
a free training workshop. This should prepare you to navigate the
documentation and enable you to answer your own questions as they arise.

https://www.gridgain.com/services/gridgain-apache-ignite-training

If you'd like the Ignite community to answer in this thread, please
describe a short story for each need explaining what you mean by "backup"
and "servers" and "sync or async", "updates in the cluster" and "Entry
processor."

I suspect there are conventional Ignite ways of dealing with your concerns,
but you may be bringing terminology from another domain which doesn't match
exactly how Ignite behavior is usually explained.

On Tue, Aug 20, 2024 at 5:40 AM Mahesh yadavalli <
mahesh.yadaval...@gmail.com> wrote:

> Thanks for the response.
> I am looking into Apache ignite for our caching needs and specifically few
> features like
> 1.  backup servers configurable to be sync or async.
> 2. Write behind updates in the cluster
> 3. Entry processor
>
>
> Are the above features part  of open source?
>
> Is there a way to know which feature is open source and which is not ?
>
>
> On Tue, Aug 20, 2024  3:43 PM Stephen Darlington 
> wrote:
>
>> Ignite has the Apache 2.0 Licence (
>> https://github.com/apache/ignite/blob/master/LICENSE) which is an
>> approved "open source" licence (https://opensource.org/license/apache-2-0
>> ).
>>
>> There are distributions of Ignite with more restrictive licences, and
>> they may have additional features or different release schedules.
>>
>> On Tue, 20 Aug 2024 at 11:06, Mahesh yadavalli <
>> mahesh.yadaval...@gmail.com> wrote:
>>
>>> Hi,
>>> I would like to know if Apache ignite is completely open source. If not,
>>> what features are not covered in the free/community version?
>>>
>>> Thank you!
>>>
>>


Re: Ignite H2 to Calcite issues

2024-08-14 Thread Jeremy McMillan
Amit:

I'm concerned that you may be misreading the CVE details in the ticket you
cited, since you indicated you are moving TO H2. Also, the stack trace is a
Calcite stack trace, so it is ambiguous whether it depicts the situation
before or after the persistence configuration change.

A) The CVEs cited in the ticket
https://issues.apache.org/jira/browse/IGNITE-15241 are all H2
vulnerabilities.
B) The H2 vulnerabilities cited all involve behaviors of H2 that Ignite
does not use, so Ignite is not affected regardless of whether H2 or Calcite
is involved.

I don't want to discourage you from moving from H2 to Calcite, but maybe
this isn't as urgent as it appears, so please proceed carefully. As Alex
requested, it will be helpful for the community to see which queries
produce exceptions and which ones do not. H2 and Calcite have different SQL
parsers and query planners and underlying implementations, so it should not
be surprising that queries might need rework in the course of switching.
You should expect to encounter issues like this one, and others like it.
It's a migration effort.


On Tue, Aug 13, 2024 at 9:17 AM Amit Jolly  wrote:

> Hi,
>
> We are trying to switch to H2 due to Security Vulnerabilities as listed in
> JIRA https://issues.apache.org/jira/browse/IGNITE-15241
>
> We are seeing the errors below after switching. We are just running a
> select * from table query.
>
> Caused by: java.lang.ClassCastException: class
> org.apache.ignite.internal.binary.BinaryObjectImpl cannot be cast to class
> java.lang.Comparable (org.apache.ignite.internal.binary.BinaryObjectImpl is
> in unnamed module of loader 'app'; java.lang.Comparable is in module
> java.base of loader 'bootstrap')
> at
> org.apache.ignite.internal.processors.query.calcite.exec.exp.ExpressionFactoryImpl.compare(ExpressionFactoryImpl.java:223)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.exp.ExpressionFactoryImpl.access$100(ExpressionFactoryImpl.java:85)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.exp.ExpressionFactoryImpl$1.compare(ExpressionFactoryImpl.java:157)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> java.base/java.util.Map$Entry.lambda$comparingByKey$6d558cbf$1(Map.java:560)
> ~[?:?]
> at
> java.base/java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:660)
> ~[?:?]
> at java.base/java.util.PriorityQueue.siftUp(PriorityQueue.java:637) ~[?:?]
> at java.base/java.util.PriorityQueue.offer(PriorityQueue.java:330) ~[?:?]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.rel.Inbox.pushOrdered(Inbox.java:239)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.rel.Inbox.push(Inbox.java:201)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.rel.Inbox.onBatchReceived(Inbox.java:177)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.ExchangeServiceImpl.onMessage(ExchangeServiceImpl.java:324)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.ExchangeServiceImpl.lambda$init$2(ExchangeServiceImpl.java:195)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.message.MessageServiceImpl.onMessageInternal(MessageServiceImpl.java:276)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.message.MessageServiceImpl.lambda$onMessage$0(MessageServiceImpl.java:254)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
> at
> org.apache.ignite.internal.processors.query.calcite.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:66)
> ~[ignite-calcite-2.16.0.jar:2.16.0]
>
> Any idea what might be causing this?
>
> Regards,
>
> Amit Jolly
>


Re: Requesting information about known CVEs related to Apache Ignite 2.16

2024-08-02 Thread Jeremy McMillan
There is an example of, and a gentle introduction to, building Ignite from
Maven dependencies in the Ignite Essentials workshop. You could enroll in
an upcoming free training session or access the same material on your own
schedule for free on university.gridgain.com. The Ignite quick start
documentation provides a little more detail and a starting point for
further exploration.

https://ignite.apache.org/docs/latest/quick-start/java
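As a taste of what that page walks through, starting a node from plain Java
boils down to roughly this sketch (only ignite-core is needed on the
classpath; with no further configuration it uses multicast discovery):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class QuickStart {
    public static void main(String[] args) {
        // Start an embedded node from code and do a trivial cache round trip.
        try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demo");
            cache.put(1, "hello");
            System.out.println(cache.get(1));
        }
    }
}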

What I would strongly recommend avoiding is trying to embed Ignite classes
into other applications' JVMs that do not provide stable infrastructure.
Good performance relies, among other things, partly upon a stable cluster
topology. How you build the nodes doesn't matter as much as where and how
you instantiate/run them.


Re: Requesting information about known CVEs related to Apache Ignite 2.16

2024-08-02 Thread Jeremy McMillan
Apache Ignite release notes contain details about fixes including CVEs
addressed.
https://github.com/apache/ignite/blob/master/RELEASE_NOTES.txt

Current known vulnerabilities are determined by vulnerability testing,
which differs depending on who (test/scan tool vendor, stakeholder/user)
does the testing. All scanner tools are different, and most support
configurable policy around what to recognize and what to report. GridGain
performs security audits of commercial distributions, but the Ignite
community is responsible to perform its own testing.

Some public vulnerability scan reports are available. YMMV:
https://security.snyk.io/package/maven/org.apache.ignite:ignite-core


On Thu, Aug 1, 2024 at 7:53 PM Vishy Ramaswamy 
wrote:

> Hi All,
> We are trying out Apache Ignite version 2.16.0. I want to know where I can
> get information about what vulnerabilities (CVE) got addressed in 2.16.0 as
> well as what are the current known vulnerabilities on 2.16 (if any).
> Appreciate the help and thanks in advance for your response
>
> Vishy
>
>
> Vishy Ramaswamy
> Modernization Architect  |  Workload Automation
> Mainframe Division | Broadcom
>
> mobile: +1.236.638.9672
>
> CAN-British Columbia Remote Location
>
> vishy.ramasw...@broadcom.com   |   broadcom.com
>
> This electronic communication and the information and any files
> transmitted with it, or attached to it, are confidential and are intended
> solely for the use of the individual or entity to whom it is addressed and
> may contain information that is confidential, legally privileged, protected
> by privacy laws, or otherwise restricted from disclosure to anyone else. If
> you are not the intended recipient or the person responsible for delivering
> the e-mail to the intended recipient, you are hereby notified that any use,
> copying, distributing, dissemination, forwarding, printing, or copying of
> this e-mail is strictly prohibited. If you received this e-mail in error,
> please return the e-mail to the sender, delete it from your computer, and
> destroy any printed copy of it.


Re: [support] ignite tuning help

2024-06-21 Thread Jeremy McMillan
Also, I didn't look at your network trace screen cap, but you should have
zero TCP retransmissions if you set your initial TCP send window
small enough.

https://www.auvik.com/franklyit/blog/tcp-window-size/

On Fri, Jun 21, 2024, 07:53 Jeremy McMillan  wrote:

> It could be network or persistent storage. What's the proportion of fast
> to slow gets?
>
>
> On Thu, Jun 20, 2024, 22:48 f cad  wrote:
>
>> here is a screenshot example
>> [image: image.png]
>>
>> On Fri, Jun 21, 2024 at 11:45, f cad  wrote:
>>
>>> Hello, community:
>>>
>>> I have a cluster with three nodes.
>>> I have two caches whose AtomicityMode is TRANSACTIONAL, Backups is
>>> two, and WriteSynchronizationMode is PRIMARY_SYNC.
>>>
>>>
>>> I use IgniteClientSpringTransactionManager with OPTIMISTIC concurrency and
>>> SERIALIZABLE isolation mode.
>>>
>>> pseudocode like below
>>> try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
>>>     if (aCache.get(key) == null) {
>>>         aCache.put(key, value);
>>>         bCache.put(key, value);
>>>     }
>>>     tx.commit();
>>> }
>>>
>>> Sometimes I find aCache.get(key) costs 80 ms, sometimes it costs only 5 ms.
>>> CPU, I/O, and memory usage on the three Ignite nodes are all not high,
>>> and the same is true on the client node.
>>> But using tcpdump I find that, between nodes, there are over 40 TCP
>>> retransmissions per second.
>>> So is this a network issue?
>>>
>>>


Re: [support] ignite tuning help

2024-06-21 Thread Jeremy McMillan
It could be network or persistent storage. What's the proportion of fast to
slow gets?


On Thu, Jun 20, 2024, 22:48 f cad  wrote:

> here is a screenshot example
> [image: image.png]
>
> On Fri, Jun 21, 2024 at 11:45, f cad  wrote:
>
>> Hello, community:
>>
>> I have a cluster with three nodes.
>> I have two caches whose AtomicityMode is TRANSACTIONAL, Backups is
>> two, and WriteSynchronizationMode is PRIMARY_SYNC.
>>
>>
>> I use IgniteClientSpringTransactionManager with OPTIMISTIC concurrency and
>> SERIALIZABLE isolation mode.
>>
>> pseudocode like below
>> try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
>>     if (aCache.get(key) == null) {
>>         aCache.put(key, value);
>>         bCache.put(key, value);
>>     }
>>     tx.commit();
>> }
>>
>> Sometimes I find aCache.get(key) costs 80 ms, sometimes it costs only 5 ms.
>> CPU, I/O, and memory usage on the three Ignite nodes are all not high,
>> and the same is true on the client node.
>> But using tcpdump I find that, between nodes, there are over 40 TCP
>> retransmissions per second.
>> So is this a network issue?
>>
>>


Re: Best way to update and organize nodes

2024-05-30 Thread Jeremy McMillan
This could work if you set up availability zones and use backup filters.
Then you could perform maintenance on one entire AZ at a time. While running
during maintenance, your workload might exceed the capacity of the fraction
of server nodes remaining up, so beware of that.
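Roughly like the sketch below (the "AVAILABILITY_ZONE" attribute name and its
values are whatever you choose to set per node; this is illustrative only):

import java.util.Collections;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Each node advertises its zone (the value differs per node; "ZONE_A" is a placeholder).
IgniteConfiguration nodeCfg = new IgniteConfiguration();
nodeCfg.setUserAttributes(Collections.singletonMap("AVAILABILITY_ZONE", "ZONE_A"));

// Steer backup copies onto nodes whose AVAILABILITY_ZONE differs from the primary's.
RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
aff.setAffinityBackupFilter(new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");
ccfg.setBackups(2);
ccfg.setAffinity(aff);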



On Thu, May 30, 2024, 11:30 Louis C  wrote:

> Hello everyone,
>
>
> I had a question that I could not really answer reading the documentation :
> Let's say I have a cluster of 10 Ignite server nodes, with one cache with
> persistent data and 2 data backups.
>
> I want to update the different nodes while maintaining the cluster
> activity (answering the clients requests). To do so I can stop gracefully
> one node, update it, and restart it, and then take care of the following
> nodes in the same fashion.
> In my understanding, this should ensure that no data is lost and that the
> cluster is still active (is this really the case ?).
> But this is quite long.
>
> I wanted to know if it is possible to lay out the partitions in
> such a way that we know we can shut down half (or a third) of the nodes at
> the same time, to speed up this process.
> I guess it would be as if we had 5 primary nodes and 5 backup nodes, and
> the 5 backup nodes take over when the 5 primary nodes shut down.
>
>
> Is such a thing possible ?
>
> Best regards,
>
> Louis C.
>
>


Re: Node requires maintenance, non-empty set of maintainance tasks is found - node is not coming up

2024-05-29 Thread Jeremy McMillan
If backup partitions are available when a node is lost, we should not
expect lost partitions.

There is a lot more to this story than this thread explains, so for the
community: please don't follow this procedure.

https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy
"A partition is lost when both the primary copy and all backup copies of
the partition are not available to the cluster, i.e. when the primary and
backup nodes for the partition become unavailable."

If you attempt to access a cache and receive a lost partitions error, this
means there IS DATA LOSS. Partition loss means there are no primary or
backup copies of a particular cache partition available. Have multiple
server nodes experienced trouble? Can we be certain that the affected
caches were created with backups>=1?

If a node fails to start up, and complains about maintenance tasks, we
should be very suspicious this node's persistent data is corrupted. If the
cluster is activated with a missing node and caches have lost partitions,
then we know these caches have lost some data. If there are no lost
partitions, we can safely remove the corrupted node from the baseline and
bring up a fresh node, and add it to the baseline to replace it thus
restoring redundancy. If there are lost partitions and we need to reset
lost partitions to bring a cache back online, we should expect that cache
is missing some data and may need to be reloaded.

Cache configuration backups=2 is excessive except in edge cases. For
backups=n, the memory and persistence footprint cost is n+1 times the
nominal data footprint. This scales linear. The marginal utility we derive
from each additional backup copy is diminishing because for a probability
of any single node failure p or p/1, the likelihood of needing those extra
copies is p/(n+1) for n backup copies.

Think of backup partitions like multiple coats of paint. After the second
coat, nobody will be able to tell the difference if you applied a third or
fourth coat. It still takes the same effort and materials to apply each
coat of paint.

If you NEED fault tolerance, then it should be mandatory to conduct testing
to make sure the configuration you have chosen is working as expected. If
backups=1 isn't effective for single node failures, then backups=2 will
make no beneficial difference. With backups=1 we should expect a cache to
work without complaining about lost partitions when one server node is
offline.

On Wed, May 29, 2024 at 12:15 PM Naveen Kumar 
wrote:

> Thanks very much for your prompt response Gianluca
>
> Just for the community: I could solve this by running control.sh with
> reset_lost_partitions for the individual cache.
> It looks like it worked; the partition issue is resolved. I suppose there
> wouldn't be any data loss, as we have set all our caches with 2 replicas.
>
> Coming to the node which was not getting added to the cluster earlier:
> removed it from the baseline --> cleared the persistence store --> brought up
> the node --> added it back to the baseline. This also seems to have worked fine.
>
> Thanks
>
>
> On Wed, May 29, 2024 at 5:13 PM Gianluca Bonetti <
> gianluca.bone...@gmail.com> wrote:
>
>> Hello Naveen
>>
>> Apache Ignite 2.13 is more than 2 years old, 25 months old in actual fact.
>> Three bugfix releases had been rolled out over time up to 2.16 release.
>>
>> It seems you are restarting your cluster on a regular basis, so you'd
>> better upgrade to 2.16 as soon as possible.
>> Otherwise it will also be very difficult for people on a community based
>> mailing list, on volunteer time, to work out a solution with a 2 years old
>> version running.
>>
>> Besides that, you are not providing very much information about your
>> cluster setup.
>> How many nodes, what infrastructure, how many caches, overall data size.
>> One could only guess you have more than 1 node running, with at least 1
>> cache, and non-empty dataset. :)
>>
>> This document from GridGain may be helpful but I don't see the same for
>> Ignite, it may still be worth checking it out.
>>
>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/maintenance-mode
>>
>> On the other hand you should also check your failing node.
>> If it is always the same node failing, then there should be some root
>> cause apart from Ignite.
>> Indeed if the nodes configuration is the same across all nodes, and just
>> this one fails, you should also consider some network issues (check
>> connectivity and network latency between nodes) and hardware related issues
>> (faulty disks, faulty memory)
>> In the end, one option might be to replace the faulty machine with a
>> brand new one.
>> In cloud environments this is actually quite cheap and easy to do.
>>
>> Cheers
>> Gianluca
>>
>> On Wed, 29 May 2024 at 08:43, Naveen Kumar 
>> wrote:
>>
>>> Hello All
>>>
>>> We are using Ignite 2.13.0
>>>
>>> After a cluster restart, one of the nodes is not coming up, and in the node
>>> logs we are seeing this error - Nod

Re: Possible too long JVM pause - Ignite 2.10

2024-05-09 Thread Jeremy McMillan
Finding happiness is unfortunately never quite that simple.

   1. Understand why the garbage collector cannot function with shorter
   pauses.
   (may require GC logging configuration to provide key details)
   2. Identify priorities.
   (ie. absolute minimal GC pauses for best latency performance, or maximum
   throughput, or minimal hardware footprint/cost...)
   3. Choose a remediation solution based on stated priorities.
   (ie. any combination of increase RAM, or possibly ironically CPU or
   network capacity, decrease query workload, tune GC parameters, ...)
   4. Implement the solution with appropriate changes to hardware, code,
   configuration, and command line options, etc.

Ignite tends to use Java heap mostly for handling query workload. The
slower these queries are, the greater number of them will be running
concurrently. Java heap needs to accommodate the sum of all running
queries' memory footprints, so the first remediation option on the list
should include making the slowest queries faster or less memory-hungry.
Alternatively, these queries could receive more server resources to spread
the load thinner, putatively by adding more nodes to the cluster. This will
divide the query load up, and also provide additional resources at the same
time. Node resource levels may also be upgraded to help the queries
complete faster if analysis reveals they are CPU bound or memory bound.
Only when we know the workload and resource level are properly matched
should we experiment with GC tuning options.
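As a starting point for step 1, if detailed GC logs aren't already being
collected, options along these lines capture the needed detail (JDK 11+
unified logging; the file path and rotation sizes are placeholders):

-Xlog:gc*,gc+phases=debug,gc+heap=info:file=/var/log/ignite/gc.log:time,uptime,level,tags:filecount=5,filesize=50m

On JDK 8 the rough equivalent is -Xloggc:/var/log/ignite/gc.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps.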

On Thu, May 9, 2024 at 1:31 AM Charlin S  wrote:

> Hi All,
>
> I am getting Possible too long JVM pause: 6403 milliseconds. JVM options
> used as below
> -XX:+DisableExplicitGC,-XX:+UseG1GC,-Xms3g,-Xmx5g - client node 1
> -XX:+DisableExplicitGC,-XX:+UseG1GC,-Xms1g,-Xmx4g  - client node 2
>
> Please suggest JVM options to avoid this JVM pause issue.
>
> Thanks & Regards,
> Charlin
>
>
>
>
>
>
>


Re: Turning off deadlock detection

2024-03-24 Thread Jeremy McMillan
https://ignite.apache.org/docs/latest/key-value-api/transactions#deadlock-detection

The property you're asking about is for diagnostics to enable prevention of
future deadlocks. Turning it off is fine if you already know or can find
out another way what is deadlocked and why.
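For reference, it is a JVM system property, so it has to be in place before
the node starts; a sketch of both ways:

import org.apache.ignite.IgniteSystemProperties;

// Either pass -DIGNITE_TX_DEADLOCK_DETECTION_MAX_ITERS=0 on the JVM command line,
// or set it programmatically before Ignition.start():
System.setProperty(IgniteSystemProperties.IGNITE_TX_DEADLOCK_DETECTION_MAX_ITERS, "0");

With detection disabled, it is worth keeping explicit transaction timeouts so
stuck transactions still fail fast instead of hanging.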

On Sat, Mar 23, 2024 at 12:50 PM Ronak Shah 
wrote:

> Ping again. Can someone answer please? - Ronak
>
> On Mon, Mar 18, 2024 at 12:04 PM Ronak Shah 
> wrote:
>
>> Hi Ignite users,
>> I am hitting a situation in a scaled environment where, if a transaction
>> times out for whatever reason, the cluster goes into deadlock detection
>> mode, which takes even more time while holding up locks and creates a
>> snowball effect on queued-up transactions before going into a completely
>> dead state.
>>
>> I am wondering if it is safe to turn deadlock detection off by setting
>> IGNITE_TX_DEADLOCK_DETECTION_MAX_ITERS to 0.
>>
>> Based on the guide, this detection is only for bookkeeping and it is
>> safe to turn off.
>>
>> Any guidance on that would be greatly appreciated.
>>
>> Ronak
>>
>


Re: Data loss in an Ignite application

2024-02-21 Thread Jeremy McMillan
First, logging should be configured to at least WARN level if not INFO.

Ignite manages data internally at the page level. If you see errors about
pages, it is low, low level ignite problems. The next level up is
partitions. Errors involving partitions are mid low level ignite problems.
The next level up is caches. Errors at the cache level are mid to high
level problems. The next level is cache records. Errors in cache record
handling are high level of abstraction, and the next level is client
application operations.

The lower the level of abstraction at which the errors appear, the less chance
operations in general will succeed. Since the cache appears to operate
mostly as expected, and there are no obvious errors in the ignite logs,
most likely there is some client side logic which is deleting records, and
ignite does not consider this behavior to be in error.

I would recommend fine tuning cache delete method log coverage. First
identify if the deletion is happening on a client connection thread pool or
a thread for server initiated operations.

My guess is that a client is connecting, getting a cache object, and then
setting expiration on that cache connection so that all cache adds under
that cache connection will have expiration applied to them.

https://ignite.apache.org/docs/2.14.0/configuring-caches/expiry-policies#configuration

"You can also change or set Expiry Policy for individual cache operations.
This policy is used for each operation invoked on the returned cache
instance."

https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Client.Cache.ICacheClient-2.html?q=withExpiryPolicy#Apache_Ignite_Core_Client_Cache_ICacheClient_2_WithExpiryPolicy_Apache_Ignite_Core_Cache_Expiry_IExpiryPolicy_
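In Java terms (the .NET API mirrors the same pattern), what I'm describing
looks roughly like this sketch; the cache name and TTL are invented:

import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.IgniteCache;

// 'ignite' is assumed to be a started node or thick client.
IgniteCache<Integer, String> cache = ignite.cache("myCache");
IgniteCache<Integer, String> expiring =
    cache.withExpiryPolicy(new CreatedExpiryPolicy(new Duration(TimeUnit.MINUTES, 30)));

expiring.put(1, "silently disappears after 30 minutes"); // TTL applied
cache.put(2, "stays until explicitly removed");          // no TTL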

On Wed, Feb 21, 2024, 19:17 Aleksej Avrutin  wrote:

> Hello,
>
> A couple of days ago I encountered a strange phenomenon in our application
> based on Apache Ignite .Net 2.14 with persistence (3 nodes, 1 backup per
> cache).
> Data in a cache started disappearing for seemingly no reason and the
> amount of records could be halved (220K to 108K) overnight. I spent a
> couple of days trying to find a problem in the application, crunched
> hundreds megabytes of application logs but didn't manage to find a reason
> to blame the application. Retention/TTL is not set for the cache. Apache
> Ignite logs with the option -DIGNITE_QUIET=false also don't reveal any
> anomalies (or I don't know what to look for). The data shares are expected
> to be durable (based on Azure Disk) and we never had any issues with them.
> RAM utilisation is normal and there's plenty of available RAM.
> The Ignite cluster is hosted in a 3 node Kubernetes cluster on Azure.
>
> The question is: how would you recommend investigating issues like this?
> What metrics and logs can I check? Is it possible to log and track
> individual Remove() operations as well as SQL queries at Ignite engine
> level?
>
> The application has been working on Ignite for years already and we didn't
> encounter data loss at such scales before. It's possible that the app
> wasn't used so extensively before as it is now and the problem left
> unnoticed.
>
> My best,
> Alex Avrutin
>


Re: ignite + external database

2024-02-06 Thread Jeremy McMillan
2. If you want the Ignite cluster to be authoritative about caches, then
you should define them in the XML configuration or deploy your servers with
code which can look up the intended cache configurations and implement
them. If you have specific ideas how you would like to implement this,
maybe you could write your own extension? If the idea is really good, it
has a strong possibility of adoption by the community which maintains
Ignite.

5. Cache metadata must be persisted as a side effect of persisting the data
across reboots. We have many customers using external persistence in their
enterprises, however we always recommend GridGain over Ignite when business
needs are more important than freedom to experiment. Please have a look at
how metadata is persisted for use with native persistence, and maybe you
could use that as a basis to experiment with persistent in-memory-only
cache definitions. Most use cases relying on dynamic caches are designed
around the cache client having authority over what is cached. I urge you to
think about cases when the client using a dynamic cache disagrees with the
servers or other clients about what and how data should be cached: persisting
cache metadata on the server from client-asserted dynamic caches creates
duplicate declarations of intent about cache configuration.
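For what it's worth, the cache side of the combination being discussed
(read-through/write-through against an external store) is wired up roughly
like this; Account and AccountStore are placeholders for whatever value class
and CacheStore implementation front your external database:

import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.configuration.CacheConfiguration;

// Sketch only: AccountStore is a hypothetical CacheStore implementation over the external DB.
CacheConfiguration<Long, Account> ccfg = new CacheConfiguration<>("accounts");
ccfg.setReadThrough(true);
ccfg.setWriteThrough(true);
ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(AccountStore.class));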

On Tue, Feb 6, 2024 at 2:37 PM Нестрогаев Андрей Викторович <
a.nestrog...@flexsoft.com> wrote:

> 2. I understand: you are talking about placing the cache in a persistent
> data region and at the same time enabling read-through/write-through
> cache mode.
>
>
>
> Thank you. This is, of course, a double penalty, but I’ll think about it
> and test this mode.
>
>
>
> 5. It seems that Ignite has refocused on native persistence, and less
> attention is paid to the functions for working as a reliable cache. It
> seems to me that the ability to plug in any other persister through an SPI
> instead of the native one, as has been done for many other things, would be
> a great opportunity. So far, the use of external databases does not look
> very enterprise-ready.
>
>
>
> Andrey
>
>
>
> *From:* Нестрогаев Андрей Викторович
> *Sent:* Tuesday, February 6, 2024 10:03 PM
> *To:* user@ignite.apache.org
> *Subject:* RE: ignite + external database
>
>
>
> Thanks Stephen,
>
>
>
> 2. «You can, actually, enable persistence and connect to a third-party
> data store» - is this feature not the same as using an “external database”?
> Could you please give a link to the documentation?
>
>
>
> 5. Ignite seems to have to know which partitions are lost, and in theory
> it doesn’t cost it anything to place these partitions on the remaining
> nodes (rebalancing) and execute loadCache for the lost partitions.
>
>
>
> Andrey
>
>
>
> *From:* Stephen Darlington 
> *Sent:* Tuesday, February 6, 2024 9:09 PM
> *To:* user@ignite.apache.org
> *Subject:* Re: ignite + external database
>
>
>
>1. With a memory-only cluster, Ignite does not store any persistent
>information. You'd need to save your table definitions somewhere yourself
>2. If it saved data, it would be a database rather than an in-memory
>data grid! You can, actually, enable persistence and connect to a
>third-party data store. It would, however, persistent the data, not just
>the metadata
>3. Data is rebalanced between Ignite nodes
>4. Assuming you're using the built-in JDBC Cache Store implementation,
>it basically does a "SELECT * FROM table" on each node and discards any
>data that should be stored elsewhere
>5. You'd get "lost partitions." Restoring the missing data is up to
>you. Kind of by definition, Ignite has lost some data and it doesn't know
>what it's missing
>
>
>
> On Tue, 6 Feb 2024 at 16:09, Нестрогаев Андрей Викторович <
> a.nestrog...@flexsoft.com> wrote:
>
> Hi All, I'm trying to use ignite (2.16) as an In-memory data grid
> (read-through / write-through caches), i.e. case described here
> https://ignite.apache.org/use-cases/in-memory-data-grid.html
>
>
>
> Several questions arose:
> 1. How is it recommended to store metadata for caches created dynamically
> during the life of the cluster so that they survive a complete reboot of
> the cluster?
> 2. Why can’t ignite save this cache metadata, just as it saves information
> about the base topology when we configure default data region to use
> persistence?
> 3. If a new node is joined to the base topology, how does rebalancing
> occur: is some data moved from other nodes, or are caches on the new node
> loaded from the database?
> 4. How does the initial loading of the partitioned cache from the database
> occur: does each node load its own data, does the node on which loadCache
> was initiated load it, or something else?
> 5. If both the primary node and backup node “died” at the same time, how
> will the cluster be restored and data loaded from the database?
>
>
>
> Thanks for the help in advance.
>
>
>
> Andrey.
>

Re: Info about time series support

2024-01-05 Thread Jeremy McMillan
To answer the OP question, maybe linear regression is sufficient for making
predictions in your data.

Ignite isn't really designed for exploratory data analysis, so it really
helps to understand the character of your data. Linear models are usually a
good place to start. Does a regression line make sense if you plot one on
one of your time value columns in a spreadsheet? If so, you may not need to
deploy any additional libraries.

https://ignite.apache.org/docs/latest/machine-learning/regression/linear-regression
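
For illustration, a rough sketch of fitting a linear regression with the Ignite ML API (assuming the ignite-ml module is on the classpath; the cache name, key type, and the convention that the first array element is the label are placeholders, not details from this thread):

```
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.ml.dataset.feature.extractor.Vectorizer;
import org.apache.ignite.ml.dataset.feature.extractor.impl.DoubleArrayVectorizer;
import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
import org.apache.ignite.ml.regressions.linear.LinearRegressionLSQRTrainer;
import org.apache.ignite.ml.regressions.linear.LinearRegressionModel;

public class LinearRegressionSketch {
    public static void fit(Ignite ignite, IgniteCache<Integer, double[]> data) {
        // Each double[] row: element 0 is the label, the rest are features.
        LinearRegressionModel mdl = new LinearRegressionLSQRTrainer().fit(
            ignite,
            data,
            new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.FIRST));

        // Predict from one feature vector (placeholder values).
        System.out.println("Predicted: " + mdl.predict(VectorUtils.of(1.0, 2.0, 3.0)));
    }
}
```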

On Fri, Jan 5, 2024, 08:25 Stephen Darlington 
wrote:

> Normally we recommend using thin-clients if you can. Though, in this case,
> using a thick-client makes your life easier. Thick clients can deploy Java
> code for you.
>
> There are a few different ways to do it. The "easy" option is to just
> deploy the JAR files to the server nodes "manually." You could also
> consider peer class loading (
> https://ignite.apache.org/docs/2.11.1/code-deployment/peer-class-loading),
> which is where the client automatically sends classes to the remote nodes.
> Or UriDeployment (
> https://ignite.apache.org/docs/2.11.1/code-deployment/deploying-user-code),
> where Ignite copies the Jar files from a central location. GridGain's
> Control Center (not open source) is also able to deploy code.
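
As a small sketch of the peer class loading option mentioned above (it has to be enabled on the server nodes as well; for heavyweight third-party libraries such as djl, copying the JARs to the servers is usually the safer route):

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DeploymentMode;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PeerClassLoadingSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientMode(true)                        // thick client
            .setPeerClassLoadingEnabled(true)           // must also be true on the servers
            .setDeploymentMode(DeploymentMode.CONTINUOUS);

        try (Ignite client = Ignition.start(cfg)) {
            // The closure class is shipped to the server nodes automatically.
            client.compute().broadcast(() -> System.out.println("Runs on every server node"));
        }
    }
}
```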
>
> On Fri, 5 Jan 2024 at 14:04, Angelo Immediata  wrote:
>
>> hello Gianluca and all
>>
>> Regarding thin clients: in my architecture I avoided using thin
>> clients; I'm using thick clients, so if Python is supported only in "thin
>> client" mode, I'd prefer to avoid it
>>
>> Regarding distributed computing, I hadn't looked at it, but it seems
>> interesting; however, something is not clear to me. Suppose I want to use djl
>> https://djl.ai/ and its timeseries support (
>> https://djl.ai/extensions/timeseries/) with the distributed
>> computing API; as far as I understood, distributed computing allows me to
>> distribute computations across all my cluster nodes. Now, I'm using thick
>> clients, which means my Java application is remotely connected to the Apache
>> Ignite "master nodes"; with distributed computing I would execute the
>> computation on the master nodes, but if I use a custom dependency (e.g. djl),
>> how can these remote master nodes execute the computation if they don't have
>> the libraries?
>> Am I missing anything?
>>
>> Thank you
>> Angelo
>>
>> Il giorno ven 5 gen 2024 alle ore 14:24 Gianluca Bonetti <
>> gianluca.bone...@gmail.com> ha scritto:
>>
>>> Hello Jagat
>>>
>>> There are Ignite thin clients for a number of languages, including
>>> Python.
>>> For a full list of functionalities and comparison, please always refer
>>> to the official documentation.
>>>
>>> https://ignite.apache.org/docs/latest/thin-clients/getting-started-with-thin-clients
>>>
>>> All thin clients should perform around the same in tasks such as storing
>>> and retrieving data as they use the Apache Ignite binary protocol.
>>> As you know performance also varies case by case, because of different
>>> setups, configurations, and versions of software/frameworks/libraries
>>> being used, and of course the performance of the code that you will write
>>> yourself.
>>>
>>> For my specific use cases, Apache Ignite always performed extremely well.
>>> As I don't know anything about your project, there are far too many
>>> possible variables to be able to reduce to a yes/no answer.
>>> The advice is to run your own benchmarks on your infrastructure to get
>>> some meaningful figures for your specific project and infrastructure.
>>>
>>> Cheers
>>> Gianluca Bonetti
>>>
>>> On Fri, 5 Jan 2024 at 12:40, Jagat Singh  wrote:
>>>
 Hi Gianluca,

 Does the Python client miss any functionality or performance compared
 to Java?

 Thanks

 On Fri, 5 Jan 2024 at 15:55, Gianluca Bonetti <
 gianluca.bone...@gmail.com> wrote:

> Hello Angelo
>
> It seems to be an interesting use case for Ignite.
>
> However, you should consider what Ignite is, and what is not.
> Essentially, Ignite is a distributed in-memory
> database/cache/grid/etc...
> It also has some distributed computing API capabilities.
>
> You can store data easily in Ignite, and consume data by your code
> written in Java.
> You can also use Python since there is a Python Ignite Client
> available if it makes your time series analysis easier.
> You can also use the Ignite Computing API to execute code on your
> cluster
> https://ignite.apache.org/docs/latest/distributed-computing/distributed-computing
> but in this case I think Python is not supported.
>
> Cheers
> Gianluca Bonetti
>
> On Fri, 5 Jan 2024 at 08:52, Angelo Immediata 
> wrote:
>
>> I'm pretty new to Apache Ignite
>>
>>
>> I asked this also on stackoverflow (
>> https://stackoverflow.com/questions/77667648/apache-ignite-time-series-forecasting)
>> but I received no answer
>>

Re: India Scala & Big Data Job Referral

2023-12-21 Thread Jeremy McMillan
It might help if you address the Ignite community with a question about
Apache Ignite or GridGain skills and experience and needs. It might
demonstrate your skills if you participate in the community by answering
others' questions about Apache Ignite or GridGain, as it will establish
your authority on those subjects. Searching the archives of this list might
identify leads for you, based on questions you can cover with your
expertise, but please reach out directly, not via the mailing list.

It might be inappropriate to discuss things unrelated to Apache Ignite in
this forum, so please be considerate, as you would address friends.

On Thu, Dec 21, 2023 at 2:39 AM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi Community,
>
> I was laid off from Apple in February 2023, which led to my relocation
> from the USA due to immigration issues related to my H1B visa.
>
>
> I have over 12 years of experience as a consultant in Big Data, Spark,
> Scala, Python, and Flink.
>
>
> Despite my move to India, I haven't secured a job yet. I am seeking
> referrals within product firms (preferably non-consulting) in India that
> work with Flink, Spark, Scala, Big Data, or in the fields of ML & AI. Can
> someone assist me with this?
>
> Thanks
> Sri
>


Re: Another replicated cache oddity

2023-11-22 Thread Jeremy McMillan
with
> the key struct.
>
> So far it seems like the three failing nodes just temporarily 'forgot'
> they had this element, and remembered it again after the restart.
>
> For context, this is the first time we have seen this specific issue on a
> system that has been running in production for 2+ years now. We have seen
> numerous instances with replicated caches where Ignite has (permanently)
> failed to write at least one, but not all, copies of the element where grid
> restarts do not correct the issue. This does not feel the same though.
>
> Raymond.
>
>
>
>
>
> On Thu, Nov 23, 2023 at 6:50 AM Jeremy McMillan <
> jeremy.mcmil...@gridgain.com> wrote:
>
>> I suspect a race condition with async mode caches. This is a naive guess
>> though, as we don't have enough details. I'll assume this is a plea for
>> help in troubleshooting methodology and the question is really "what should
>> we look at next?"
>>
>> The real answer comes from tracing the insert of element E and subsequent
>> cache get() failures. Do we know if E was completely inserted into each
>> replicated cache backup partition prior to the get()? Do we know if the
>> reported cache get() failure was actually a fully functioning cache lookup
>> and retrieval that failed during lookup, or were there timeouts or
>> exceptions indicating something abnormal was happening?
>>
>> What steps did you take to troubleshoot this issue, and what is the
>> cluster and cache configuration in play? What does the code look like for
>> the updates to the replicated cache, and what does the code look like for
>> the distributed compute operation?
>>
>> On Tue, Nov 21, 2023 at 5:21 PM Raymond Wilson <
>> raymond_wil...@trimble.com> wrote:
>>
>>> Hi,
>>>
>>> We have been triaging an odd issue we encountered in a system using
>>> Ignite v2.15 and the C# client.
>>>
>>> We have a replicated cache across four nodes, let's call them P0, P1, P2
>>> & P3. Because the cache is replicated, every item added to the cache is
>>> present in each of P0, P1, P2 and P3.
>>>
>>> Some time ago an element (E) was added to this cache (among many
>>> others). A number of system restarts have occurred since that time.
>>>
>>> We started observing an issue where a query running across P0/P1/P2/P3
>>> as a cluster compute operation needed to load element E on each of the
>>> nodes to perform that query. Node P0 succeeded, while nodes P1, P2 & P3 all
>>> reported that element E did not exist.
>>>
>>> This situation persisted until the cluster was restarted, after which
>>> the same query that had been failing now succeeded as all four 'P' nodes
>>> were able to read element E.
>>>
>>> There were no Ignite errors reported in the context of these
>>> failing queries to indicate unhappiness in the Ignite nodes.
>>>
>>> This seems like very strange behaviour. Are there any suggestions as to
>>> what could be causing this failure to read the replicated value on the
>>> three failing nodes, especially as the element 'came back' after a cluster
>>> restart?
>>>
>>> Thanks,
>>> Raymond.
>>>
>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> raymond_wil...@trimble.com
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Trimble Distinguished Engineer, Civil Construction Software (CCS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com
>
>
> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>


Re: Another replicated cache oddity

2023-11-22 Thread Jeremy McMillan
I suspect a race condition with async mode caches. This is a naive guess
though, as we don't have enough details. I'll assume this is a plea for
help in troubleshooting methodology and the question is really "what should
we look at next?"

The real answer comes from tracing the insert of element E and subsequent
cache get() failures. Do we know if E was completely inserted into each
replicated cache backup partition prior to the get()? Do we know if the
reported cache get() failure was actually a fully functioning cache lookup
and retrieval that failed during lookup, or were there timeouts or
exceptions indicating something abnormal was happening?

What steps did you take to troubleshoot this issue, and what is the cluster
and cache configuration in play? What does the code look like for the
updates to the replicated cache, and what does the code look like for the
distributed compute operation?

On Tue, Nov 21, 2023 at 5:21 PM Raymond Wilson 
wrote:

> Hi,
>
> We have been triaging an odd issue we encountered in a system using Ignite
> v2.15 and the C# client.
>
> We have a replicated cache across four nodes, let's call them P0, P1, P2 &
> P3. Because the cache is replicated, every item added to the cache is
> present in each of P0, P1, P2 and P3.
>
> Some time ago an element (E) was added to this cache (among many others).
> A number of system restarts have occurred since that time.
>
> We started observing an issue where a query running across P0/P1/P2/P3 as
> a cluster compute operation needed to load element E on each of the nodes
> to perform that query. Node P0 succeeded, while nodes P1, P2 & P3 all
> reported that element E did not exist.
>
> This situation persisted until the cluster was restarted, after which the
> same query that had been failing now succeeded as all four 'P' nodes were
> able to read element E.
>
> There were no Ignite errors reported in the context of these
> failing queries to indicate unhappiness in the Ignite nodes.
>
> This seems like very strange behaviour. Are there any suggestions as to
> what could be causing this failure to read the replicated value on the
> three failing nodes, especially as the element 'came back' after a cluster
> restart?
>
> Thanks,
> Raymond.
>
>
>
>
> --
> 
> Raymond Wilson
> Trimble Distinguished Engineer, Civil Construction Software (CCS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com
>
>
> 
>


Re: Failed to process selector key

2023-11-14 Thread Jeremy McMillan
ntCnt=321662, reserved=true, lastAck=345504,
> nodeLeft=false, node=TcpDiscoveryNode
> [id=c45e0d94-8cb4-4e23-8ff3-29f573117b58,
> consistentId=c45e0d94-8cb4-4e23-8ff3-29f573117b58, addrs=ArrayList
> [client_ip 127.0.0.1], sockAddrs=null, discPort=0, order=27, intOrder=27,
> lastExchangeTime=1699691909333, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> closeSocket=true,
> outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
> super=GridNioSessionImpl [locAddr=/server_ip:47100,
> rmtAddr=/client_ip:34662, createTime=1699700035875, closeTime=0,
> bytesSent=5758138431, bytesRcvd=47248795615, bytesSent0=195008,
> bytesRcvd0=1614751, sndSchedTime=1699937197520, lastSndTime=1699956690669,
> lastRcvTime=1699956690699, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=o.a.i.i.util.nio.GridDirectParser@7141a1d9, directMode=true],
> GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
>
> On the client side I don't see any errors happening around that time; I
> have also searched for warnings, but found nothing.
>
> I've seen this post where they are also using the streaming API and
> getting similar errors:
>
> https://lists.apache.org/thread/jgf2jrp231jd5rhdbh7f5sb8gnclocl8
>
> My guess is that it may have something to do with the DataStreamer somehow?
>
> Humphrey
>
> Op ma 13 nov 2023 om 21:10 schreef Jeremy McMillan <
> jeremy.mcmil...@gridgain.com>:
>
>> These errors look like something which does not speak Ignite binary
>> protocol is connecting and sending useless stuff to your Ignite cluster.
>>
>> IgniteException: Invalid message type: 2057
>>
>>
>> Check the configuration of the client if the host generating this traffic
>> is known, and check firewalls or monitoring tools if not.
>>
>> On Mon, Nov 13, 2023 at 8:04 AM Humphrey Lopez 
>> wrote:
>>
>>> Other errors we are seeing:
>>>
>>> Failed to read message [msg=null, buf=java.nio.DirectByteBuffer[pos=2
>>> lim=162 cap=32768], reader=DirectMessageReader [state=DirectMessageState
>>> [pos=0, stack=[StateItem [stream=DirectByteBufferStreamImplV2
>>> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
>>> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
>>> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
>>> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], StateItem
>>> [stream=DirectByteBufferStreamImplV2 [baseOff=140381476511056, arrOff=-1,
>>> tmpArrOff=0, valReadBytes=0, tmpArrBytes=0, msgTypeDone=false, msg=null,
>>> mapIt=null, it=null, arrPos=-1, keyDone=false, readSize=-1, readItems=0,
>>> prim=0, primShift=0, uuidState=0, uuidMost=0, uuidLeast=0, uuidLocId=0],
>>> state=0], StateItem [stream=DirectByteBufferStreamImplV2
>>> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
>>> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
>>> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
>>> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], null, null, null, null,
>>> null, null, null]], protoVer=3, lastRead=true],
>>> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
>>> [super=AbstractNioClientWorker [idx=1, bytesRcvd=6506344847,
>>> bytesSent=5800573007, bytesRcvd0=5461705, bytesSent0=197830, select=true,
>>> super=GridWorker [name=grid-nio-worker-tcp-comm-1,
>>> igniteInstanceName=TcpCommunicationSpi, finished=false,
>>> heartbeatTs=1699706651957, hashCode=2094994491, interrupted=false,
>>> runner=grid-nio-worker-tcp-comm-1-#48%TcpCommunicationSpi%]]],
>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>> readBuf=java.nio.DirectByteBuffer[pos=2 lim=162 cap=32768],
>>> inRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
>>> rcvCnt=53951, sentCnt=47247, reserved=true, lastAck=53920, nodeLeft=false,
>>> node=TcpDiscoveryNode [id=34cfcc64-d369-415b-b14f-6ac222087232,
>>> consistentId=34cfcc64-d369-415b-b14f-6ac222087232, addrs=ArrayList
>>> [xx.xxx.xx.xxx, 127.0.0.1], sockAddrs=null, discPort=0, order=24,
>>> intOrder=24, lastExchangeTime=1699691906215, loc=false,
>>> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
>>> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
>>> outRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
>>> rcvCnt=53951, sent

Re: Failed to process selector key

2023-11-13 Thread Jeremy McMillan
These errors look like something which does not speak Ignite binary
protocol is connecting and sending useless stuff to your Ignite cluster.

IgniteException: Invalid message type: 2057


Check the configuration of the client if the host generating this traffic
is known, and check firewalls or monitoring tools if not.

On Mon, Nov 13, 2023 at 8:04 AM Humphrey Lopez  wrote:

> Other errors we are seeing:
>
> Failed to read message [msg=null, buf=java.nio.DirectByteBuffer[pos=2
> lim=162 cap=32768], reader=DirectMessageReader [state=DirectMessageState
> [pos=0, stack=[StateItem [stream=DirectByteBufferStreamImplV2
> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], StateItem
> [stream=DirectByteBufferStreamImplV2 [baseOff=140381476511056, arrOff=-1,
> tmpArrOff=0, valReadBytes=0, tmpArrBytes=0, msgTypeDone=false, msg=null,
> mapIt=null, it=null, arrPos=-1, keyDone=false, readSize=-1, readItems=0,
> prim=0, primShift=0, uuidState=0, uuidMost=0, uuidLeast=0, uuidLocId=0],
> state=0], StateItem [stream=DirectByteBufferStreamImplV2
> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], null, null, null, null,
> null, null, null]], protoVer=3, lastRead=true],
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=6506344847,
> bytesSent=5800573007, bytesRcvd0=5461705, bytesSent0=197830, select=true,
> super=GridWorker [name=grid-nio-worker-tcp-comm-1,
> igniteInstanceName=TcpCommunicationSpi, finished=false,
> heartbeatTs=1699706651957, hashCode=2094994491, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#48%TcpCommunicationSpi%]]],
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
> readBuf=java.nio.DirectByteBuffer[pos=2 lim=162 cap=32768],
> inRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
> rcvCnt=53951, sentCnt=47247, reserved=true, lastAck=53920, nodeLeft=false,
> node=TcpDiscoveryNode [id=34cfcc64-d369-415b-b14f-6ac222087232,
> consistentId=34cfcc64-d369-415b-b14f-6ac222087232, addrs=ArrayList
> [xx.xxx.xx.xxx, 127.0.0.1], sockAddrs=null, discPort=0, order=24,
> intOrder=24, lastExchangeTime=1699691906215, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
> rcvCnt=53951, sentCnt=47247, reserved=true, lastAck=53920, nodeLeft=false,
> node=TcpDiscoveryNode [id=34cfcc64-d369-415b-b14f-6ac222087232,
> consistentId=34cfcc64-d369-415b-b14f-6ac222087232, addrs=ArrayList
> [xx.xxx.xx.xxx, 127.0.0.1], sockAddrs=null, discPort=0, order=24,
> intOrder=24, lastExchangeTime=1699691906215, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> closeSocket=true,
> outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
> super=GridNioSessionImpl [locAddr=/xx.xxx.xx.xx:47100,
> rmtAddr=/xx.xxx.xx.xxx:35492, createTime=1699700043744, closeTime=0,
> bytesSent=74190856, bytesRcvd=167712723, bytesSent0=0, bytesRcvd0=5260541,
> sndSchedTime=1699700043744, lastSndTime=1699706650143,
> lastRcvTime=1699706651957, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=o.a.i.i.util.nio.GridDirectParser@6c311b05, directMode=true],
> GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
> <#fb58e00b> o.a.i.IgniteException: Invalid message type: 2057 at
> o.a.i.i.m.c.IgniteMessageFactoryImpl.create(IgniteMessageFactoryImpl.java:133)
> at
> o.a.i.s.c.t.i.GridNioServerWrapper$2.create(GridNioServerWrapper.java:813)
> at o.a.i.i.u.n.GridDirectParser.decode(GridDirectParser.java:81) at
> o.a.i.i.u.n.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
> at
> o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
> at
> o.a.i.i.u.n.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:133)
> at
> o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
> at
> o.a.i.i.u.n.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3752)
> at
> o.a.i.i.u.n.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
> at
> o.a.i.i.u.n.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1379)
> at
> o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2526)
> at
> o.a.i.i.u.n.GridNioServer$AbstractNio

Re: i meet java.lang.OutOfMemoryError: Direct buffer memory

2023-08-19 Thread Jeremy McMillan
Maybe force client-to-server connections, or scale out your Ignite cluster to
reduce the number of connections on each node. Each connection needs buffer
space of a multiple of the bandwidth-delay product to keep data flowing
without stuttering.
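
For thick clients, "force client to server connection" presumably maps to the configuration below (a sketch only, assuming Ignite 2.9 or later where TcpCommunicationSpi#setForceClientToServerConnections is available; the buffer sizes are placeholders to be derived from your own bandwidth and latency figures):

```
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class CommSpiTuningSketch {
    public static IgniteConfiguration tunedConfig() {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setForceClientToServerConnections(true); // clients always open the socket
        commSpi.setSocketSendBufferSize(512 * 1024);     // placeholder
        commSpi.setSocketReceiveBufferSize(512 * 1024);  // placeholder
        return new IgniteConfiguration().setCommunicationSpi(commSpi);
    }
}
```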

On Fri, Aug 18, 2023, 21:43 f cad  wrote:

> Thanks for your suggestion.
> We set MaxDirectMemorySize to 8g, and it still hits the exception.
> We use Spark to read from Ignite, through the ThinIgniteClientOnK8s API.
> We find that if multiple Spark applications read from Ignite, then Ignite
> hits the direct buffer memory OOM.
> Can you give me some suggestions about how I can confirm that it is a
> network problem?
> Best Regards.
>
> Jeremy McMillan  于2023年8月18日周五 20:39写道:
>
>> This is most likely to happen when Ignite is fast and the network is
>> slow. It's unclear what's happening when you experience this error, so how
>> to fix it is also ambiguous.
>>
>> You could try increasing direct buffer memory in Java options. If that's
>> not sufficient share more about your infrastructure and workload and maybe
>> we can suggest something else.
>>
>> On Fri, Aug 18, 2023, 04:28 f cad  wrote:
>>
>>> Hello,
>>> In my Ignite cluster I am hitting the error below:
>>> JVM will be halted immediately due to the failure:
>>> [failureCtx=FailureContext [type=CRITICAL_ERROR,
>>> err=java.lang.OutOfMemoryError: Direct buffer memory]].
>>> How can I fix it?
>>> The Ignite version is 2.14.
>>>
>>


Re: i meet java.lang.OutOfMemoryError: Direct buffer memory

2023-08-18 Thread Jeremy McMillan
This is most likely to happen when Ignite is fast and the network is slow.
It's unclear what's happening when you experience this error, so how to fix
it is also ambiguous.

You could try increasing direct buffer memory in Java options. If that's
not sufficient, share more about your infrastructure and workload and maybe
we can suggest something else.

On Fri, Aug 18, 2023, 04:28 f cad  wrote:

> Hello,
> In my Ignite cluster I am hitting the error below:
> JVM will be halted immediately due to the failure:
> [failureCtx=FailureContext [type=CRITICAL_ERROR,
> err=java.lang.OutOfMemoryError: Direct buffer memory]].
> How can I fix it?
> The Ignite version is 2.14.
>


Re: Random failure while data insertion in ignite cache

2023-07-14 Thread Jeremy McMillan
It would be appropriate for you to share any code and configuration to help
others recreate your system and the behavior you describe.

Without them, many people could demonstrably solve several similar problems, and
yet your problem could be different enough that those solutions are no help.

On Fri, Jul 14, 2023, 01:58 Abhishek Ubhe 
wrote:

> Hello,
>
> I am facing an issue when putting data into Ignite caches. I have checked
> the data insertion flow from start to end but haven't found anything. Here
> are the scenarios I have checked and what I found:
>
>
>1. I added logs at every step from object formation to insertion in
>cache.
>2. All logs are present there without a single error, as per flow
>defined.
>3. But still I am unable to fetch data from cache. It is not inserted
>in the cache.
>
> *Note :  This issue is random and happened unexpectedly a few times in a
> month. I was unable to reproduce this issue.*
>
> *Please reply with any suggestions or solutions.*
> --
> *Regards,*
> *Abhishek Ubhe*
>
>


Re: Ignite Server Node cluster

2023-07-11 Thread Jeremy McMillan
This will depend on how you are deploying your Spark workers. Whatever you
are doing to control Spark workers should be replicated to control startup
and shutdown of your Ignite nodes. Please start with the included ignite.sh
or ignite.bat scripts found in the bin folder of your Ignite distribution.

If you are using Kubernetes to deploy Spark, you may want to define pods as
a pair of containers running Spark worker and Ignite nodes, respectively.

It's a good idea to try a few things if you are not familiar with any of
them.
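
A minimal programmatic sketch, in case embedding the server node in the same JVM as the Spark worker is acceptable instead of launching ignite.sh alongside it (the configuration path is a placeholder):

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class WorkerNodeSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("config/ignite-node.xml"); // starts a server node

        // Stop the node cleanly when the worker process exits.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> Ignition.stop(ignite.name(), true)));
    }
}
```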

On Tue, Jul 11, 2023 at 6:15 AM Arunima Barik 
wrote:

> Hi all
>
> I want to start an Ignite server node on every Spark worker.
> Also, how do I shut down those nodes?
>
> Are there any methods to do so? I was trying to write a script but could not
> succeed.
>
> Any help is highly appreciated.
>
> Regards
> Arunima
>


Re: Ignite for Parquet files

2023-06-30 Thread Jeremy McMillan
Python doesn't at this time go anywhere near Ignite CacheStore. You would
need to implement the CacheStore in Java or some other language which
compiles to JVM runtime/jar. There's a talk from the most recent summit on
using Groovy, if you want a higher level language than Java, but
theoretically you could use Jython (if you are willing to experiment and
can find a compatible JVM that runs both Ignite and Jython).

Ignite can operate like a federated query proxy if different caches are
implemented with different external persistence for each cache. CacheStore
is the interface Ignite would use to send a cache miss to a backend
database. In your original question you intended to use Parquet files as a
backend database, but Ignite does not (yet) provide one for Parquet. If
someone were to donate a supportable Java implementation, I suspect the
community would adopt and support it. Since Parquet is columnar, I also
suspect it would need to target Ignite 3 to adopt conventions around
columnar data, and then might be backported to Ignite 2.
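
A skeleton of that idea follows; the Parquet read/write calls are left as TODOs because Ignite ships no Parquet store and the choice of Parquet library is up to the implementer, and the key/value types and cache name are placeholders:

```
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class ParquetCacheStore extends CacheStoreAdapter<Long, String> {
    @Override public String load(Long key) {
        // TODO: read the row for this key from the Parquet dataset.
        return null; // null signals a cache miss
    }

    @Override public void write(Cache.Entry<? extends Long, ? extends String> entry) {
        // TODO: write the entry back to the Parquet dataset (write-through).
    }

    @Override public void delete(Object key) {
        // TODO: remove or tombstone the row in the Parquet dataset.
    }

    // Wiring the store into a cache: read-through/write-through, no native persistence needed.
    public static CacheConfiguration<Long, String> cacheConfig() {
        return new CacheConfiguration<Long, String>("parquetBacked")
            .setCacheStoreFactory(FactoryBuilder.factoryOf(ParquetCacheStore.class))
            .setReadThrough(true)
            .setWriteThrough(true);
    }
}
```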


On Fri, Jun 30, 2023 at 12:13 PM Arunima Barik 
wrote:

> Which do you think would be a better option?
>
> Federated queries or CacheStore
>
> And is CacheStore supported in Python?
>
> On Fri, 30 Jun, 2023, 1:50 pm Stephen Darlington, <
> stephen.darling...@gridgain.com> wrote:
>
>> You’d need to implement your own Cache Store.
>> https://ignite.apache.org/docs/latest/persistence/custom-cache-store
>>
>> On 30 Jun 2023, at 06:46, Arunima Barik  wrote:
>>
>>
>> -- Forwarded message -
>> From: Arunima Barik 
>> Date: Fri, 30 Jun, 2023, 10:52 am
>> Subject: Ignite for Parquet files
>> To: 
>>
>>
>> Hello Team
>>
>> I have my data stored as parquet files. I want a caching layer on top of
>> this existing file system. I am going to use Ignite for that but I do not
>> need native persistence for that.
>>
>> I want any changes to the database to be reflected in both the cache and
>> the file, and the same for read queries: it should automatically read from
>> disk if the data is not present in the cache.
>>
>> I want to do all this in Python. Please let me know how the same can be
>> done, and point me to resources if there are any.
>>
>> Thank you and looking forward to hearing from you.
>>
>> Regard,
>> Arunima Barik
>>
>>
>>


Re: How communication happens when using Multicast + Static IP finder

2023-06-18 Thread Jeremy McMillan
You will need to research the documentation on three subjects:

* Discovery
* Data distribution
* Client Cluster awareness

Much of what you seem to be asking is somewhat configurable, so the answer
is "it depends." Also, the question seems very broad, and you might benefit
from learning fundamentals. Maybe check out some of the content on
University.GridGain.com ?

One would need to make some architectural design decisions about an Ignite
cluster deployment before the network model can be tabulated. There are
tradeoffs to choose, and some flexibility, especially with AOP.

On Sun, Jun 18, 2023, 09:25 Vikas Vishwakarma  wrote:

> Hi All,
>
> I was wondering, when using the Multicast + Static IP finder, how communication
> about cache updates happens: is it unicast or multicast? What
> communication happens over the multicast group, and if there is any unicast
> communication, then what goes over unicast?
>
> Thank you
>


Re: Ignite thin client continuous query listener cannot listen to all events

2023-05-24 Thread Jeremy McMillan
Thanks for bringing this up!

https://ignite.apache.org/docs/latest/key-value-api/continuous-queries#events-delivery-guarantees

This sounds like you may have found a bug, but the details you've provided
are not sufficient to help others recreate and observe it for themselves,
and this effort needs to be recorded in a ticket. Would you be able to sign
up for a Jira account  and
detail steps to reproduce this behavior?

You may also want to research this:
  https://issues.apache.org/jira/browse/IGNITE-8035

On Mon, May 22, 2023 at 6:52 AM lonesomerain  wrote:

> *Hi,*
> *I have a question while using ignite 2.15.0*
>
> *Problem scenario:*
>
> Start the Ignite server of one node, start one thin client and create a
> continuous query listener, and then use 50 threads to add 500 data to the
> cache concurrently.
>
> *Problem phenomenon:*
>
> Through the information printed on the listener, it was found that the
> number of events listened to each time varies, possibly 496, 499 or 500...
>
> *Test Code:*
>
> public class StartServer {
>
> public static void main(String[] args) {
>
> Ignite ignite = Ignition.start();
>
> }
>
> }
>
>
> public class StartThinClient {
>
> public static void main(String[] args) throws InterruptedException {
>
> String addr = "127.0.0.1:10800";
>
>
> int threadNmu = 50;
>
>
> ClientConfiguration clientConfiguration = new
> ClientConfiguration();
>
> clientConfiguration.setAddresses(addr);
>
>
> IgniteClient client1 = Ignition.startClient(clientConfiguration);
>
>
> ClientCache<String, Object> cache1 =
> client1.getOrCreateCache("test");
>
>
> ContinuousQuery<String, Object> query = new ContinuousQuery<>();
>
> query.setLocalListener(new CacheEntryUpdatedListener<String, Object>() {
>
> @Override
>
> public void onUpdated(Iterable<CacheEntryEvent<? extends String, ? extends
> Object>> cacheEntryEvents) throws CacheEntryListenerException {
>
> Iterator<CacheEntryEvent<? extends String, ? extends Object>> iterator =
> cacheEntryEvents.iterator();
>
> while (iterator.hasNext()) {
>
> CacheEntryEvent<? extends String, ? extends Object> next = iterator.next();
>
> System.out.println("" + next.getKey());
>
> }
>
> }
>
> });
>
>
> cache1.query(query);
>
>
> IgniteClient client2 = Ignition.startClient(clientConfiguration);
>
> ClientCache cache2 = client2.cache("test");
>
>
> Thread[] threads = new Thread[threadNmu];
>
> for (int i = 0; i < threads.length; ++i) {
>
> threads[i] = new Thread(new OperationInsert(cache2, i, 500,
> threadNmu));
>
> }
>
> for (int i = 0; i < threads.length; ++i) {
>
> threads[i].start();
>
> }
>
> for (Thread thread : threads) {
>
> thread.join();
>
> }
>
>
> Thread.sleep(6);
>
>
> }
>
>
> static class OperationInsert implements Runnable {
>
>
> private ClientCache<String, Object> cache;
>
> private int k;
>
> private Integer test_rows;
>
> private Integer thread_cnt;
>
>
> public OperationInsert(ClientCache<String, Object> cache, int k,
> Integer test_rows, Integer thread_cnt) {
>
> this.cache = cache;
>
> this.k = k;
>
> this.test_rows = test_rows;
>
> this.thread_cnt = thread_cnt;
>
> }
>
>
> @Override
>
> public void run() {
>
> for (int i = 100 + (test_rows/thread_cnt) * k; i < 100
> + (test_rows/thread_cnt) * (k + 1); i++) {
>
> cache.put("" + i, "aaa");
>
> }
>
> }
>
> }
>
>
> }
>
>
> *Version:*
>
> The testing program uses Ignite version 2.15.0
>
> I attempted to insert data using one thread and did not observe any event
> loss. In addition, I also attempted an Ignite cluster with two or three
> nodes, which can still listen to all 500 events even when inserting data
> using multiple threads. May I ask if this issue only occurs at a single
> node? Are there any good solutions?
>


Re: Large data transfers with Ignite or Kafka?

2023-03-24 Thread Jeremy McMillan
That's a big question, and it isn't clear whether there's a large or small
ratio of reads to writes between these microservices, for example. It isn't
clear what your latency tolerance is for these large transfers either.

This sounds like a big endeavor, and if there's money to be made, your best
bet is to get architecture advice with an NDA so that the architecture can
take all of the cost/risk/benefit factors into consideration. Asking free
software community to give you blind architecture advice will not get you
much closer to a decision than you already are.

If it's not constrained by deadlines and r&d budget, maybe your best bet is
to try a couple of things out and compare what you can squeeze out of each?
Maybe you want to experiment before you choose an architectural design? We
would still need to understand more of your performance goals to help
design an experiment.

It's hard to be the first or only person with a good idea. The difference
between what you're imagining and what you can see already extant in the
world might be constrained by execution and not imagination. Please
consider sharing more info. FWIW, that's the other side of the old "bike
shed" parable if you are seeking input from others.



On Fri, Mar 24, 2023, 06:55 Thomas Kramer  wrote:

> Hi all,
>
> do you have any feedback on this? Or is this rather a question for
> StackOverflow?
>
> Thanks.
>
>
> On 21.03.23 10:00, Thomas Kramer wrote:
> > I'm considering Ignite for an on-demand-scalable microservice-oriented
> > architecture. I'd use the memory cache for shared data across the
> > microservices. Maybe I would also use Ignite compute for distributed
> > tasks, however I believe the MOA philosophy would recommend REST for
> > this.
> >
> > My question is rather about large data transfer between the
> > microservices. In addition to smaller amount of data shared in the
> > caches across all microservices, I need to constantly send large data
> > blocks (50M-500M) between the microservices, typically from one sender
> > to one receiver. There is no need to persist these on disk.
> >
> > Would Ignite be fast and efficient for this? Into what size chunks should
> > the data be split? Or would I be better off using Kafka alongside Ignite to
> > transfer the large data blocks? Or maybe go even more low-level with
> > something like ZeroMQ?
> >
> > Thanks for comments and suggestions.
> >
> >
>


Re: Ignite Cluster issues with larger latency between nodes

2023-03-09 Thread Jeremy McMillan
Has this kind of benchmark ever been published for any p2p cluster
technology?

What questions would it answer if there were such benchmarks for Ignite?

Maybe this will help:

There is an established algorithm for estimating the amount of buffer space
necessary to keep a pipeline from stuttering during congestion. A
generation ago this was a big deal because most Linux distros shipped with
TCP buffer configuration that was insufficient for the rapidly improving
network performance of Ethernet and broadband Internet service. The same
idea generalizes for any streaming network communication, not only TCP.

https://en.m.wikipedia.org/wiki/Bandwidth-delay_product

Your infrastructure provider should be able to provide you with optimistic
bandwidth numbers. Decide how much latency you need to tolerate. For best
results, collect ping statistics over a long time to get realistic latency
expectations. Plug that into the formula.

To prevent buffer underruns and overruns, allocate buffer space for double
the BDP, as a rule of thumb. For best results, instrument the buffers and
collect statistics under various load scenarios and adjust as necessary.
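
A back-of-the-envelope sketch of that arithmetic, with placeholder figures to be replaced by the provider's bandwidth numbers and measured RTT, applied to the communication SPI socket buffers:

```
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class BufferSizingSketch {
    public static TcpCommunicationSpi sizedCommSpi() {
        long bandwidthBitsPerSec = 10_000_000_000L;   // placeholder: 10 Gbit/s link
        double roundTripTimeSec = 0.001;              // placeholder: ~1 ms RTT across AZs

        long bdpBytes = (long) (bandwidthBitsPerSec / 8 * roundTripTimeSec); // ~1.25 MB here
        int bufferBytes = (int) (2 * bdpBytes);       // rule of thumb: double the BDP

        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setSocketSendBufferSize(bufferBytes);
        commSpi.setSocketReceiveBufferSize(bufferBytes);
        return commSpi;
    }
}
```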

This will only solve sporadic latency hiccups. Some of this traffic will
affect lock contention, so dealing with poor network performance isn't just
a buffering issue. Expect to find, investigate, and solve new issues after
you get rid of the buffering exceptions.

Good luck, and please let us know how things work for you.

On Thu, Mar 9, 2023, 17:08 Vicky  wrote:

> Thanks, Sumit. I've gone through these, but I don't see any mention of
> latency between two boxes within a cluster. Has any
> cloud-based benchmarking been done? More specifically when a single cluster
> is spread across multiple AZ's within the same region.
>
> On Wed, Mar 8, 2023 at 10:33 PM Sumit Deshinge 
> wrote:
>
>> Please check if these benchmark documents can help you :
>> 1. Apache Ignite and Apache Cassandra benchmarks
>> 
>> 2. Gridgain benchmark results
>> 
>>
>> You can also go through performance tips available on the official site
>> at:
>>
>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>
>> On Wed, Mar 8, 2023 at 3:51 AM Vicky  wrote:
>>
>>> Hi,
>>> Is there any benchmarking about what is an acceptable latency between
>>> nodes for an Ignite cluster to function stably?
>>>
>>> We are currently having a single cluster across AZ's (same region). The
>>> AZ latency published by the cloud provider is ~0.4-1ms.
>>>
>>> What we have observed is for boxes where the AZ latency is larger i.e. >
>>> 0.8, we start seeing server engine memory growing exponentially. We
>>> controlled that by setting the msg queue and slow client limits to 1024 &
>>> 1023 respectively. This helped get the memory in check.
>>>
>>> However now we are seeing client nodes failing with "Client node
>>> outbound message queue size exceeded slowClientQueueLimit, the client will
>>> be dropped (consider changing 'slowClientQueueLimit' configuration
>>> property)".
>>>
>>> This results in continuous disconnect and reconnect happening on these
>>> client nodes and subsequently no processing going through.
>>>
>>> Is there any benchmarking done for Ignite or documents available which
>>> say, for a stable ignite cluster the latency between nodes cannot be > x ms?
>>>
>>> However, if this is indeed our application issue then I would like to
>>> understand how to troubleshoot or get around this issue.
>>>
>>> Thanks
>>> Victor
>>>
>>
>>
>> --
>> Regards,
>> Sumit Deshinge
>>
>>


Re: Unable create a Cache SQL table on Ignite Node.

2023-03-01 Thread Jeremy McMillan
Java exception troubleshooting usually begins with an error message and a
stack trace. Can we get that added to your fine description of how you
found the error? We still don't know what error you found.

Also please provide your config, with secrets redacted, of course. Both the
details of the error and exceptions in the log are almost always necessary
information.

On Wed, Mar 1, 2023, 22:09 Abhishek Ubhe  wrote:

> Hello,
>
> Please check the below case and help with your suggestions.
>
> Case :
>
>- I have started Ignite node on the kubernetes pod.
>- Also load some caches there after starting that ignite server node
>in the same code flow. You can check the attachment for reference.
>- Now I want to create a cache SQL table on that K8s pod where I have
>started ignite server nodes.
>- But when I try to create it through a script on K8s job I get a null
>pointer for ignite instance.
>- You can verify my java API below.
>
> CacheConfiguration userURMCacheConfig =
> (CacheConfiguration) CommonCacheConfiguration
> .getCommonCacheConfig("USER_URM_CACHE");
>
> userURMCacheConfig.setIndexedTypes(String.class,
> UserURMCacheSQLTable.class);
> userURMCacheConfig.setCacheMode(CacheMode.REPLICATED);
> userURMCacheConfig.setBackups(2);
>
> IgniteCache userURMCache =
> *ignite.getOrCreateCache(userURMCacheConfig);*
>
> Note : Getting null pointer at bolded line in above code.
> --
> *Regards,*
> *Abhishek Ubhe*
>
>


Re: How to delete data of a specified partition with high performance

2023-03-01 Thread Jeremy McMillan
These documentation pages should help.

https://ignite.apache.org/docs/latest/key-value-api/basic-cache-operations

https://ignite.apache.org/docs/latest/configuring-caches/atomicity-modes
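
A minimal sketch of one way to do it (cache name and key types are placeholders; whether a plain batch removeAll is acceptable depends on the atomicity mode covered in the second link): scan one partition locally, collect its keys, and remove them in batches, which can be split across threads.

```
import java.util.HashSet;
import java.util.Set;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class PartitionPurgeSketch {
    public static void purge(Ignite ignite, Object affinityKey) {
        IgniteCache<Object, Object> cache = ignite.cache("myCache");
        int part = ignite.affinity("myCache").partition(affinityKey); // partition of this affinity key

        Set<Object> keys = new HashSet<>();
        try (QueryCursor<Cache.Entry<Object, Object>> cur =
                 cache.query(new ScanQuery<>().setPartition(part))) {
            for (Cache.Entry<Object, Object> e : cur)
                keys.add(e.getKey());
        }

        cache.removeAll(keys); // split 'keys' into chunks and call removeAll per thread for parallelism
    }
}
```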

On Tue, Feb 28, 2023 at 11:42 PM 38797715 <38797...@qq.com> wrote:

> hi,
>
> How can I delete the data of a specified partition (or a specified affinityKey)
> with high performance (multi-threaded)?
>
>
>


Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread Jeremy McMillan
Have you tried tracing the workload on the 100% and 40% nodes for
comparison? There just isn't enough detail in your question to help predict
what should be happening with the cluster workload. For a starting point,
please identify your design goals. It's easy to get confused by advice that
seeks to help you do something you don't want to do.

Some things to think about include how the stream workload is composed. How
should/would this work if there were only one node? How should behavior
change as nodes are added to the topology and the test is repeated?

Gedanken: what if the data streamer is doing some really expensive
operations as it feeds the data into the stream, but the nodes can very
cheaply put the processed data into their cache partitions? In this case,
for example, the expensive operations should be refactored into a stream
transformer that will move the workload from the stream sender to the
stream receivers.
https://ignite.apache.org/docs/latest/data-streaming#stream-transformer
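
A sketch of that refactoring (the word-count style update below is only a stand-in for whatever expensive per-entry work the streamer client currently does; the cache name is a placeholder):

```
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.stream.StreamTransformer;

public class StreamTransformerSketch {
    public static void stream(Ignite ignite, Iterable<String> words) {
        ignite.getOrCreateCache("wordCount");

        try (IgniteDataStreamer<String, Long> stmr = ignite.dataStreamer("wordCount")) {
            stmr.allowOverwrite(true); // required when a receiver updates existing entries

            // Runs on the node that owns the key, not on the sending client.
            stmr.receiver(StreamTransformer.from((e, arg) -> {
                Long val = e.getValue();
                e.setValue(val == null ? 1L : val + 1);
                return null;
            }));

            for (String w : words)
                stmr.addData(w, 1L);
        }
    }
}
```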

Also gedanken: what if the data distribution is skewed such that one node
gets more data than 2x the data sent to other partitions because of
affinity? In this case, for example, changes to affinity/colocation design
or changes to cluster topology (more nodes with greater CPU to RAM ratio?)
can help distribute the load so that no single node becomes a bottleneck.

On Tue, Feb 28, 2023 at 9:27 AM John Smith  wrote:

> Hi, I'm using the data streamer to insert into a 3-node cluster. I have
> noticed that 1 node is pegged at 100% CPU while the others are at 40ish %.
>
> Is that normal?
>
>
>


Re: random scenario of insertion operations failed.

2023-02-16 Thread Jeremy McMillan
The first step to begin debugging would be to configure logging, and reduce
the Ignite pods to one. Increase the logging details until you see
consistent positive indication of Ignite doing INSERT for each operation.
If the failures do not appear, gradually increase the workload and then the
pods until you find the minimum configuration, topology, and workload to
consistently recreate these errors.

The details coming from that effort will allow others to help you debug
further. Be sure to provide all of the details of your minimal error
conditions so that others can repeat the same results.

For issues requiring multiple server nodes to participate in the error, you
probably want to consolidate logs so that one temporal record tells the
entire story. If you can, explain what others should look for to spot what
you're experiencing, and make it as easy as possible to see for the most
community support.

On Thu, Feb 16, 2023, 04:28 Abhishek Ubhe 
wrote:

> Hello,
>
> I am facing an issue while inserting data in ignite cache.
>
> Details : I have started Ignite server nodes through a Kubernetes pod and
> use a microservice architecture to process data internally. Sometimes,
> randomly, some requests for Ignite insertions are not processed
> successfully. There are no specific error or delay logs to explain this
> issue. I have found no explanation so far.
>
> I have checked below things :
>
>
>1. Checked all configurations for Ignite as well as *write behind
>database storage Apache Hbase*
>    2. *The last log line before the PUT into* the cache is present.
>3. Still not inserted without any error & delays.
>4. I have also checked if other technology level issues are present on
>the server at that specific time.
>
>
> *Special Note *: Above issue is produced randomly for my server and also
> resolved automatically after some time. I am using* Ignite 2.10.0*
>
> --
> *Regards,*
> *Abhishek Ubhe*
>
>


Re: How to set -DIGNITE_QUIET=false in service.sh?

2023-02-01 Thread Jeremy McMillan
This seems to be at the level of a high quality bug report, and has enough
detail that a fix could probably be implemented and submitted as a PR
fairly easily. Are you familiar with the contributor process?

On Wed, Feb 1, 2023, 04:56 Айсина Роза Мунеровна 
wrote:

> Hola!
> We run Ignite via service from DEB package and use service.sh to start it.
>
> In the service file we set env *JVM_OPTS* like this:
>
> *Environment="JVM_OPTS=-server -Xms10g -Xmx10g … …  -DIGNITE_QUIET=false”*
>
> The problem is that *parseargs.sh* (look here
> )
> has default option *-DIGNITE_QUIET=true*,
> *which is not propagated in service.sh*:
>
> /usr/share/apache-ignite/bin/ignite.sh /etc/apache-ignite/$2 & echo $! >>
> /var/run/apache-ignite/$2.pid
>
> In service.sh we can pass only configuration so result options look like
> that:
>
> *[2023-02-01T09:42:43,105][INFO ][main][IgniteKernal] VM arguments: […,
>  -Xms10g, ..., -DIGNITE_QUIET=false, -Dfile.encoding=UTF-8,
> -DIGNITE_QUIET=true, -DIGNITE_SUCCESS_FILE=..., -DIGNITE_HOME=...,
> -DIGNITE_PROG_NAME=...]*
>
> Is there any way to set -DIGNITE_QUIET=false for service.sh without
> manually patching it?
> Maybe there is more priority option in configuration?
>
> Thanks!
>
> *--*
>
> *Роза Айсина*
>
> Старший разработчик ПО
>
> *СберМаркет* | Доставка из любимых магазинов
>
>
>
> Email: roza.ays...@sbermarket.ru
>
> Mob:
>
> Web: sbermarket.ru
>
> App: iOS
> 
> и Android
> 
>
>
>
> *УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ:* это электронное сообщение и любые
> документы, приложенные к нему, содержат конфиденциальную информацию.
> Настоящим уведомляем Вас о том, что, если это сообщение не предназначено
> Вам, использование, копирование, распространение информации, содержащейся в
> настоящем сообщении, а также осуществление любых действий на основе этой
> информации, строго запрещено. Если Вы получили это сообщение по ошибке,
> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это
> сообщение.
> *CONFIDENTIALITY NOTICE:* This email and any files attached to it are
> confidential. If you are not the intended recipient you are notified that
> using, copying, distributing or taking any action in reliance on the
> contents of this information is strictly prohibited. If you have received
> this email in error please notify the sender and delete this email.
>


Re: Safe settings for ignite cache with external store

2023-01-27 Thread Jeremy McMillan
If your problem is simple and popular, then solutions will chase you.
Consider maybe you might be trying to do something really challenging that
doesn't have any off the shelf solutions.

You might need to approach this as a computer scientist and search for the
best fit, not just best available eviction strategy. I hope you find some
clues, and learn your way to success.

This may be of interest to you.

https://github.com/gridgain/gridgain-advanced-examples/blob/master/src/main/java/org/gridgain/examples/datagrid/eviction/CustomEvictionPolicyExample.java

On Fri, Jan 27, 2023, 18:36 Łukasz Dywicki  wrote:

> Hello again,
> I've spent another day on this issue looking at configuration and with
> cluster larger than configured backup count ignite did the work.
> After that I reverted to testing of single node with minimal
> configuration and come to the point that the only way to keep Ignite
> survive load was setting ExpiryPolicy to very low value (10s).
>
> To me it seems that Ignite behavior is to preserve all entries in memory
> at any cost, even if there is a risk of running into OOM. Is that true?
> I would like to change this behavior and make sure that entries do not
> expire because of time as long as there is memory available to store
> them. They should be evicted by appropriate EvictionPolicy only if
> memory fills up.
> To me it looks like DataStoreSettings do not make any impact in this
> regard. At least setting page eviction to LRU do not change it.
>
> Please let me know if I am doing something wrong, as I cannot get
> Ignite to work stably, even with such basic objectives as I outlined
> in my earlier mail.
>
> Kind regards,
> Łukasz
>
> On 27.01.2023 00:58, Łukasz Dywicki wrote:
> > Dear all,
> > I come across use of Apache Ignite to cache results of expensive
> > computation operation.
> > Objectives are basic:
> > - Keep most of "hot" data in memory
> > - Offload cold part to cache store
> > - Keep memory utilization under control (evict entries as needed)
> > While it sounds basic, it doesn't seem to fit Ignite defaults.
> >
> > What I am testing now is behavior with large objects which can grow up
> > to 10 mb (serialized) or 25 mb (json representation). Usually objects
> > will stay far below that threshold, but we can't make assumption on that.
> > I began testing various configurations of Ignite in order to facilitate
> > offloading of memory contents to database. So far I am stuck for two
> > days at Ignite/application itself running out of memory after processing
> > several of such large objects. While I know that storing 10 mb blob in
> > database is not the best idea, I have to test that behavior too.
> >
> > By observing database contents I see that number of entries there grows,
> > but cache do not seem to be evicted. When I try to switch eviction, it
> > does require onheap to be switched on, and it still fails with LRU
> > eviction policy.
> >
> > So far I ended up with a named cache and default region configured as
> > below:
> > ```
> > IgniteConfiguration igniteConfiguration = new IgniteConfiguration();
> > igniteConfiguration.setDataStorageConfiguration(new
> > DataStorageConfiguration()
> >  .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
> >  .setPersistenceEnabled(false)
> >  .setInitialSize(256L * 1024 * 1024)
> >  .setMaxSize(512L * 1024 * 1024)
> >  .setPageEvictionMode(DataPageEvictionMode.RANDOM_LRU)
> >  .setSwapPath(null)
> >  .setEvictionThreshold(0.75)
> >  )
> >  .setPageSize(DataStorageConfiguration.MAX_PAGE_SIZE)
> > );
> > CacheConfiguration expensiveCache = new
> > CacheConfiguration<>()
> >  .setName(CACHE_NAME)
> >  .setBackups(2)
> >  .setAtomicityMode(CacheAtomicityMode.ATOMIC)
> >  .setCacheStoreFactory(cacheJdbcBlobStoreFactory)
> >  .setWriteThrough(true)
> >  .setOnheapCacheEnabled(true)
> >  .setEvictionPolicyFactory(new LruEvictionPolicyFactory<>(1024))
> >  .setReadThrough(true);
> > igniteConfiguration.setCacheConfiguration(
> >  expensiveCache
> > );
> > ```
> >
> > What I observe is following - the cache keeps writing data into
> > database, but it does not remove old entries fast enough to prevent
> crash.
> > JVM parameters I use are fairly basic:
> > -Xms1g -Xmx1g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
> > -XX:+DisableExplicitGC
> >
> > The store mechanism is jdbc blob store. Exceptions I get happen to occur
> > in Ignite itself, processing (application code writing cache) or
> > communication thread used to feed cache. I collected one case here:
> > https://gist.github.com/splatch/b5ec9134cd9df19bc62f007dd17a19a1
> >
> > The error message in linked gist advice to enable persistence (which I
> > did via cache store!), increase memory limit (which I don't want to do),
> > or enable eviction/expiry policy (which somehow miss behave).
> > To me it looks like self defense mechanisms Ignite has are being t

Re: How to check that affinity key works?

2023-01-12 Thread Jeremy McMillan
Using distributed joins will set a baseline for the correct result set. If you
get different results without distributed joins, then you know there's a
mismatch between the join conditions and affinity. If you get the same results,
then data is distributed well for this join.
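
A sketch of that comparison using the SQL API (the table and column names come from the schema quoted below; the only point is to diff the two result sets):

```
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class JoinBaselineSketch {
    public static void compare(IgniteCache<?, ?> anyCache) {
        String sql = "SELECT pf.product_id, spf.price " +
                     "FROM ProductFeatures pf " +
                     "LEFT JOIN StoreProductFeatures spf ON pf.product_id = spf.product_id";

        // Baseline: let Ignite pull non-colocated rows from other nodes.
        List<List<?>> baseline =
            anyCache.query(new SqlFieldsQuery(sql).setDistributedJoins(true)).getAll();

        // Default colocated-only behavior.
        List<List<?>> colocated =
            anyCache.query(new SqlFieldsQuery(sql)).getAll();

        // A mismatch in size or content means the join keys don't line up with the affinity key.
        System.out.println(baseline.size() + " vs " + colocated.size());
    }
}
```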

On Thu, Jan 12, 2023, 02:02 Айсина Роза Мунеровна 
wrote:

> Hi Jeremy!
>
> Thank you for your reply.
>
> Am I right that if the JOIN is colocated, then the absence of an affinity key
> will lead to incorrect results, as data will not be fetched from other nodes?
>
> So the correct way to check the influence of the affinity key is to enable
> distributed JOINs?
>
> On 10 Jan 2023, at 10:48 PM, Jeremy McMillan 
> wrote:
>
>
> If you are only doing colocated joins, then there will be no runtime
> overhead incurred by collecting distributed rows (colocated joins mean do
> not try to join data that might be distributed across nodes), so there
> might not be much difference in runtimes.
>
> The difference between different affinity keys, if any, will be seen in
> the results returned, and unless there's a significant difference in row
> count, it will be difficult to affect runtime performance using this
> strategy.
>
> On Tue, Jan 10, 2023, 13:32 Айсина Роза Мунеровна <
> roza.ays...@sbermarket.ru> wrote:
>
>> Hola!
>>
>> We want to optimize our SQL queries that make collocated JOINs on several
>> tables (about ~8 tables).
>>
>> Some tables have column “product_id” on which these tables are joined.
>> Business meaning is that the result are features for each product_id.
>>
>> So based on documentation we set “*product_id*” as affinity key
>> so that all data required for join will be located on the same node.
>> (Column “*product_id*” is always part of the primary key)
>>
>> But! After this we made experiments:
>> - put affinity key to other part of primary key (for example, if primary
>> key is "(product_id, store_id)", then affinity key is “store_id”);
>> - didn't specify affinity key at all.
>>
>> The problem is that all our load testing results didn’t change!
>>
>> So the question - is there any way to make more advanced *EXPLAIN*,
>> that will show partition shuffling (if it happens) or data collocation?
>> Some debug tool for this problem. Like query plan in Spark.
>>
>>
>> Information about our setup:
>> - Ignite cluster on 5 VMs;
>> - all tables are partitioned or replicated;
>> - all tables are created with DDL SQL and all interactions are made *
>> only* through SQL API;
>> - DDL example:
>>
>> CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
>> (
>> product_id INT PRIMARY KEY,
>> total_cnt_orders_with_sku INT
>> )
>> WITH "CACHE_NAME=PUBLIC_ProductFeatures,
>> KEY_TYPE=io.sbmt.ProductFeaturesKey,
>> VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id,
>> TEMPLATE=PARTITIONED, BACKUPS=1
>>
>> - our main SQL query:
>>
>> SELECT
>> ProductFeatures.product_id,
>> ProductFeatures.total_cnt_orders_with_sku,
>> StoreProductFeatures.price,
>> UserProductFeaturesOrder.num_prev_orders_with_sku,
>> ...
>> FROM ProductFeatures
>> LEFT JOIN StoreProductFeatures
>>   ON ProductFeatures.product_id = StoreProductFeatures.product_id
>>   AND StoreProductFeatures.store_id = {store_id}
>> ... (more joins)
>> CROSS JOIN UserFeaturesDiscount
>> WHERE UserFeaturesDiscount.user_id = {user_id}
>>   AND ProductFeatures.product_id IN {skus}
>>   …
>>
>> Looking forward for some help.

Re: How to check that affinity key works?

2023-01-10 Thread Jeremy McMillan
If you are only doing colocated joins, then there will be no runtime
overhead incurred by collecting distributed rows (colocated joins mean do
not try to join data that might be distributed across nodes), so there
might not be much difference in runtimes.

The difference between different affinity keys, if any, will be seen in the
results returned, and unless there's a significant difference in row count,
it will be difficult to affect runtime performance using this strategy.
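
One way to check colocation directly, without relying on query timings, is
the Affinity API: map a few sample affinity key values to partitions and
primary nodes in the caches that are expected to be colocated. A rough
sketch, assuming an Ignite instance named "ignite", the cache naming pattern
from the DDL quoted below, and a second cache "PUBLIC_StoreProductFeatures"
(both names are assumptions):

Affinity<Object> aff1 = ignite.affinity("PUBLIC_ProductFeatures");
Affinity<Object> aff2 = ignite.affinity("PUBLIC_StoreProductFeatures");

for (int productId : new int[] {1, 42, 1337}) {
    int p1 = aff1.partition(productId);          // partition derived from the affinity key
    int p2 = aff2.partition(productId);
    ClusterNode n1 = aff1.mapPartitionToNode(p1); // primary node for that partition
    ClusterNode n2 = aff2.mapPartitionToNode(p2);
    System.out.printf("product_id=%d -> partitions %d/%d, primaries %s/%s%n",
        productId, p1, p2, n1.id(), n2.id());
}

// If the two caches use the same affinity function and the partitions or
// primary nodes differ for the same product_id, a join on product_id cannot
// be fully colocated.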

On Tue, Jan 10, 2023, 13:32 Айсина Роза Мунеровна 
wrote:

> Hola!
>
> We want to optimize our SQL queries that make collocated JOINs on several
> tables (about ~8 tables).
>
> Some tables have column “product_id” on which these tables are joined.
> The business meaning is that the results are features for each product_id.
>
> So based on documentation we set “*product_id*” as affinity key
> so that all data required for join will be located on the same node.
> (Column “*product_id*” is always part of the primary key)
>
> But! After this we ran experiments:
> - set the affinity key to another part of the primary key (for example, if
> the primary key is "(product_id, store_id)", then the affinity key is “store_id”);
> - didn't specify an affinity key at all.
>
> The problem is that none of our load testing results changed!
>
> So the question is: is there any way to get a more advanced *EXPLAIN*
> that will show partition shuffling (if it happens) or data collocation?
> Some debug tool for this problem, like the query plan in Spark.
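
As a partial answer to the EXPLAIN question: with the H2-based SQL engine in
Ignite 2.x, EXPLAIN can be issued through the same SQL API and, to the best of
my understanding, returns the two-step (map/reduce) plan, which at least shows
whether a merge step over remote partitions is involved. A minimal sketch,
with the cache name and query text assumed:

SqlFieldsQuery explain = new SqlFieldsQuery(
    "EXPLAIN SELECT pf.product_id FROM ProductFeatures pf " +
    "JOIN StoreProductFeatures spf ON pf.product_id = spf.product_id");

for (List<?> row : ignite.cache("PUBLIC_ProductFeatures").query(explain).getAll())
    System.out.println(row.get(0)); // map query plan(s) followed by the reduce plan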
>
>
> Information about our setup:
> - Ignite cluster on 5 VMs;
> - all tables are partitioned or replicated;
> - all tables are created with DDL SQL and all interactions are made *only*
> through SQL API;
> - DDL example:
>
> CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
> (
> product_id INT PRIMARY KEY,
> total_cnt_orders_with_sku INT
> )
> WITH "CACHE_NAME=PUBLIC_ProductFeatures,
> KEY_TYPE=io.sbmt.ProductFeaturesKey,
> VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id,
> TEMPLATE=PARTITIONED, BACKUPS=1"
>
> - our main SQL query:
>
> SELECT
> ProductFeatures.product_id,
> ProductFeatures.total_cnt_orders_with_sku,
> StoreProductFeatures.price,
> UserProductFeaturesOrder.num_prev_orders_with_sku,
> ...
> FROM ProductFeatures
> LEFT JOIN StoreProductFeatures
>   ON ProductFeatures.product_id = StoreProductFeatures.product_id
>   AND StoreProductFeatures.store_id = {store_id}
> ... (more joins)
> CROSS JOIN UserFeaturesDiscount
> WHERE UserFeaturesDiscount.user_id = {user_id}
>   AND ProductFeatures.product_id IN {skus}
>   …
>
> Looking forward to some help.
>
>


Re: How can I specify a column of java object in my sql select list after switching to calcite

2022-11-10 Thread Jeremy McMillan
You might want to watch the recording of the summit talk on Ignite 3
changes.

There is a major change around how binary column types and objects are
stored using Calcite.
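
For reference, a sketch of how a cache with a map-typed column is usually
declared through QueryEntity; the type and field names here are hypothetical,
and this only shows the declaration side, it does not by itself resolve the
Calcite parsing error described below:

CacheConfiguration<Long, Object> ccfg = new CacheConfiguration<>("mytable");
QueryEntity qe = new QueryEntity(Long.class.getName(), "MyRow")
    .setTableName("MYTABLE")
    .addQueryField("id", Long.class.getName(), null)
    // exposed to SQL as a java.lang.Object column because the value is a java.util.Map
    .addQueryField("mymap", Object.class.getName(), null);
ccfg.setQueryEntities(Collections.singletonList(qe));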

On Thu, Nov 10, 2022, 15:10 tore yang via user 
wrote:

> My cache has a column of type Java map; the column name is "mymap". When I
> send a query "select id,mymap from mytable", it complains "Error: Failed to
> parse the query. (state=4200,code=1001)". But when I run the query "select *
> from mytable", the column mymap returns perfectly. This wasn't a problem
> until I replaced "H2" with "Calcite", and it only fails for columns with
> data type java.lang.Object.
>
> How do I specify a column which is of type "java object"?
>
> Thanks!
>


Re: Backup filter in ignite [Multi AZ deployment]

2022-11-06 Thread Jeremy McMillan
Think of each AZ as being a massive piece of server hardware running VMs or
workloads for you. When hardware (or infrastructure maintenance process)
fails, assume everything on one AZ is lost at the same time.

On Sun, Nov 6, 2022, 09:58 Surinder Mehra  wrote:

> That's partially true. The whole exercise of configuring the AZ as a backup
> filter is because we want to handle AZ-level failure.
>
> Anyway, thanks for the inputs. Will figure out further steps.
>
> On Sun, 6 Nov 2022, 20:55 Jeremy McMillan, 
> wrote:
>
>> Don't configure 2 backups when you only have two failure domains.
>>
>> You're worried about node level failure, but you're telling Ignite to
>> worry about AZ level failure.
>>
>>
>> On Sat, Nov 5, 2022, 21:57 Surinder Mehra  wrote:
>>
>>> Yeah I think there is a misunderstanding. Although I figured out my
>>> answers from our discussion, I will try one final attempt to clarify my
>>> point on 2X space for node3
>>>
>>> Node setup:
>>> Node1 and node 2 placed in AZ1
>>> Node 3 placed in AZ2
>>>
>>> Since I am using AZ as the backup filter, as I mentioned in my first
>>> message, the backup of node 1 cannot be placed on node 2 and the backup of
>>> node 2 cannot be placed on node 1, as they are in the same AZ. This simply
>>> means their backups would go to node 3, which is in another AZ. Hence node 3
>>> space = (node3 primary partitions + node1 backup partitions + node2 backup
>>> partitions).
>>>
>>> Wouldn't this mean node 3 needs 2X the space compared to node 1 and node 2?
>>> Assuming the backup partitions of node 3 would be equally distributed among
>>> the other two nodes, they would need almost the same space.
>>>
>>>
>>> On Tue, 1 Nov 2022, 23:30 Jeremy McMillan, 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Nov 1, 2022 at 10:02 AM Surinder Mehra 
>>>> wrote:
>>>>
>>>>> Even if we have 2 copies of data and primary and backup copy would be
>>>>> stored in different AZs. My question remains valid in this case as well.
>>>>>
>>>>
>>>> I think additional backup copies in the same AZ are superfluous if we
>>>> start with the assumption that multiple concurrent failures are most likely
>>>> to affect resources in the concurrent AZ. A second node failure, if that's
>>>> your failure budget, is likely to corrupt all the backup copies in the
>>>> second AZ.
>>>>
>>>> If you only have two AZs available in some data centers/deployments,
>>>> but you need 3-way redundancy on certain caches/tables, then using AZ node
>>>> attribute for backup filtering is too coarse grained. Using AZ is a general
>>>> case best practice which gives your cluster the best chance of surviving
>>>> multiple hardware failures in AWS because they pool hardware resources in
>>>> AZs. Maybe you just need three AZs? Maybe AZ isn't the correct failure
>>>> domain for your use case?
>>>>
>>>>
>>>>> Do we have to ensure nodes in two AZs are always present or does
>>>>> ignite have a way to indicate it couldn't create backups. Silently killing
>>>>> backups is not desirable state.
>>>>>
>>>>
>>>> Do you use synchronous or asynchronous backups?
>>>>
>>>> https://ignite.apache.org/docs/2.11.1/configuring-caches/configuring-backups#synchronous-and-asynchronous-backups
>>>>
>>>> You can periodically poll caches' configurations or hook a cluster
>>>> state event, and re-compare the cache backup configuration against the
>>>> enumerated available AZs, and raise an exception or log a message or
>>>> whatever to detect the issue as soon as AZ count drops below minimum. This
>>>> way might also be good for fuzzy warning condition detection point for
>>>> proactive infrastructure operations. If you count all of the nodes in each
>>>> AZ, you can detect and track AZ load imbalances as the ratio between the
>>>> smallest AZ node count and the average AZ node count.
>>>>
>>>>
>>>>> 2. In my original message, with 2 nodes (node1 and node2) in AZ1 and a
>>>>> 3rd node in a second AZ, backups of node1 and node2 would be placed on
>>>>> node 3 in AZ2. It would mean it needs 2X the space to store backups.
>>>>> Just trying to ensure my understanding is correct.
>>>>>
>>

Re: Backup filter in ignite [Multi AZ deployment]

2022-11-06 Thread Jeremy McMillan
Don't configure 2 backups when you only have two failure domains.

You're worried about node level failure, but you're telling Ignite to worry
about AZ level failure.


On Sat, Nov 5, 2022, 21:57 Surinder Mehra  wrote:

> Yeah I think there is a misunderstanding. Although I figured out my
> answers from our discussion, I will try one final attempt to clarify my
> point on 2X space for node3
>
> Node setup:
> Node1 and node 2 placed in AZ1
> Node 3 placed in AZ2
>
> Since I am using AZ as the backup filter, as I mentioned in my first message,
> the backup of node 1 cannot be placed on node 2 and the backup of node 2
> cannot be placed on node 1, as they are in the same AZ. This simply means
> their backups would go to node 3, which is in another AZ. Hence node 3 space
> = (node3 primary partitions + node1 backup partitions + node2 backup
> partitions).
>
> Wouldn't this mean node 3 needs 2X the space compared to node 1 and node 2?
> Assuming the backup partitions of node 3 would be equally distributed among
> the other two nodes, they would need almost the same space.
>
>
> On Tue, 1 Nov 2022, 23:30 Jeremy McMillan, 
> wrote:
>
>>
>>
>> On Tue, Nov 1, 2022 at 10:02 AM Surinder Mehra 
>> wrote:
>>
>>> Even if we have 2 copies of data and primary and backup copy would be
>>> stored in different AZs. My question remains valid in this case as well.
>>>
>>
>> I think additional backup copies in the same AZ are superfluous if we
>> start with the assumption that multiple concurrent failures are most likely
>> to affect resources in the concurrent AZ. A second node failure, if that's
>> your failure budget, is likely to corrupt all the backup copies in the
>> second AZ.
>>
>> If you only have two AZs available in some data centers/deployments, but
>> you need 3-way redundancy on certain caches/tables, then using AZ node
>> attribute for backup filtering is too coarse grained. Using AZ is a general
>> case best practice which gives your cluster the best chance of surviving
>> multiple hardware failures in AWS because they pool hardware resources in
>> AZs. Maybe you just need three AZs? Maybe AZ isn't the correct failure
>> domain for your use case?
>>
>>
>>> Do we have to ensure nodes in two AZs are always present or does ignite
>>> have a way to indicate it couldn't create backups. Silently killing backups
>>> is not desirable state.
>>>
>>
>> Do you use synchronous or asynchronous backups?
>>
>> https://ignite.apache.org/docs/2.11.1/configuring-caches/configuring-backups#synchronous-and-asynchronous-backups
>>
>> You can periodically poll caches' configurations or hook a cluster state
>> event, and re-compare the cache backup configuration against the enumerated
>> available AZs, and raise an exception or log a message or whatever to
>> detect the issue as soon as AZ count drops below minimum. This way might
>> also be good for fuzzy warning condition detection point for proactive
>> infrastructure operations. If you count all of the nodes in each AZ, you
>> can detect and track AZ load imbalances as the ratio between the smallest
>> AZ node count and the average AZ node count.
>>
>>
>>> 2. In my original message, with 2 nodes (node1 and node2) in AZ1 and a
>>> 3rd node in a second AZ, backups of node1 and node2 would be placed on
>>> node 3 in AZ2. It would mean it needs 2X the space to store backups.
>>> Just trying to ensure my understanding is correct.
>>>
>>
>> If you have three nodes, you divide your total footprint by three to get
>> the minimum node capacity.
>>
>> If you have 2 backups, that is one primary copy plus two more backup
>> copies, so you multiply your total footprint by 3.
>>
>> If you multiply, say 32GB by three for redundancy, that would be 96GB
>> total space needed for the sum of all nodes' footprint.
>>
>> If you divide the 96GB storage commitment among three nodes, then each
>> node must have a minimum of 32GB. That's what we started with as a nominal
>> data footprint, so 1x not 2x. Node 1 will need to accommodate backups from
>> node 2 and node 3. Node 2 will need to accommodate backups from node 1 and
>> node 3. Each node has one primary and two backup partition copies for each
>> partition of each cache with two backups.
>>
>>
>>> Hope my queries are clear to you now
>>>
>>
>> I still don't understand your operational goals, so I feel like we may be
>> dancing around a misunderstanding.
>>
>>
>>> On Tue, 1 Nov 2022, 20:19 Su

Re: Backup filter in ignite [Multi AZ deployment]

2022-11-01 Thread Jeremy McMillan
On Tue, Nov 1, 2022 at 10:02 AM Surinder Mehra  wrote:

> Even if we have 2 copies of data and primary and backup copy would be
> stored in different AZs. My question remains valid in this case as well.
>

I think additional backup copies in the same AZ are superfluous if we start
with the assumption that multiple concurrent failures are most likely to
affect resources in the concurrent AZ. A second node failure, if that's
your failure budget, is likely to corrupt all the backup copies in the
second AZ.

If you only have two AZs available in some data centers/deployments, but
you need 3-way redundancy on certain caches/tables, then using AZ node
attribute for backup filtering is too coarse grained. Using AZ is a general
case best practice which gives your cluster the best chance of surviving
multiple hardware failures in AWS because they pool hardware resources in
AZs. Maybe you just need three AZs? Maybe AZ isn't the correct failure
domain for your use case?


> Do we have to ensure nodes in two AZs are always present or does ignite
> have a way to indicate it couldn't create backups. Silently killing backups
> is not desirable state.
>

Do you use synchronous or asynchronous backups?
https://ignite.apache.org/docs/2.11.1/configuring-caches/configuring-backups#synchronous-and-asynchronous-backups

You can periodically poll caches' configurations or hook a cluster state
event, and re-compare the cache backup configuration against the enumerated
available AZs, and raise an exception or log a message or whatever to
detect the issue as soon as AZ count drops below minimum. This way might
also be good for fuzzy warning condition detection point for proactive
infrastructure operations. If you count all of the nodes in each AZ, you
can detect and track AZ load imbalances as the ratio between the smallest
AZ node count and the average AZ node count.
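
A rough sketch of that kind of check, assuming an Ignite instance named
"ignite", a cache named "myCache", and that each node advertises its zone in a
user attribute named "AVAILABILITY_ZONE" (all three names are assumptions):

// Warn when the current topology can no longer satisfy backups + 1 copies
// spread across distinct availability zones.
Set<Object> zones = new HashSet<>();
for (ClusterNode n : ignite.cluster().forServers().nodes())
    zones.add(n.attribute("AVAILABILITY_ZONE"));

int backups = ignite.cache("myCache")
    .getConfiguration(CacheConfiguration.class).getBackups();

if (zones.size() < backups + 1)
    System.err.println("Only " + zones.size() + " AZs in topology for backups="
        + backups + "; the backup filter will start discarding backup copies.");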


> 2. In my original message, with 2 nodes (node1 and node2) in AZ1 and a
> 3rd node in a second AZ, backups of node1 and node2 would be placed on
> node 3 in AZ2. It would mean it needs 2X the space to store backups.
> Just trying to ensure my understanding is correct.
>

If you have three nodes, you divide your total footprint by three to get
the minimum node capacity.

If you have 2 backups, that is one primary copy plus two more backup
copies, so you multiply your total footprint by 3.

If you multiply, say 32GB by three for redundancy, that would be 96GB total
space needed for the sum of all nodes' footprint.

If you divide the 96GB storage commitment among three nodes, then each node
must have a minimum of 32GB. That's what we started with as a nominal data
footprint, so 1x not 2x. Node 1 will need to accommodate backups from node
2 and node 3. Node 2 will need to accommodate backups from node 1 and node
3. Each node has one primary and two backup partition copies for each
partition of each cache with two backups.


> Hope my queries are clear to you now
>

I still don't understand your operational goals, so I feel like we may be
dancing around a misunderstanding.


> On Tue, 1 Nov 2022, 20:19 Surinder Mehra,  wrote:
>
>> Thanks for your reply. Let me try to answer your 2 questions below.
>> 1. I understand that it sacrifices the backups in case it can't place
>> backups appropriately. The question is, is it possible to fail the deployment
>> rather than risk having only a single copy of the data in the cluster? If
>> this only copy goes down, we will have downtime, as the data won't be present
>> in the cluster. We should rather throw an error if enough hardware is not
>> present than risk a data unavailability issue during business activity.
>>
>> 2. Why we want 3 copies of data. It's a design choice. We want to ensure
>> even if 2 nodes go down, we still have 3rd present to serve the data.
>>
>> Hope I answered your question
>>
>> On Tue, 1 Nov 2022, 19:40 Jeremy McMillan, 
>> wrote:
>>
>>> This question is a design question.
>>>
>>> What kinds of fault states do you expect to tolerate? What is your
>>> failure budget?
>>>
>>> Why are you trying to distribute more than 2 copies of the data across
>>> only two failure domains?
>>>
>>> Also, "fail fast" means discovering your implementation defects faster than
>>> your release cycle, not how fast you can cause data loss.
>>>
>>> On Tue, Nov 1, 2022, 09:01 Surinder Mehra  wrote:
>>>
>>>> gentle reminder.
>>>> One additional question: We have observed that if available AZs are
>>>> less than backups count, ignite skips creating backups. Is this correct
>>>> understanding? If yes, how can we fail fast if backups can not be placed
>>>> due to AZ limitation?
>

Re: Backup filter in ignite [Multi AZ deployment]

2022-11-01 Thread Jeremy McMillan
Can you tell two stories, both starting with all nodes in the intended cluster
configuration down: one story ending in a successful cluster startup, and the
other detecting an invalid configuration and refusing to start?

I can anticipate problems understanding what to do when the first node
attempts to start but only has its own AZ represented in the topology. How
can this first node know whether future nodes will be able to fulfill the
condition AZ_count >= backup_replicas + 1? The general case, allowing
elastic deployment, requires individual Ignite nodes to work in a
best-effort capacity.

I would approach this from a DevOps perspective, and just validate the
deployment before starting up any infrastructure. Look at all of the
relevant config files which would be deployed. Enumerate a projection of
deployed nodes and their AZs. Compare this against the desired backup
filter configuration and fail before starting any Ignite nodes with a
deployment automation tool exception.
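
A minimal sketch of such a pre-flight check, where the planned node-to-AZ
mapping is whatever your deployment automation already knows; everything here
(names, values, the backup count) is a placeholder rather than an existing
tool:

// Fail the deployment before any Ignite node starts if the planned AZ spread
// cannot satisfy backups + 1 distinct failure domains.
Map<String, String> plannedNodeToAz = new HashMap<>();
plannedNodeToAz.put("node1", "az-1");
plannedNodeToAz.put("node2", "az-1");
plannedNodeToAz.put("node3", "az-2");
int plannedBackups = 2;

int distinctAzs = new HashSet<>(plannedNodeToAz.values()).size();
if (distinctAzs < plannedBackups + 1)
    throw new IllegalStateException("Planned topology spans only " + distinctAzs
        + " AZs, but backups=" + plannedBackups + " needs at least "
        + (plannedBackups + 1) + " to keep every copy in a distinct AZ.");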

On Tue, Nov 1, 2022 at 9:49 AM Surinder Mehra  wrote:

> Thanks for your reply. Let me try to answer your 2 questions below.
> 1. I understand that it sacrifices the backups in case it can't place
> backups appropriately. The question is, is it possible to fail the deployment
> rather than risk having only a single copy of the data in the cluster? If
> this only copy goes down, we will have downtime, as the data won't be present
> in the cluster. We should rather throw an error if enough hardware is not
> present than risk a data unavailability issue during business activity.
>
> 2. Why we want 3 copies of data. It's a design choice. We want to ensure
> even if 2 nodes go down, we still have 3rd present to serve the data.
>
> Hope I answered your question
>
> On Tue, 1 Nov 2022, 19:40 Jeremy McMillan, 
> wrote:
>
>> This question is a design question.
>>
>> What kinds of fault states do you expect to tolerate? What is your failure
>> budget?
>>
>> Why are you trying to distribute more than 2 copies of the data across
>> only two failure domains?
>>
>> Also, "fail fast" means discovering your implementation defects faster than
>> your release cycle, not how fast you can cause data loss.
>>
>> On Tue, Nov 1, 2022, 09:01 Surinder Mehra  wrote:
>>
>>> gentle reminder.
>>> One additional question: We have observed that if available AZs are less
>>> than backups count, ignite skips creating backups. Is this correct
>>> understanding? If yes, how can we fail fast if backups can not be placed
>>> due to AZ limitation?
>>>
>>> On Mon, Oct 31, 2022 at 6:30 PM Surinder Mehra 
>>> wrote:
>>>
>>>> Hi,
>>>> As per link attached, to ensure primary and backup partitions are not
>>>> stored on same node, We used AWS AZ as backup filter and now I can see if I
>>>> start two ignite nodes on the same machine, primary partitions are evenly
>>>> distributed but backups are always zero which is expected.
>>>>
>>>>
>>>> https://www.gridgain.com/docs/latest/installation-guide/aws/multiple-availability-zone-aws
>>>>
>>>> My question is what would happen if AZ-1 has 2 machines and AZ-2 has 1
>>>> machine and ignite cluster has only 3 nodes, each machine having one ignite
>>>> node.
>>>>
>>>> Node1[AZ1] - keys 1-100
>>>> Node2[AZ1] -  keys 101-200
>>>> Node3[AZ2] - keys  201 -300
>>>>
>>>> In the above scenario, if the backup count is 2, how would back up
>>>> partitions be distributed.
>>>>
>>>> 1. Would it mean node3 will have 2 backup copies of primary partitions
>>>> of node 1 and 2 ?
>>>> 2. If we have a 4 node cluster with 2 nodes in each AZ, would backup
>>>> copies also be placed on different nodes(In other words, does the backup
>>>> filter also apply to how backup copies are placed on nodes) ?
>>>>
>>>>
>>>>


Re: Backup filter in ignite [Multi AZ deployment]

2022-11-01 Thread Jeremy McMillan
This question is a design question.

What kinds of fault states do you expect to tolerate? What is your failure
budget?

Why are you trying to distribute more than 2 copies of the data across
only two failure domains?

Also, "fail fast" means discovering your implementation defects faster than
your release cycle, not how fast you can cause data loss.

On Tue, Nov 1, 2022, 09:01 Surinder Mehra  wrote:

> gentle reminder.
> One additional question: We have observed that if available AZs are less
> than backups count, ignite skips creating backups. Is this correct
> understanding? If yes, how can we fail fast if backups can not be placed
> due to AZ limitation?
>
> On Mon, Oct 31, 2022 at 6:30 PM Surinder Mehra  wrote:
>
>> Hi,
>> As per link attached, to ensure primary and backup partitions are not
>> stored on same node, We used AWS AZ as backup filter and now I can see if I
>> start two ignite nodes on the same machine, primary partitions are evenly
>> distributed but backups are always zero which is expected.
>>
>>
>> https://www.gridgain.com/docs/latest/installation-guide/aws/multiple-availability-zone-aws
>>
>> My question is what would happen if AZ-1 has 2 machines and AZ-2 has 1
>> machine and ignite cluster has only 3 nodes, each machine having one ignite
>> node.
>>
>> Node1[AZ1] - keys 1-100
>> Node2[AZ1] -  keys 101-200
>> Node3[AZ2] - keys  201 -300
>>
>> In the above scenario, if the backup count is 2, how would back up
>> partitions be distributed.
>>
>> 1. Would it mean node3 will have 2 backup copies of primary partitions of
>> node 1 and 2 ?
>> 2. If we have a 4 node cluster with 2 nodes in each AZ, would backup
>> copies also be placed on different nodes(In other words, does the backup
>> filter also apply to how backup copies are placed on nodes) ?
>>
>>
>>


Re: Backup filter in ignite [Multi AZ deployment]

2022-11-01 Thread Jeremy McMillan
Using the AWS tutorial will get you a backup filter using this
implementation: ClusterNodeAttributeAffinityBackupFilter

If you read the documentation, there is logic to prevent a cascade of backup
data onto survivor nodes in case of multiple concurrent failures:

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
"This implementation will discard backups rather than place multiple on the
same set of nodes. This avoids trying to cram more data onto remaining
nodes when some have failed."

If you don't have enough nodes to support backups, this implementation will
sacrifice backups to keep the cluster operating.
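
For reference, a minimal sketch of wiring that filter up in code; the
attribute name and zone values are assumptions and must match whatever your
nodes actually advertise:

// On every node: advertise the availability zone as a user attribute.
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setUserAttributes(Collections.singletonMap("AVAILABILITY_ZONE", "us-east-1a"));

// On the cache: keep backup copies out of the AZ that holds the primary copy.
RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
aff.setAffinityBackupFilter(
    new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");
ccfg.setBackups(1);
ccfg.setAffinity(aff);
cfg.setCacheConfiguration(ccfg);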

It isn't clear whether this question is about a hypothetical understanding of
features and functionality, or whether you have a design problem you need to
solve.

On Mon, Oct 31, 2022 at 8:01 AM Surinder Mehra  wrote:

> Hi,
> As per link attached, to ensure primary and backup partitions are not
> stored on same node, We used AWS AZ as backup filter and now I can see if I
> start two ignite nodes on the same machine, primary partitions are evenly
> distributed but backups are always zero which is expected.
>
>
> https://www.gridgain.com/docs/latest/installation-guide/aws/multiple-availability-zone-aws
>
> My question is what would happen if AZ-1 has 2 machines and AZ-2 has 1
> machine and ignite cluster has only 3 nodes, each machine having one ignite
> node.
>
> Node1[AZ1] - keys 1-100
> Node2[AZ1] -  keys 101-200
> Node3[AZ2] - keys  201 -300
>
> In the above scenario, if the backup count is 2, how would back up
> partitions be distributed.
>
> 1. Would it mean node3 will have 2 backup copies of primary partitions of
> node 1 and 2 ?
> 2. If we have a 4 node cluster with 2 nodes in each AZ, would backup
> copies also be placed on different nodes(In other words, does the backup
> filter also apply to how backup copies are placed on nodes) ?
>
>
>


Re: Creating local cache without cluster-wide lock

2022-09-30 Thread Jeremy McMillan
Is your linear regression library/algorithm map-reduce compatible?

Can you identify which rows/records should be in any particular linear
regression run using data which is present at ingestion (ergo available to
Ignite affinity routing)?


On Fri, Sep 30, 2022, 03:51 Thomas Kramer  wrote:

> Here's my setup:
>
> I have a distributed training cache that contains all data with all
> possible features. This cache is updated constantly with new real-world
> data to improve accuracy.
>
> When running Linear Regression on the nodes, they need a different subset
> of that full training cache each time, depending on the test case for the
> required regression.
>
> So before running Linear Regression on each node, I create a local copy from
> the full cache by filtering a subset that fits the current test data.
> Originally I used CacheBasedDatasetBuilder but noticed it creates multiple
> temporary caches with the same characteristics as the upstream full training
> cache, i.e. distributed on all nodes. That's unnecessary in my case because
> I need the subset only temporarily on the worker node.
>
>
>
> On 30.09.22 01:08, Jeremy McMillan wrote:
>
> I share Stephen's curiosity about the use case. The best compromises are
> sensitive to situation and outcomes.
>
> Are you trying to cull training data into training, tuning, and validation
> subsets?
>
> Maybe there's a colocation approach that would suffice.
>
> On Thu, Sep 29, 2022, 12:26 Thomas Kramer  wrote:
>
>> Right, I don't want to use CacheMode.LOCAL because it's deprecated. Hence
>> my question: what is the alternative for a purely local in-memory cache
>> that doesn't cause a cluster-wide lock or a partition map exchange event?
>>
>>
>> On 29.09.22 14:57, Николай Ижиков wrote:
>>
>> You may not want to use LOCAL caches because they are removed in master
>> [1] and will not exist in the next release.
>>
>> [1]
>> https://github.com/apache/ignite/commit/01a7d075a5f48016511f6a754538201f12aff4f7
>>
>>
>> 29 сент. 2022 г., в 15:55, Николай Ижиков 
>> написал(а):
>>
>> Because a node-local cache is created on each server node.
>>
>> 29 сент. 2022 г., в 15:43, Kramer  написал(а):
>>
>> Coming back to my original question:
>>
>> CacheConfiguration cfg = new CacheConfiguration<>();
>> cfg.setCacheMode(CacheMode.REPLICATED);
>> cfg.setAffinity(new LocalAffinityFunction());
>>
>> Will the above code still create a cluster wide lock with partition map
>> exchange event even though the cache will be hosted on local node only?
>>
>>
>>
>> *Gesendet:* Dienstag, 27. September 2022 um 18:50 Uhr
>> *Von:* "Thomas Kramer" 
>> *An:* user@ignite.apache.org
>> *Betreff:* Re: Creating local cache without cluster-wide lock
>> I'm using CacheBasedDataset to filter a subset from a distributed cache
>> of all training data for Linear Regression. This seems to by default use
>> the AffinityFunction from the upstream cache to create a new temporary
>> cache with every preprocessing trainer and on every dataset update. This
>> causes a lot of additional traffic if happening on multiple nodes.
>>
>> So I was looking to create local caches for the filtered datasets.
>>
>>
>> On 27.09.22 18:30, Stephen Darlington wrote:
>> > What are you trying to do? The general solution is to create a
>> long-lived cache and have a run-number or similar as part of the key.
>> >
>> >> On 27 Sep 2022, at 15:36, Thomas Kramer  wrote:
>> >>
>> >> I understand creating a new cache dynamically requires a cluster-wide
>> >> lock with partition map exchange event to create the cache on all
>> nodes.
>> >> This is unnecessary traffic when only working with local caches.
>> >>
>> >> For local-only caches I assume this wouldn't happen. But
>> CacheMode.LOCAL
>> >> is deprecated.
>> >>
>> >> Is there a way to create a local cache without triggering unnecessary
>> >> map exchange events?
>> >>
>> >> Would this work or does it still create a short global lock on all
>> nodes
>> >> not only the local node?
>> >>
>> >> CacheConfiguration cfg = new
>> >> CacheConfiguration<>();
>> >> cfg.setCacheMode(CacheMode.REPLICATED);
>> >> cfg.setAffinity(new LocalAffinityFunction());
>> >>
>>
>>
>>
>>


Re: Creating local cache without cluster-wide lock

2022-09-29 Thread Jeremy McMillan
I share Stephen's curiosity about the use case. The best compromises are
sensitive to situation and outcomes.

Are you trying to cull training data into training, tuning, and validation
subsets?

Maybe there's a colocation approach that would suffice.
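
If the upstream training cache is keyed so that everything one regression run
needs is colocated, one colocation-style alternative is to broadcast a closure
and scan only the node-local data, rather than materializing a temporary
cache. A sketch under those assumptions; the cache name, the key/value types,
and the two helper methods are hypothetical:

// Each node scans its own partitions of the training cache and feeds matching
// rows into a node-local regression; no temporary distributed cache is created.
ignite.compute().broadcast(() -> {
    IgniteCache<Long, double[]> training = Ignition.localIgnite().cache("trainingData");
    ScanQuery<Long, double[]> qry =
        new ScanQuery<>((key, features) -> matchesCurrentTestCase(features));
    qry.setLocal(true); // restrict the scan to this node's data
    try (QueryCursor<Cache.Entry<Long, double[]>> cur = training.query(qry)) {
        for (Cache.Entry<Long, double[]> e : cur)
            feedIntoLocalRegression(e.getValue()); // hypothetical helper
    }
});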

On Thu, Sep 29, 2022, 12:26 Thomas Kramer  wrote:

> Right, I don't want to use CacheMode.LOCAL because it's deprecated. Hence
> my question: what is the alternative for a purely local in-memory cache
> that doesn't cause a cluster-wide lock or a partition map exchange event?
>
>
> On 29.09.22 14:57, Николай Ижиков wrote:
>
> You may not want to use LOCAL caches because they are removed in master
> [1] and will not exist in the next release.
>
> [1]
> https://github.com/apache/ignite/commit/01a7d075a5f48016511f6a754538201f12aff4f7
>
>
> 29 сент. 2022 г., в 15:55, Николай Ижиков 
> написал(а):
>
> Because a node-local cache is created on each server node.
>
> 29 сент. 2022 г., в 15:43, Kramer  написал(а):
>
> Coming back to my original question:
>
> CacheConfiguration cfg = new CacheConfiguration<>();
> cfg.setCacheMode(CacheMode.REPLICATED);
> cfg.setAffinity(new LocalAffinityFunction());
>
> Will the above code still create a cluster wide lock with partition map
> exchange event even though the cache will be hosted on local node only?
>
>
>
> *Gesendet:* Dienstag, 27. September 2022 um 18:50 Uhr
> *Von:* "Thomas Kramer" 
> *An:* user@ignite.apache.org
> *Betreff:* Re: Creating local cache without cluster-wide lock
> I'm using CacheBasedDataset to filter a subset from a distributed cache
> of all training data for Linear Regression. This seems to by default use
> the AffinityFunction from the upstream cache to create a new temporary
> cache with every preprocessing trainer and on every dataset update. This
> causes a lot of additional traffic if happening on multiple nodes.
>
> So I was looking to create local caches for the filtered datasets.
>
>
> On 27.09.22 18:30, Stephen Darlington wrote:
> > What are you trying to do? The general solution is to create a
> long-lived cache and have a run-number or similar as part of the key.
> >
> >> On 27 Sep 2022, at 15:36, Thomas Kramer  wrote:
> >>
> >> I understand creating a new cache dynamically requires a cluster-wide
> >> lock with partition map exchange event to create the cache on all nodes.
> >> This is unnecessary traffic when only working with local caches.
> >>
> >> For local-only caches I assume this wouldn't happen. But CacheMode.LOCAL
> >> is deprecated.
> >>
> >> Is there a way to create a local cache without triggering unnecessary
> >> map exchange events?
> >>
> >> Would this work or does it still create a short global lock on all nodes
> >> not only the local node?
> >>
> >> CacheConfiguration cfg = new
> >> CacheConfiguration<>();
> >> cfg.setCacheMode(CacheMode.REPLICATED);
> >> cfg.setAffinity(new LocalAffinityFunction());
> >>
>
>
>
>


Re: Apache Hudi + Apache Ignite

2022-09-15 Thread Jeremy McMillan
I just read this about Hudi, and I can't see a use case for putting Hudi
behind an Ignite write-through cache.

https://www.xenonstack.com/insights/what-is-hudi

Hudi seems to be a write accelerator for Spark on HDFS, primarily.

What would the expected outcome be if we assume the magic integration was
present and working as you intend? What's the difference between that and
not using Ignite with Hudi?

On Wed, Sep 14, 2022, 22:50 Tecno Brain 
wrote:

> In particular, I am looking to see whether anyone has used Apache Ignite as a
> write-through cache in front of Hudi.
> Does that make sense?
>
> On Wed, Sep 14, 2022 at 10:50 PM Tecno Brain 
> wrote:
>
>> I was wondering if anybody has used Hudi + Ignite?
>> Any references to articles, conferences are greatly appreciated.
>>
>> Thanks
>>
>>
>>
>>


Re: Edge Computing Read Time Application

2022-09-08 Thread Jeremy McMillan
Maybe it would be easier to start from OpenCV or something like that,
figure out how to get the processing pipeline MVP for the simplest use case
working, and then use Ignite as a data integration hub to scale out the
architecture?

My guess is the Redis example would have followed a similar design process.

On Thu, Sep 8, 2022, 07:53 Vicky Kak  wrote:

> Hello Guys,
>
> Has anyone tried an application similar to the following with Apache
> Ignite?
> https://github.com/RedisGears/EdgeRealtimeVideoAnalytics
>
> There are multiple Redis-specific pieces here:
> 1) Redis Gears module.
> 2) Redis AI module.
> 3) Redis itself.
>
> I am new to Apache Ignite, so I would like a high-level feasibility view
> from the core team. Apache Ignite is pluggable, so it should be possible to
> build the Gears and AI modules as plugins. Before I go deep into it, I would
> like to know if something like this has already been built, so that I can
> look at it for initial reference.
>
> Thanks,
> Vicky
>
>
>


Re: distributed-computing error System.Runtime.Serialization.ISerializable

2022-08-29 Thread Jeremy McMillan
Have you followed all of the conventions in the .Net remote assembly
loading doc? Have you been able to follow the example given?

https://ignite.apache.org/docs/latest/net-specific/net-remote-assembly-loading

On Mon, Aug 29, 2022 at 9:08 AM Charlin S  wrote:

> Hi,
> I have started a .NET node on Linux and my POC application is throwing an
> exception, even though I have set PeerAssemblyLoadingMode =
> PeerAssemblyLoadingMode.CurrentAppDomain.
>
> Inner Exception 1:
> IgniteException: Compute job has failed on remote node, examine
> InnerException for details.
>
> Inner Exception 2:
> IgniteException: Failed to deserialize the job
> [errType=BinaryObjectException, errMsg=No matching type found for object
> [typeId=-1603946807,
> typeName=ConsoleApp2.TestModelComputeFunc`1[[Common.Models.StaticCacheModels.TestModel]]].
> This usually indicates that assembly with specified type is not loaded on a
> node. When using Apache.Ignite.exe, make sure to load assemblies with
> -assembly parameter. Alternatively, set
> IgniteConfiguration.PeerAssemblyLoadingMode to CurrentAppDomain.]
>
> Inner Exception 3:
> JavaException: class org.apache.ignite.IgniteException: Failed to
> deserialize the job [errType=BinaryObjectException, errMsg=No matching type
> found for object [typeId=-1603946807,
> typeName=ConsoleApp2.TestModelComputeFunc`1[[Common.Models.StaticCacheModels.TestModel]]].
> This usually indicates that assembly with specified type is not loaded on a
> node. When using Apache.Ignite.exe, make sure to load assemblies with
> -assembly parameter. Alternatively, set
> IgniteConfiguration.PeerAssemblyLoadingMode to CurrentAppDomain.]
> at
> org.apache.ignite.internal.processors.platform.callback.PlatformCallbackUtils.inLongOutLong(Native
> Method)
> at
> org.apache.ignite.internal.processors.platform.callback.PlatformCallbackGateway.computeJobCreate(PlatformCallbackGateway.java:295)
> at
> org.apache.ignite.internal.processors.platform.compute.PlatformAbstractJob.createJob(PlatformAbstractJob.java:114)
> at
> org.apache.ignite.internal.processors.platform.compute.PlatformClosureJob.execute0(PlatformClosureJob.java:66)
> at
> org.apache.ignite.internal.processors.platform.compute.PlatformAbstractJob.execute(PlatformAbstractJob.java:80)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:620)
> at
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7366)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:614)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:539)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
> at
> org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1417)
> at
> org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2199)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1909)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1530)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1423)
> at
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
>
> Regards,
> Charlin
>
>
> On Fri, 26 Aug 2022 at 19:30, Charlin S  wrote:
>
>> Hi,
>> Thank you for updating me. I will try and keep you updated.
>>
>> Regards,
>> Charlin
>>
>>
>> On Fri, 26 Aug 2022 at 18:47, Pavel Tupitsyn 
>> wrote:
>>
>>> Instead of ignite.sh, please use the following file from the binary
>>> package to start Ignite nodes on those Linux machines:
>>> platforms/dotnet/bin/netcoreapp3.1/Apache.Ignite.Executable
>>>
>>>
>>>
>>> On Fri, Aug 26, 2022 at 4:03 PM Charlin S 
>>> wrote:
>>>
 Hi,
 Server Node 1 IP and Server Node 2 IP are Linux machines, where we have
 hosted our Ignite as server node through Binary release packages.
 My requirement is to do computing from a client application (c#).

 Regards,
 Charlin




 On Thu, 25 Aug 2022 at 18:55, Pavel Tupitsyn 
 wrote:

> You are starting the client .NET node correctly. It connects to a
> cluster.
> However, SERVER nodes in your cluster are not .NET nodes, they are
> probably Java-only nodes, and can not execute .NET computations.
>
> You should fix the server nodes which are at "Server Node 1 IP",
> "Server Node 2 IP". Start them with Apache.Ignite.exe (Windows),
> Apache.Ignite

Re: How to enable ignite compress capability

2022-08-03 Thread Jeremy McMillan
https://github.com/apache/ignite/blob/da8a6bb4756c998aa99494d395752be96d841ec8/modules/core/src/main/java/org/apache/ignite/internal/processors/compress/FileSystemUtils.java#L45

Is Windows file storage supported for use with compression?

It isn't clear whether the Java FileStore SPI supports the block size
attribute.
https://docs.oracle.com/javase/tutorial/essential/io/fileAttr.html

From my perspective, it looks like Windows FileStore support would need
some additional implementation.
https://github.com/apache/ignite/blob/da8a6bb4756c998aa99494d395752be96d841ec8/modules/compress/src/main/java/org/apache/ignite/internal/processors/compress/NativeFileSystemLinux.java
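
For anyone hitting this on a supported platform, a minimal sketch of how disk
page compression is typically enabled; the cache name, algorithm, and page
size are illustrative, the ignite-compress module must be on the classpath,
and, per the code above, the persistence directory needs to sit on a Linux
filesystem whose block size the native helper can detect:

DataStorageConfiguration dsCfg = new DataStorageConfiguration();
// The documentation suggests the page size should be a multiple of (and larger
// than) the filesystem block size, e.g. 8K pages on a 4K-block filesystem,
// so that compressed pages can actually be punched out as sparse blocks.
dsCfg.setPageSize(8 * 1024);
dsCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

CacheConfiguration<Long, byte[]> ccfg = new CacheConfiguration<>("compressedCache");
ccfg.setDiskPageCompression(DiskPageCompression.LZ4);

IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(dsCfg)
    .setCacheConfiguration(ccfg);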



On Wed, Aug 3, 2022 at 10:09 AM Sumit Deshinge 
wrote:

> Hi,
>
> I am trying to compress data using the Ignite compression feature. I have
> copied all the jars from the ignite-compress module's lib directory into my
> project's classpath, but I am still facing the error below:
> *class org.apache.ignite.IgniteCheckedException: Make sure that
> ignite-compress module is in classpath*
>
> Is there anything else that needs to be done in a custom app deployment?
>
> Below are some more stack traces:
> Aug 3, 2022 8:31:46 PM org.apache.ignite.logger.java.JavaLogger error
> SEVERE: Failed to wait for checkpoint finish during cache stop.
> class org.apache.ignite.IgniteCheckedException: Compound exception for
> CountDownFuture.
> at
> org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
> at
> org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
> at
> org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
> at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:478)
> at
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:167)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> * Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
> detect storage block size on Windows Server 2019 10.0 amd64*
> at
> org.apache.ignite.internal.processors.cache.CacheCompressionManager.compressPage(CacheCompressionManager.java:98)
> at
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageReadWriteManagerImpl.write(PageReadWriteManagerImpl.java:101)
> at
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.write(FilePageStoreManager.java:634)
>
> --
> Regards,
> Sumit Deshinge
>
>