Re: CPU soft lockup

2020-11-12 Thread Pavel Vinokurov
Hi Devakumar,

Could you share the logs from the server nodes?

Thanks,
Pavel

Fri, 13 Nov 2020 at 04:22, Devakumar J :

> Hi,
> We are running version 2.8.0 with 3 server nodes and 2 client nodes. We
> are seeing CPU soft lockup issues, and a server node goes down with a critical
> error detected.
>
> Do we have any document reference or checklist to investigate this issue?
>
> Thanks,
> Devakumar
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


-- 

Regards

Pavel Vinokurov


Re: maxSize - per node or per cluster? - only 2k/sec in 11 node dedicated cluster

2020-11-12 Thread Pavel Vinokurov
Hi Devin,

MaxSize is set per node for the specified data region.
A few clarifying questions:
What cache configuration are you using? Performance can depend on
the type of cache and the number of backups.
How many clients and threads are writing to the cluster? The client side is a
possible bottleneck.
Have you tried IgniteDataStreamer [2]? It's the recommended approach for
loading massive amounts of data.

You could also refer to the persistence tuning documentation [1].

[1] https://ignite.apache.org/docs/latest/persistence/persistence-tuning
[2] https://ignite.apache.org/docs/latest/data-streaming#data-streaming
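
A minimal sketch of both points (the cache name, region size, and key/value
types below are made up for illustration, not taken from your setup): maxSize
is configured per node on the data region, and IgniteDataStreamer batches and
routes entries to primary nodes, which is usually much faster than individual
puts.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BulkLoadSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // maxSize applies to the data region on *each* node, not to the whole cluster.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(16L * 1024 * 1024 * 1024); // 16 GB per node
        cfg.setDataStorageConfiguration(storageCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true);
            ignite.getOrCreateCache("myCache"); // hypothetical cache name

            // The streamer buffers entries per node instead of doing one put at a time.
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                for (long i = 0; i < 1_000_000; i++)
                    streamer.addData(i, "value-" + i);
            }
        }
    }
}
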
Thanks,
Pavel

Fri, 13 Nov 2020 at 07:26, Devin Bost :

> We're trying to figure out how to get more throughput from our Ignite
> cluster.
> We have 11 dedicated Ignite VM nodes, each with 32 GB of RAM. Yet, we're
> only writing at 2k/sec max, even when we parallelize the writes to Ignite.
> We're using native persistence, but it just seems way slower than expected.
>
> We have our cache maxSize set to 16 GB. Since the cluster has a total of
> 352 GB of RAM, should our maxSize be set per node, or per cluster? If it's
> per cluster, then I'm thinking we'd want to increase maxSize to at least
> 300 GB. Looking for feedback on this.
>
> We've already turned off swapping on the boxes.
>
> What else can we tune to increase throughput? It seems like we should be
> getting 100x the throughput at least, so I'm hoping something is just
> misconfigured.
>
> Devin G. Bost
>


-- 

Regards

Pavel Vinokurov


maxSize - per node or per cluster? - only 2k/sec in 11 node dedicated cluster

2020-11-12 Thread Devin Bost
We're trying to figure out how to get more throughput from our Ignite
cluster.
We have 11 dedicated Ignite VM nodes, each with 32 GB of RAM. Yet, we're
only writing at 2k/sec max, even when we parallelize the writes to Ignite.
We're using native persistence, but it just seems way slower than expected.

We have our cache maxSize set to 16 GB. Since the cluster has a total of
352 GB of RAM, should our maxSize be set per node, or per cluster? If it's
per cluster, then I'm thinking we'd want to increase maxSize to at least
300 GB. Looking for feedback on this.

We've already turned off swapping on the boxes.

What else can we tune to increase throughput? It seems like we should be
getting 100x the throughput at least, so I'm hoping something is just
misconfigured.

Devin G. Bost


Re: Live coding session next week

2020-11-12 Thread Ilya Kazakov
Hello Valentin!

Could you provide a link for the meeting, please?

-
Ilya Kazakov

Fri, 13 Nov 2020 at 07:07, Valentin Kulichenko <valentin.kuliche...@gmail.com>:

> Igniters,
>
> On Tuesday next week (Nov 17), Denis Magda and I will conduct a live
> coding session, where we will implement a primitive Ignite-like distributed
> database from scratch. We will demonstrate major components required for
> such a system, show how they interact with each other and how they can be
> implemented in Java.
>
> Join if you would like to better understand how distributed systems work
> under the hood, or if you simply want to have some fun :)
>
> RSVP and all the details here: Tuesday, November 17, 2020
> 8:00 AM to 10:00 AM PST
>
> -Val
>


CPU soft lockup

2020-11-12 Thread Devakumar J
Hi,
We are running version 2.8.0 with 3 server nodes and 2 client nodes. We
are seeing CPU soft lockup issues, and a server node goes down with a critical
error detected.

Do we have any document reference or checklist to investigate this issue?

Thanks,
Devakumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: [2.9.0]Entryprocessor cannot be hot deployed properly via UriDeploymentSpi

2020-11-12 Thread 18624049226

Hi Ilya,

https://issues.apache.org/jira/browse/IGNITE-13679


On 2020/11/12 9:41 PM, Ilya Kasnacheev wrote:

Hello!

I suggest filing a feature request ticket against Apache Ignite JIRA. 
Best if you provide some reproducer project.


https://issues.apache.org/jira/browse/IGNITE 



You can also try a hybrid approach, such as firing a compute task
from the entry processor; the task would be hot-redeployed properly.


Regards,
--
Ilya Kasnacheev


Thu, 12 Nov 2020 at 15:46, 18624049226 <18624049...@163.com>:


Hi Ilya,

Updating the user version does not affect this issue.

Adjusting the deploymentMode parameter also has no effect on this
issue.

On 2020/11/12 7:39 PM, Ilya Kasnacheev wrote:

Hello!

Did you try changing user version between deployments?


https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading#un-deployment-and-user-versions



Regards,
-- 
Ilya Kasnacheev



Thu, 12 Nov 2020 at 12:07, 18624049226 <18624049...@163.com>:

Hi Ilya,

This issue exists in both versions 2.8 and 2.8.1.

On 2020/11/11 10:05 PM, Ilya Kasnacheev wrote:

Hello!

Did that work under 2.8? Can you check?

If it wasn't, then maybe it is not implemented in the first
place. If it is a regression, we could try to address that.

Regards.
-- 
Ilya Kasnacheev



Wed, 11 Nov 2020 at 05:55, 18624049226 <18624049...@163.com>:

Any further conclusions?

On 2020/11/6 11:00 AM, 18624049226 wrote:
> Hi community,
>
> EntryProcessor cannot be hot deployed properly via
> UriDeploymentSpi. The operation steps are as follows:
>
> 1. Put the jar in the specified folder of uriList;
>
> 2. Use example-deploy.xml to start two Ignite nodes;
>
> 3. Use the DeployClient to deploy the service named "deployService";
>
> 4. Execute the test through ThickClientTest; the result is correct;
>
> 5. Modify the code of DeployServiceImpl and DeployEntryProcessor (for
> example, change "Hello" to "Hi"), then repackage it and put it into the
> specified folder of uriList;
>
> 6. Redeploy the service with RedeployClient;
>
> 7. Execute the test again through ThickClientTest; the result is
> incorrect. We find that if the EntryProcessor accessed by the
> service is on another node, the EntryProcessor uses the old version of
> the class definition.
>
>



Re: KafkaStreamer, how to manage (stop consumming, resume) on client disconnection

2020-11-12 Thread akorensh
Hi, 
  You can listen for the disconnect events (or catch the disconnect exception) and then call KafkaStreamer.stop().
  
  see:
https://ignite.apache.org/docs/latest/clustering/connect-client-nodes#client-disconnectedreconnected-events

  https://ignite.apache.org/docs/latest/clustering/connect-client-nodes 
  Here, look for: "While a client is in a disconnected state and an attempt to
reconnect is in progress, the Ignite API throws an
IgniteClientDisconnectedException."
   

  KafkaStreamer stop method:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/stream/kafka/KafkaStreamer.html#stop--
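
  A minimal sketch of that approach (the Kafka topic/cache wiring of the streamer
is assumed to be done elsewhere; names are placeholders): subscribe to the client
disconnect/reconnect events and stop or restart the streamer accordingly. If the
events do not fire, check whether those event types need to be enabled in your
configuration.

import org.apache.ignite.Ignite;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;
import org.apache.ignite.stream.kafka.KafkaStreamer;

public class StreamerLifecycle {
    static void manage(Ignite ignite, KafkaStreamer<String, String> kafkaStreamer) {
        IgnitePredicate<Event> lsnr = evt -> {
            if (evt.type() == EventType.EVT_CLIENT_NODE_DISCONNECTED)
                kafkaStreamer.stop();   // stop pulling from Kafka while disconnected
            else if (evt.type() == EventType.EVT_CLIENT_NODE_RECONNECTED)
                kafkaStreamer.start();  // resume once the client reconnects
            return true;                // keep listening
        };

        ignite.events().localListen(lsnr,
            EventType.EVT_CLIENT_NODE_DISCONNECTED,
            EventType.EVT_CLIENT_NODE_RECONNECTED);
    }
}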
 
Thanks, Alex



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Nodes failed to join the cluster after restarting

2020-11-12 Thread Cong Guo
Hi,

I have a 3-node cluster with persistence enabled. All the three nodes are
in the baseline topology. The ignite version is 2.8.1.

When I restart the first node, it encounters an error and fails to join the
cluster. The error message is "Caused by: org.apache.
ignite.spi.IgniteSpiException: Attempting to join node with larger
distributed metastorage version id. The node is most likely in invalid
state and can't be joined." I try several times but get the same error.

Then I restart the second node, and it encounters the same error. After I
restart the third node, the other two nodes can start successfully and join
the cluster. I do not change the baseline topology when I restart the nodes.
I cannot reproduce this error now.

I found that someone else has had the same problem:
http://apache-ignite-users.70518.x6.nabble.com/Question-about-baseline-topology-and-cluster-activation-td34336.html

The answer there was corruption in the metastorage. I do not see any issues with
the metastorage files, and it is very unlikely that files on two different
machines would be corrupted at the same time. Is it possible that
this is another bug like https://issues.apache.org/jira/browse/IGNITE-12850?

Do you have any document about how the version id is updated and read?
Could you please show me in the source code where the version id is read
when a node starts and where the version id is updated when a node stops?
Thank you!


Re: Ignite timeouts and trouble interpreting the logs

2020-11-12 Thread Stanislav Lukyanov
Hi,

This looks weird but with the right logs we should figure it out.

One thing that I don't like about these settings is the asymmetry of the 
server's and client's timeouts.
The server will use clientFailureDetectionTimeout=30s when talking to the 
client.
The client will use failureDetectionTimeout=120s when talking to the server.

In general, I recommend setting failureDetectionTimeout on the client to be
equal to clientFailureDetectionTimeout on the server.
In your case it means setting clientFailureDetectionTimeout on the servers to
the same 120s.
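
As a sketch (the 120s value simply mirrors the client setting quoted below;
adjust to your own numbers), the symmetric configuration would look roughly
like this:

import org.apache.ignite.configuration.IgniteConfiguration;

public class TimeoutSymmetry {
    // Server side: how quickly servers give up on an unresponsive *client*.
    static IgniteConfiguration serverConfig() {
        return new IgniteConfiguration()
            .setClientFailureDetectionTimeout(120_000); // ms, matches the client's failureDetectionTimeout
    }

    // Client side: how quickly the client gives up on an unresponsive *server*.
    static IgniteConfiguration clientConfig() {
        return new IgniteConfiguration()
            .setClientMode(true)
            .setFailureDetectionTimeout(120_000); // ms
    }
}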

Next, on the logs. Let's add more info to make it easier to debug the next time
it occurs.

1. Let's make sure client logs are collected. For that, make sure you
configure logging on the client - see
https://ignite.apache.org/docs/latest/logging.

2. Add DEBUG logs for the discovery subsystem. E.g. for log4j 2 configured via
XML, add a DEBUG-level logger for the org.apache.ignite.spi.discovery package
(see the sketch after this list).

3. If you see a client not being able to connect - try taking a thread dump. 
It'll help to understand what's happening on the client at the time.
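
For reference, a small sketch of raising that logging level programmatically
with the log4j 2 core API (an alternative to the XML logger entry from item 2;
it assumes log4j-core is on the classpath):

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class EnableDiscoveryDebug {
    public static void main(String[] args) {
        // Raise Ignite's discovery SPI package to DEBUG; the XML equivalent is a
        // <Logger name="org.apache.ignite.spi.discovery" level="DEBUG"/> entry.
        Configurator.setLevel("org.apache.ignite.spi.discovery", Level.DEBUG);
    }
}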

With logs from the client and the server perhaps we'll be able to find out 
what's happening.

Stan

> On 31 Oct 2020, at 00:21, tschauenberg  wrote:
> 
> First some background.  Ignite 2.8.1 with a 3 node cluster, two webserver
> client nodes, and one batch processing client node that comes and goes.
> 
> The two webserver thick client nodes and the one batch processing thick
> client node have the following configuration values:
> * IgniteConfiguration.setNetworkTimeout(6)
> * IgniteConfiguration.setFailureDetectionTimeout(12)
> * TcpDiscoverySpi.setJoinTimeout(6)
> * TcpCommunicationSpi.setIdleConnectionTimeout(Long.MAX_VALUE)
> 
> The server nodes do not have any timeouts set and are currently using all
> defaults.  My understanding is that means they are using:
> * failureDetectionTimeout 1
> * clientFailureDetectionTimeout 3
> 
> Every so often the batch processing client node fails to connect to the
> cluster.  We try to connect the batch processing client node to a single
> node in the cluster using:
> TcpDiscoverySpi.setIpFinder(TcpDiscoveryVmIpFinder().setAddresses(single
> node ip)
> 
> I see the following stream of logs on the server node the client connects to,
> and I am hoping you can shed some light on which timeout values I have set
> incorrectly and what values I need to set instead.
> 
> In these logs I have obfuscated the client IP to 10.1.2.xxx and the server
> IP as 10.1.10.xxx
> 
> 
> On the server node that the client tries to connect to I see the following
> sequence of messages:
> 
> [20:21:28,092][INFO][exchange-worker-#42][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=4146, minorTopVer=0], force=false, evt=NODE_JOINED,
> node=1b91b2a5-05ac-4809-8a3d-c1c2efb6a3e3]
> 
> So the client joined the cluster almost at exactly the same time it tried to
> join which seems good so far.
> 
> Then I see
> [20:21:54,726][INFO][db-checkpoint-thread-#56][GridCacheDatabaseSharedManager]
> Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=6ms,
> checkpointLockWait=0ms, checkpointListenersExecuteTime=6ms,
> checkpointLockHoldTime=8ms, reason='timeout']
> [20:21:58,044][INFO][tcp-disco-sock-reader-[1b91b2a5 10.1.2.xxx:47585
> client]-#4176][TcpDiscoverySpi] Finished serving remote node connection
> [rmtAddr=/10.1.2.xxx:47585, rmtPort=47585
> 
> [20:21:58,045][WARNING][grid-timeout-worker-#23][TcpDiscoverySpi] Socket
> write has timed out (consider increasing
> 'IgniteConfiguration.failureDetectionTimeout' configuration property)
> [failureDetectionTimeout=1, rmtAddr=/10.1.2.xxx:47585, rmtPort=47585,
> sockTimeout=5000]
> 
> I don't understand this socket timeout line: that remote address is the
> client's remote address, so I don't know what it was doing here, and this
> failureDetectionTimeout isn't the clientFailureDetectionTimeout, which I
> don't get.
> 
> It then seems to connect just fine to the client discovery here
> 
> [20:22:10,170][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery accepted incoming connection [rmtAddr=/10.1.2.xxx, rmtPort=56921]
> [20:22:10,170][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP
> discovery spawning a new thread for connection [rmtAddr=/10.1.2.xxx,
> rmtPort=56921]
> [20:22:10,171][INFO][tcp-disco-sock-reader-[]-#4178][TcpDiscoverySpi]
> Started serving remote node connection [rmtAddr=/10.1.2.xxx:56921,
> rmtPort=56921]
> [20:22:10,175][INFO][tcp-disco-sock-reader-[1b91b2a5 10.1.2.xxx:56921
> client]-#4178][TcpDiscoverySpi] Initialized connection with remote client
> node [nodeId=1b91b2a5-05ac-4809-8a3d-c1c2efb6a3e3,
> rmtAddr=/10.1.2.xxx:56921]
> [20:22:27,870][INFO][tcp-disco-sock-reader-[1b91b2a5 10.1.2.xxx:56921
> client]-#4178][TcpDiscoverySpi] Finished serving remote node connection
> [rmtAddr=/10.1.2.xxx:56921, rmtPort=56921
> 
> The client hits its timeout at 20:22:28 which is the 60

Re: tcp-disco-msg-worker system-critical error

2020-11-12 Thread akorensh
Hi,
  This might be due to a network error or a GC pause.
  Use this guide to collect GC logs and look for long gc pauses:
https://ignite.apache.org/docs/latest/perf-and-troubleshooting/troubleshooting#detailed-gc-logs

   
 [threadName=tcp-comm-worker,
*blockedFor=110s]*
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Connect timed out
(consider increasing 'failureDetectionTimeout' configuration property)

][tcp-comm-worker-#1%EDIFCustomerCC%][TcpDiscoverySpi] Finished node ping
[nodeId=1bd0d94b-0df0-42ab-a89a-e4320ccadbc3, res=false, time=16ms]

  Here the communication worker is blocked, meaning the network might be
responsible.
  Check that all nodes are able to reach each other within timeout threshold
limits.

  See :
https://ignite.apache.org/docs/latest/clustering/network-configuration#connection-timeouts
   and:
https://ignite.apache.org/docs/latest/clustering/network-configuration
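
  As a sketch of where those knobs live (the values below are placeholders, not
recommendations for this cluster):

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class NetworkTimeouts {
    static IgniteConfiguration config() {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setConnectTimeout(10_000);     // initial connect timeout, ms
        commSpi.setMaxConnectTimeout(60_000);  // upper bound after backoff, ms

        return new IgniteConfiguration()
            .setFailureDetectionTimeout(30_000) // overall failure-detection budget, ms
            .setCommunicationSpi(commSpi);
    }
}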


Thanks, Alex





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Live coding session next week

2020-11-12 Thread Valentin Kulichenko
Igniters,

On Tuesday next week (Nov 17), Denis Magda and I will conduct a live coding
session, where we will implement a primitive Ignite-like distributed
database from scratch. We will demonstrate major components required for
such a system, show how they interact with each other and how they can be
implemented in Java.

Join if you would like to better understand how distributed systems work
under the hood, or if you simply want to have some fun :)

RSVP and all the details here: Tuesday, November 17, 2020
8:00 AM to 10:00 AM PST

-Val


Re: WAL and WAL Archive volume size recommendation

2020-11-12 Thread facundo.maldonado
Ok, will do that. 

It's still not clear to me why, though.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


KafkaStreamer, how to manage (stop consumming, resume) on client disconnection

2020-11-12 Thread facundo.maldonado
Hi all, I'm having some problems dealing with the KafkaStreamer.

I have a deployment with a streamer (client node) that consumes records from
a Kafka topic, and a data node (cache storage).

If for any reason the cache node crashes or simply restarts, the client
node gets disconnected, but the KafkaStreamer keeps pulling records and
trying to push them to the cache.

Is there a recommended way to stop the KafkaStreamer on client disconnection
and resume once the connection is established again?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: IgniteC++ throughput

2020-11-12 Thread Gangaiah Gundeboina
Hi Lieuwe,

To understand this in detail, please give more details, such as how you are
calculating TPS, and a code snippet for each case where you are doing
put/get/cursor.hasNext().

Regards,
Gangaiah



-
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


.NET 5 and Ignite

2020-11-12 Thread Pavel Tupitsyn
Igniters,

Here is a short note on recently released .NET 5:
https://ptupitsyn.github.io/Ignite-on-NET-5/


IgniteC++ throughput

2020-11-12 Thread Lieuwe
I wonder if anyone can shed some light on the Apache Ignite performance I am
seeing.

I am running a single node & have a very simple CacheConfiguration
consisting of 4 fields.

The program is very much like the put-get-example code shipped with Ignite &
I am doing a few tests to see how fast (how many transactions per second) I
can read & write data to the cache.

1: Just incrementing the key and doing ignite::cache::Cache::Put(key,
dataObject) I can push 100K entries in the cache at about 12K TPS

2: Doing the same for ignite::cache::Cache::Get(key) yields 150K TPS

3: I then use a ignite::cache::query::SqlFieldsQuery &
ignite::cache::query::QueryFieldsCursor to do "SELECT A, B, C, D FROM
MyCache WHERE _key = ?"
Only doing cursor.isValid() && cursor.hasNext() yields 26K TPS

4: The last test I do is as above, but instead of the where clause being
'_key = ?' .. I change this to 'A=?'. In other words, I use one of the fields
as a select criterion. I only get a shocking 20 TPS.

Having an index on field A makes no difference. The size of the cache does -
when I reduce that to a handful of entries, that last rate goes up to
about 2K TPS.


My questions:
- There seems to be a big difference between Put & Get .. is that normal?
- There is also a big difference between scenarios 2 & 3 while they are
essentially doing the same thing .. why does SQL have so much overhead? And
example 3 doesn't even parse the columns out of the cursor, whereas example 2
gives me all 4 columns for the key.
- And most importantly - why the shocking performance in scenario 4?

Thanks




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: partition-exchanger system-critical thread blocked

2020-11-12 Thread Gangaiah Gundeboina
Hi Ilya Kasnacheev,

Below are log entries with the thread name 'partition-exchanger':



Line 41311:
[2020-11-09T06:44:13,605][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
Line 41315:
[2020-11-09T06:44:13,606][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]]]
Line 41316: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]
Line 47325:
[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
Line 47329:
[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]]]
Line 47330: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]
#


Below is the stack trace for 'exchange-worker-#344' worker thread,

#
Line 41109: [2020-11-09T04:11:42,068][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2791, minorTopVer=0], force=false, evt=NODE_JOINED,
node=f11cfea2-5ece-4867-b276-e94fa8458f47]
Line 41116: [2020-11-09T04:11:52,060][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=5c548b34-defa-4fcd-9bc7-364a4fbec8da, customEvt=null,
allowMerge=true]
Line 41117: [2020-11-09T04:11:52,062][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2792,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
err=null]
Line 41118: [2020-11-09T04:11:52,066][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=5c548b34-defa-4fcd-9bc7-364a4fbec8da, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1, 2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0],
sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0:0, AA.BB.CC.DD/AA.BB.CC.DD:0],
discPort=0, order=2792, intOrder=1527, lastExchangeTime=1604875311250,
loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
durationFromInit=0]
Line 41119: [2020-11-09T04:11:52,067][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0], crd=false]
Line 41120: [2020-11-09T04:11:52,072][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5c548b34-defa-4fcd-9bc7-364a4fbec8da]
Line 41124: [2020-11-09T04:11:52,451][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvcc

Unixodbc and Apache Ignite on OSX compilation

2020-11-12 Thread Wolfgang Meyerle

Hi,

I'm currently struggling a little bit getting Ignite up and running with 
UnixOdbc on my OSX machine.


I'm compiling the platform cpp files atm with

cmake -DWITH_ODBC=ON -DWITH_THIN_CLIENT=ON -DWITH_TESTS=ON 
-DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/Users/user/ApacheIgnite ..



and getting:

Undefined symbols for architecture x86_64:
  "_SQLGetPrivateProfileString", referenced from:
  ignite::odbc::ReadDsnString(char const*, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> > const&,
std::__1::basic_string<char, std::__1::char_traits<char>,
std::__1::allocator<char> > const&) in dsn_config.cpp.o

  "_SQLInstallerError", referenced from:
  ignite::odbc::ThrowLastSetupError() in dsn_config.cpp.o
  "_SQLWritePrivateProfileString", referenced from:
  ignite::odbc::WriteDsnString(char const*, char const*, char 
const*) in dsn_config.cpp.o

ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see 
invocation)

make[2]: *** [odbc/libignite-odbc.2.9.0.50002.dylib] Error 1
make[1]: *** [odbc/CMakeFiles/ignite-odbc.dir/all] Error 2
make: *** [all] Error 2

as an output.


Any ideas?

My rough guess is that the UnixOdbc interface must have changed. 
Currently I'm using version unixodbc 2.3.9 according to brew.


Apache Ignite is in version apache-ignite-2.9.0


Regards,


Wolfgang



Re: First two chapters of Ignite ML book

2020-11-12 Thread Kseniya Romanova
Thank you, Alexey! Such posts can really help those who are just
considering using Ignite. That kind of contribution is as significant as
the code!

Thu, 12 Nov 2020 at 14:42, Alexey Zinoviev :

> Hi, dear community, today I published the first two chapters of the Ignite ML book:
>
>1. Apache Ignite ML: origins and development
>
> 
>2. Apache Ignite ML: possible use cases, racing with Spark ML, plans
>for the future
>
> 
>
> If you have any questions or feedback please let me know.
>
> P.S. If you have a Medium account, please give my new article +50 claps; this
> will boost it in the search results and recommendations.
>


Re: Inserting date into ignite with spark jdbc

2020-11-12 Thread Vladimir Pligin
Hi,

It seems that the dataset internally uses the built-in
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider instead
of org.apache.ignite.spark.impl.IgniteRelationProvider when you force it
to use JDBC. The provider from Spark obviously doesn't tolerate Ignite
custom properties. To be honest, I'm not sure how it should work; I'll need to
think about it.
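
For what it's worth, a rough sketch of writing through the Ignite data source
(which is what selects IgniteRelationProvider and understands the Ignite-specific
options) rather than the plain JDBC source; the config path, table name and
DataFrame below are placeholders:

import org.apache.ignite.spark.IgniteDataFrameSettings;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class IgniteSparkWriteSketch {
    static void write(Dataset<Row> df) {
        df.write()
            .format(IgniteDataFrameSettings.FORMAT_IGNITE())                           // picks IgniteRelationProvider
            .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), "ignite-config.xml") // placeholder path
            .option(IgniteDataFrameSettings.OPTION_TABLE(), "person")                  // placeholder table
            .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "id")
            .mode(SaveMode.Append)
            .save();
    }
}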



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: [2.9.0]Entryprocessor cannot be hot deployed properly via UriDeploymentSpi

2020-11-12 Thread Ilya Kasnacheev
Hello!

I suggest filing a feature request ticket against Apache Ignite JIRA. Best
if you provide some reproducer project.

https://issues.apache.org/jira/browse/IGNITE

You can also try a hybrid approach, such as firing a compute task from
the entry processor; the task would be hot-redeployed properly.
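
A rough illustration of that idea (the class and task names are made up; note
that synchronously waiting for a distributed call inside an entry processor
blocks a striped-pool thread, so use this with care):

import javax.cache.processor.MutableEntry;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.resources.IgniteInstanceResource;

// The entry processor stays trivial and stable; the logic you want to be
// redeployable lives in a compute task deployed through UriDeploymentSpi.
public class DelegatingProcessor implements CacheEntryProcessor<Integer, String, String> {
    @IgniteInstanceResource
    private transient Ignite ignite;

    @Override
    public String process(MutableEntry<Integer, String> entry, Object... args) {
        Integer key = entry.getKey();

        // "demo.DeployTask" is a hypothetical task name deployed via UriDeploymentSpi;
        // redeploying that task is what picks up the new class version.
        String newVal = ignite.compute().execute("demo.DeployTask", key);

        entry.setValue(newVal);
        return newVal;
    }
}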

Regards,
-- 
Ilya Kasnacheev


Thu, 12 Nov 2020 at 15:46, 18624049226 <18624049...@163.com>:

> Hi Ilya,
>
> Updating the user version does not affect this issue.
>
> Adjusting the deploymentMode parameter also has no effect on this issue.
> On 2020/11/12 7:39 PM, Ilya Kasnacheev wrote:
>
> Hello!
>
> Did you try changing user version between deployments?
>
>
> https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading#un-deployment-and-user-versions
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Thu, 12 Nov 2020 at 12:07, 18624049226 <18624049...@163.com>:
>
>> Hi Ilya,
>>
>> This issue exists in both versions 2.8 and 2.8.1.
>> On 2020/11/11 10:05 PM, Ilya Kasnacheev wrote:
>>
>> Hello!
>>
>> Did that work under 2.8? Can you check?
>>
>> If it wasn't, then maybe it is not implemented in the first place. If it
>> is a regression, we could try to address that.
>>
>> Regards.
>> --
>> Ilya Kasnacheev
>>
>>
>> Wed, 11 Nov 2020 at 05:55, 18624049226 <18624049...@163.com>:
>>
>>> Any further conclusions?
>>>
>>> On 2020/11/6 11:00 AM, 18624049226 wrote:
>>> > Hi community,
>>> >
>>> > EntryProcessor cannot be hot deployed properly via
>>> > UriDeploymentSpi. The operation steps are as follows:
>>> >
>>> > 1. Put the jar in the specified folder of uriList;
>>> >
>>> > 2. Use example-deploy.xml to start two Ignite nodes;
>>> >
>>> > 3. Use the DeployClient to deploy the service named "deployService";
>>> >
>>> > 4. Execute the test through ThickClientTest; the result is correct;
>>> >
>>> > 5. Modify the code of DeployServiceImpl and DeployEntryProcessor (for
>>> > example, change "Hello" to "Hi"), then repackage it and put it into the
>>> > specified folder of uriList;
>>> >
>>> > 6. Redeploy the service with RedeployClient;
>>> >
>>> > 7. Execute the test again through ThickClientTest; the result is
>>> > incorrect. We find that if the EntryProcessor accessed by the
>>> > service is on another node, the EntryProcessor uses the old version of
>>> > the class definition.
>>> >
>>> >
>>>
>>>


Re: [2.9.0]Entryprocessor cannot be hot deployed properly via UriDeploymentSpi

2020-11-12 Thread 18624049226

Hi Ilya,

Updating the user version does not affect this issue.

Adjusting the deploymentMode parameter also has no effect on this issue.

On 2020/11/12 7:39 PM, Ilya Kasnacheev wrote:

Hello!

Did you try changing user version between deployments?

https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading#un-deployment-and-user-versions 



Regards,
--
Ilya Kasnacheev


Thu, 12 Nov 2020 at 12:07, 18624049226 <18624049...@163.com>:


Hi Ilya,

This issue exists in both versions 2.8 and 2.8.1.

On 2020/11/11 10:05 PM, Ilya Kasnacheev wrote:

Hello!

Did that work under 2.8? Can you check?

If it wasn't, then maybe it is not implemented in the first
place. If it is a regression, we could try to address that.

Regards.
-- 
Ilya Kasnacheev



Wed, 11 Nov 2020 at 05:55, 18624049226 <18624049...@163.com>:

Any further conclusions?

On 2020/11/6 11:00 AM, 18624049226 wrote:
> Hi community,
>
> EntryProcessor cannot be hot deployed properly via
> UriDeploymentSpi. The operation steps are as follows:
>
> 1. Put the jar in the specified folder of uriList;
>
> 2. Use example-deploy.xml to start two Ignite nodes;
>
> 3. Use the DeployClient to deploy the service named "deployService";
>
> 4. Execute the test through ThickClientTest; the result is correct;
>
> 5. Modify the code of DeployServiceImpl and DeployEntryProcessor (for
> example, change "Hello" to "Hi"), then repackage it and put it into the
> specified folder of uriList;
>
> 6. Redeploy the service with RedeployClient;
>
> 7. Execute the test again through ThickClientTest; the result is
> incorrect. We find that if the EntryProcessor accessed by the
> service is on another node, the EntryProcessor uses the old version of
> the class definition.
> the class definition.
>
>



tcp-disco-msg-worker system-critical error

2020-11-12 Thread Gangaiah Gundeboina


Hi Igniters,

Sometimes the system-critical error below is printed in the production logs
whenever we restart the clients.
[2020-11-09T02:31:24,733][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=tcp-comm-worker,
*blockedFor=110s]*

It shows the thread blocked for 110s, which is a huge amount of time. We do not
understand the cause; the cluster is responding. Below are the log entries with
that thread name. Could you please help us?

###
Line 4630: [2020-11-07T20:21:01,852][WARN
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Connect timed out
(consider increasing 'failureDetectionTimeout' configuration property)
[addr=/127.0.0.1:47101, failureDetectionTimeout=6]
Line 4631: [2020-11-07T20:21:01,852][WARN
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Failed to connect
to a remote node (make sure that destination node is alive and operating
system firewall is disabled on local and remote hosts)
[addrs=[NVMBD1BKY270D00/10.137.53.63:47101, /127.0.0.1:47101,
0:0:0:0:0:0:0:1%lo:47101]]
Line 4632: [2020-11-07T20:21:01,853][INFO
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpDiscoverySpi] Pinging node:
1bd0d94b-0df0-42ab-a89a-e4320ccadbc3
Line 4633: [2020-11-07T20:21:01,861][INFO
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpDiscoverySpi] Finished node ping
[nodeId=1bd0d94b-0df0-42ab-a89a-e4320ccadbc3, res=false, time=16ms]
Line 10216: [2020-11-09T02:31:24,733][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=365, state=RUNNABLE,
blockCnt=694, waitCnt=3344]
Line 12472: Thread [name="tcp-comm-worker-#1%EDIFCustomerCC%", id=365,
state=RUNNABLE, blockCnt=694, waitCnt=3344]
Line 15518: [2020-11-09T02:31:41,105][WARN
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Connect timed out
(consider increasing 'failureDetectionTimeout' configuration property)
[addr=/10.40.0.101:47100, failureDetectionTimeout=6]
Line 15519: [2020-11-09T02:31:41,105][WARN
][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Failed to connect
to a remote node (make sure that destination node is alive and operating
system firewall is disabled on local and remote hosts)
[addrs=[/10.40.0.101:47100, /127.0.0.1:47100]]
Line 16080: [2020-11-09T02:34:31,048][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=365, state=RUNNABLE,
blockCnt=694, waitCnt=3345]
Line 18914: Thread [name="tcp-comm-worker-#1%EDIFCustomerCC%", id=365,
state=RUNNABLE, blockCnt=694, waitCnt=3345]


##


[2020-11-09T02:30:10,266][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2499, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5ac75627-96b7-4334-a42a-9e86e09dbc38]
[2020-11-09T02:30:20,692][INFO
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=f99bcb57-5dbf-4f34-adfa-0399e73365d4,
startPtr=FileWALPointer [idx=1279351, fileOff=26759725, len=49557],
checkpointLockWait=0ms, checkpointLockHoldTime=15ms,
walCpRecordFsyncDuration=1ms, pages=5569, reason='timeout']
[2020-11-09T02:30:20,821][INFO
][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=f99bcb57-5dbf-4f34-adfa-0399e73365d4, pages=5569,
markPos=FileWALPointer [idx=1279351, fileOff=26759725, len=49557],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=23ms,
pagesWrite=79ms, fsync=49ms, total=151ms]
[2020-11-09T02:31:24,733][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=tcp-comm-worker,
blockedFor=110s]
[2020-11-09T02:31:24,733][WARN ][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Thread [name="tcp-comm-worker-#1%EDIFCustomerCC%", id=365, state=RUNNABLE,
blockCnt=694, waitCnt=3344]

[2020-11-09T02:31:24,734][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=tcp-comm-worker, igniteInstanceName=EDIFCustomerCC, finished=false,
heartbeatTs=1604869173878]]]
org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
igniteInstanceName=EDIFCustomerCC, finished=false,
heartbeatTs=1604869173878]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstanc

Re: Ignite 2.9 one way client to server communication

2020-11-12 Thread Ilya Kasnacheev
Hello!

This is correct: as long as you do not start any new caches, adding a thick
client should be PME-less.

Do you have logs from the joining client and a server node (the coordinator,
crd=true, if possible)?

Regards,
-- 
Ilya Kasnacheev


Mon, 2 Nov 2020 at 20:25, Hemambara :

> Thank you for the response. If I understand correctly, thick client
> connectivity time does not depend on the # of server nodes; it all depends on
> PME length? Can you please elaborate or point me to resources where I
> can understand PME length? I got a few links on how PME works, but sorry, I
> did not get what PME length is. Also, as per the reference below, an Ignite
> 2.8.0 thick client does not trigger any PME. Is this right?
>
>
> https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


First two chapters of Ignite ML book

2020-11-12 Thread Alexey Zinoviev
Hi, dear community, today I published the first two chapters of the Ignite ML book:

   1. Apache Ignite ML: origins and development
   

   2. Apache Ignite ML: possible use cases, racing with Spark ML, plans for
   the future
   


If you have any questions or feedback please let me know.

P.S. If you have a Medium account, please give my new article +50 claps; this
will boost it in the search results and recommendations.


Re: [2.9.0]Entryprocessor cannot be hot deployed properly via UriDeploymentSpi

2020-11-12 Thread Ilya Kasnacheev
Hello!

Did you try changing user version between deployments?

https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading#un-deployment-and-user-versions

Regards,
-- 
Ilya Kasnacheev


Thu, 12 Nov 2020 at 12:07, 18624049226 <18624049...@163.com>:

> Hi Ilya,
>
> This issue exists in both versions 2.8 and 2.8.1.
> On 2020/11/11 10:05 PM, Ilya Kasnacheev wrote:
>
> Hello!
>
> Did that work under 2.8? Can you check?
>
> If it wasn't, then maybe it is not implemented in the first place. If it
> is a regression, we could try to address that.
>
> Regards.
> --
> Ilya Kasnacheev
>
>
> Wed, 11 Nov 2020 at 05:55, 18624049226 <18624049...@163.com>:
>
>> Any further conclusions?
>>
>> On 2020/11/6 11:00 AM, 18624049226 wrote:
>> > Hi community,
>> >
>> > EntryProcessor cannot be hot deployed properly via
>> > UriDeploymentSpi. The operation steps are as follows:
>> >
>> > 1. Put the jar in the specified folder of uriList;
>> >
>> > 2. Use example-deploy.xml to start two Ignite nodes;
>> >
>> > 3. Use the DeployClient to deploy the service named "deployService";
>> >
>> > 4. Execute the test through ThickClientTest; the result is correct;
>> >
>> > 5. Modify the code of DeployServiceImpl and DeployEntryProcessor (for
>> > example, change "Hello" to "Hi"), then repackage it and put it into the
>> > specified folder of uriList;
>> >
>> > 6. Redeploy the service with RedeployClient;
>> >
>> > 7. Execute the test again through ThickClientTest; the result is
>> > incorrect. We find that if the EntryProcessor accessed by the
>> > service is on another node, the EntryProcessor uses the old version of
>> > the class definition.
>> >
>> >
>>
>>


Re: L2-cache slow/not working as intended

2020-11-12 Thread Ilya Kasnacheev
Hello!

Then it should survive restart while keeping cache content. I'm not an
expert in Hibernate caching but that I would expect.

Regards,
-- 
Ilya Kasnacheev


Thu, 12 Nov 2020 at 12:58, Bastien Durel :

> On Tuesday, 10 November 2020 at 17:39 +0300, Ilya Kasnacheev wrote:
> > Hello!
> >
> > You can make it semi-persistent by changing the internal Ignite node
> > type inside Hibernate to client (property clientMode=true) and
> > starting a few stand-alone nodes (one per each VM?)
> >
> > This way, its client will just connect to the existing cluster
> > with data already there.
> >
> > You can also enable Ignite persistence, but I assume that's not what
> > you want.
>
> Hello.
>
> Ignite is already started in client mode before initializing hibernate,
> and connected to a few stand-alone servers.
>
> Regards,
>
> --
> Bastien Durel
> DATA
> Intégration des données de l'entreprise,
> Systèmes d'information décisionnels.
>
> bastien.du...@data.fr
> tel : +33 (0) 1 57 19 59 28
> fax : +33 (0) 1 57 19 59 73
> 45 avenue Carnot, 94230 CACHAN France
> www.data.fr
>
>
>


Re: L2-cache slow/not working as intended

2020-11-12 Thread Bastien Durel
On Tuesday, 10 November 2020 at 17:39 +0300, Ilya Kasnacheev wrote:
> Hello!
> 
> You can make it semi-persistent by changing the internal Ignite node
> type inside Hibernate to client (property clientMode=true) and
> starting a few stand-alone nodes (one per each VM?)
> 
> This way, its client will just connect to the existing cluster
> with data already there.
> 
> You can also enable Ignite persistence, but I assume that's not what
> you want.

Hello.

Ignite is already started in client mode before initializing hibernate,
and connected to a few stand-alone servers.

Regards,

-- 
Bastien Durel
DATA
Enterprise data integration,
Decision-support information systems.

bastien.du...@data.fr
tel : +33 (0) 1 57 19 59 28
fax : +33 (0) 1 57 19 59 73
45 avenue Carnot, 94230 CACHAN France
www.data.fr




Re: Query on IgniteApplication running on java11

2020-11-12 Thread Ilya Kasnacheev
Hello!

There may still be issues, such as showing -100% CPU load.

It's better to have all the required JVM options.

Regards,
-- 
Ilya Kasnacheev


Thu, 12 Nov 2020 at 05:42, vbm :

> Hi,
>
> On my machine JDK 11 is installed, and I am trying to write an Ignite
> application to run in this environment.
>
> In my pom.xml, I have added the maven-compiler-plugin configuration below and
> compiled the code.
>
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-compiler-plugin</artifactId>
>   <version>3.8.0</version>
>   <configuration>
>     <compilerArgs>
>       <arg>--add-exports</arg>
>       <arg>java.base/jdk.internal.misc=ALL-UNNAMED</arg>
>       <arg>--add-exports</arg>
>       <arg>java.base/sun.nio.ch=ALL-UNNAMED</arg>
>       <arg>--add-exports</arg>
>       <arg>java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED</arg>
>       <arg>--add-exports</arg>
>       <arg>jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED</arg>
>     </compilerArgs>
>   </configuration>
> </plugin>
>
>
> Now my question is: do I need to use the same JVM_OPTS when I start the
> client application, like below:
> java $JVM_OPTS -cp  
>
> Currently I am not using the JVM_OPTS and the application is *running fine*,
> but in the link below it is mentioned that they need to be set.
>
> https://ignite.apache.org/docs/latest/quick-start/java#running-ignite-with-java-11-or-later
>
>
> May I know whether there will be any issue if I do not use the JVM_OPTS?
>
>
> Regards,
> Vishwas
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: [2.9.0]Entryprocessor cannot be hot deployed properly via UriDeploymentSpi

2020-11-12 Thread 18624049226

Hi Ilya,

This issue exists in both versions 2.8 and 2.8.1.

On 2020/11/11 10:05 PM, Ilya Kasnacheev wrote:

Hello!

Did that work under 2.8? Can you check?

If it wasn't, then maybe it is not implemented in the first place. If 
it is a regression, we could try to address that.


Regards.
--
Ilya Kasnacheev


Wed, 11 Nov 2020 at 05:55, 18624049226 <18624049...@163.com>:


Any further conclusions?

On 2020/11/6 11:00 AM, 18624049226 wrote:
> Hi community,
>
> EntryProcessor cannot be hot deployed properly via
> UriDeploymentSpi. The operation steps are as follows:
>
> 1. Put the jar in the specified folder of uriList;
>
> 2. Use example-deploy.xml to start two Ignite nodes;
>
> 3. Use the DeployClient to deploy the service named "deployService";
>
> 4. Execute the test through ThickClientTest; the result is correct;
>
> 5. Modify the code of DeployServiceImpl and DeployEntryProcessor (for
> example, change "Hello" to "Hi"), then repackage it and put it into the
> specified folder of uriList;
>
> 6. Redeploy the service with RedeployClient;
>
> 7. Execute the test again through ThickClientTest; the result is
> incorrect. We find that if the EntryProcessor accessed by the
> service is on another node, the EntryProcessor uses the old version of
> the class definition.
>
>