Ignite 2.8.1: Database Closed error

2020-11-19 Thread Mahesh Renduchintala
Hi,


Any pointers on what the below error means?
There seems to be an OutOfMemoryError in the discovery SPI. What can cause this?




^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=50071MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=50071MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=1, qSize=0]
[16:29:45,630][INFO][exchange-worker-#85][GridCachePartitionExchangeManager] 
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=2016, minorTopVer=0], force=true, evt=DISCOVERY>
[16:30:12,877][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too 
long JVM pause: 2323 milliseconds.
[16:30:12,877][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP discovery 
accepted incoming connection [rmtAddr=/192.168.10.137, rmtPort=39599]
[16:30:12,882][SEVERE][tcp-disco-client-message-worker-[9a0c020b 
192.168.1.9:61059]-#1560][TcpDiscoverySpi] Runtime error caught during grid 
runnable execution: GridWorker [name=tcp-disco-client-message>
java.lang.OutOfMemoryError: Java heap space
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
at 
java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522)
at 
java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7761)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7697)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:61)
[16:30:12,883][INFO][tcp-disco-srvr-[:47500]-#3][TcpDiscoverySpi] TCP discovery 
spawning a new thread for connection [rmtAddr=/192.168.10.137, rmtPort=39599]
[16:30:12,883][SEVERE][query-#218652][GridMapQueryExecutor] Failed to execute 
local query.
class org.apache.ignite.IgniteCheckedException: Failed to execute SQL query. 
The database has been closed [90098-197]
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:874)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:955)
at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:412)
at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:241)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2186)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$17(IgniteH2Indexing.java:2139)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3386)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1847)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1472)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1367)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: The database has been closed 
[90098-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.engine.Database.checkPowerOff(Database.java:536)
at org.h2.command.Command.executeQuery(Command.java:228)





Re: WAL and WAL Archive volume size recommendation

2020-11-06 Thread Mahesh Renduchintala
Dennis

"The WAL archive is used to store WAL segments that may be needed to recover 
the node after a crash. The number of segments kept in the archive is such that 
the total size of all segments does not exceed the specified size of the WAL 
archive"

Given the above statement in the documentation, if we disable the WAL archive as 
described in the docs, will we have trouble recovering the data in the work 
folder when the node reboots?
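
For reference, my understanding from the docs is that the archive is disabled by 
pointing the WAL path and the WAL archive path at the same directory. A minimal 
Java sketch of that setup (the directory paths are assumed):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalArchiveDisabledSketch {
    public static IgniteConfiguration config() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Pointing walPath and walArchivePath at the same directory disables
        // archiving; only the active WAL segments are kept on disk.
        storageCfg.setWalPath("/ignite/wal");
        storageCfg.setWalArchivePath("/ignite/wal");

        return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
    }
}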

regards
mahesh


Re: ReadFromBackup, Primary_SYNC, Backups

2020-11-05 Thread Mahesh Renduchintala
Hi,
Can you please give some feedback on the below?



From: Mahesh Renduchintala 
Sent: Tuesday, November 3, 2020 8:20 AM
To: user@ignite.apache.org 
Subject: ReadFromBackup, Primary_SYNC, Backups

Hi

I have a large SQL table (12 million records) with cacheMode PARTITIONED.
This table is distributed over two server nodes.


-1-

When running a large SELECT from a thick client node, could data be fetched 
from the backup partitions instead of the primary partitions?

Below is the configuration.
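
The cache configuration XML did not come through in the archive; as a rough Java 
sketch of the kind of configuration being described (the cache name and backup 
count are assumed):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class TableCacheConfigSketch {
    public static CacheConfiguration<Object, Object> cacheConfig() {
        // Partitioned SQL table cache with one backup; reads are served from
        // primary partitions only because readFromBackup is disabled.
        return new CacheConfiguration<>("SQL_PUBLIC_MY_TABLE")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(1)
            .setReadFromBackup(false);
    }
}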

We are seeing some performance improvement since we set readFromBackup = false, 
along with a few other changes.

-2-
In a two-server system, if readFromBackup = false for a cache and one server 
fails, would the second server stop serving client requests, given that some of 
the partition data it holds exists only in its backup partitions?


-3-
Is it possible that readFromBackup = true combined with PRIMARY_SYNC write 
synchronization mode could return inconsistent data for a cache with cacheMode 
REPLICATED on a 2-server cluster?


-4-
If I set backups = 10 in a 2-server system, would that mean there are 10 backup 
copies?

I am guessing Ignite would keep a single backup copy on each server, not 5 and 5 
on each server, and that a new backup copy is created for that cache on any new 
node joining the cluster.
Is this the right understanding?





regards
mahesh



New Node - Rebalancing

2020-11-03 Thread Mahesh Renduchintala
Hi,

As soon as we add a new server node to the cluster, rebalancing starts; this is 
clear.
Is there a way to know when rebalancing has successfully finished on the new 
server node?
Caches in the cluster are both replicated and partitioned.
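
One way to get such a signal is a local listener for the rebalance-stopped event; 
a minimal sketch, assuming EVT_CACHE_REBALANCE_STOPPED has been added to 
includeEventTypes in the node configuration (events are disabled by default):

import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class RebalanceFinishedListener {
    public static void listen(Ignite ignite) {
        // Fires on the local node whenever rebalancing stops for a cache.
        ignite.events().localListen((IgnitePredicate<Event>) evt -> {
            CacheRebalancingEvent e = (CacheRebalancingEvent) evt;
            System.out.println("Rebalancing stopped for cache: " + e.cacheName());
            return true; // keep listening
        }, EventType.EVT_CACHE_REBALANCE_STOPPED);
    }
}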

regards
Mahesh




Re: Failed to Resolve NodeTopology - ignite 2.8.1

2020-11-03 Thread Mahesh Renduchintala
Yes, I have IGNITE_EXCHANGE_HISTORY_SIZE set to 10.

Should I set IGNITE_EXCHANGE_HISTORY_SIZE = 0?
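
For reference, this is a JVM system property read at node startup, so it is 
normally passed to the server JVM as -DIGNITE_EXCHANGE_HISTORY_SIZE=<n>. A 
minimal sketch of setting it programmatically before node start (the value and 
config path are placeholders, not recommendations):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ExchangeHistorySizeSketch {
    public static void main(String[] args) {
        // Equivalent to -DIGNITE_EXCHANGE_HISTORY_SIZE=1000 on the JVM command
        // line; must be set before the node starts.
        System.setProperty("IGNITE_EXCHANGE_HISTORY_SIZE", "1000");

        Ignite ignite = Ignition.start("config/ignite-server.xml"); // path assumed
    }
}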

I will file a bug shortly.




From: Ilya Kasnacheev 
Sent: Tuesday, November 3, 2020 8:57 PM
To: user@ignite.apache.org 
Subject: Re: Failed to Resolve NodeTopology - ignite 2.8.1

Hello!

You seem to have had a very old transaction which tried to access a topology 
version that was no longer in the history. Do you have IGNITE_EXCHANGE_HISTORY_SIZE 
set?

I think it is a bug that this causes a node failure; I would expect the 
transaction to simply be killed, that's all. Can you please file a ticket about 
this issue against the Apache Ignite JIRA?

https://issues.apache.org/jira/projects/IGNITE

Regards,
--
Ilya Kasnacheev


Tue, 3 Nov 2020 at 05:59, Mahesh Renduchintala 
<mahesh.renduchint...@aline-consulting.com>:
Hi,

We saw all Ignite nodes crash this morning. Below are the error logs.
Why would "Failed to resolve node topology" occur?
What can cause this?
If there were a network disturbance, should I not get some sort of segmentation 
error instead?




[14:30:10,095][INFO][grid-timeout-worker-#43][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=1ec1e3f9, uptime=1 day, 03:54:03.143]
^-- H/N/C [hosts=11, nodes=11, CPUs=46]
^-- CPU [cur=0.93%, avg=7.03%, GC=0%]
^-- PageMemory [pages=11582153]
^-- Heap [used=25392MB, free=48.34%, comm=49152MB]
^-- Off-heap [used=45772MB, free=30.47%, comm=65736MB]
^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
^--   default region [used=45771MB, free=30.16%, comm=65536MB]
^--   metastoreMemPlc region [used=0MB, free=99.03%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=53690MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=53689MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=2, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
[14:30:10,976][SEVERE][sys-stripe-17-#18][GridCacheIoManager] Failed processing 
message [senderId=8f0d3c00-7b18-456c-9066-3852abca7254, 
msg=GridNearTxPrepareRequest 
[futId=196a4f48571-8d97cb24-fd81-45b2-a60f-60c26db22c90, miniId=1, 
topVer=AffinityTopologyVersion [topVer=158, minorTopVer=2], 
subjId=8f0d3c00-7b18-456c-9066-3852abca7254, taskNameHash=0, txLbl=null, 
flags=[firstClientReq][implicitSingle], super=GridDistributedTxPrepareRequest 
[threadId=2651, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, 
writeVer=GridCacheVersion [topVer=215702708, order=1604374451397, 
nodeOrder=21], timeout=42, reads=null, writes=ArrayList [IgniteTxEntry 
[txKey=null, val=CacheObjectImpl [val=null, hasValBytes=true][op=CREATE, val=], 
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], 
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, 
explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], 
filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, 
nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, 
partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, 
plc=2, txState=null, flags=onePhase|last, super=GridDistributedBaseMessage 
[ver=GridCacheVersion [topVer=215702708, order=1604374451397, nodeOrder=21], 
committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage 
[cacheId=0, super=GridCacheMessage [msgId=2767960, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=158, minorTopVer=2], 
err=null, skipPrepare=false]]
class org.apache.ignite.IgniteException: Failed to resolve nodes topology 
[cacheGrp=DataStructure_DisHashMap, topVer=AffinityTopologyVersion [topVer=158, 
minorTopVer=2], history=[AffinityTopologyVersion [topVer=218, minorTopVer=0], 
AffinityTopologyVersion [topVer=219, minorTopVer=0], AffinityTopologyVersion 
[topVer=220, minorTopVer=0], AffinityTopologyVersion [topVer=221, 
minorTopVer=0], AffinityTopologyVersion [topVer=222, minorTopVer=0], 
AffinityTopologyVersion [topVer=223, minorTopVer=0], AffinityTopologyVersion 
[topVer=224, minorTopVer=0], AffinityTopologyVersion [topVer=225, 
minorTopVer=0], AffinityTopologyVersion [topVer=226, minorTopVer=0], 
AffinityTopologyVersion [topVer=227, minorTopVer=0]], snap=Snapshot 
[topVer=AffinityTopologyVersion [topVer=227, minorTopVer=0]], 
locNode=TcpDiscoveryNode [id=1ec1e3f9-4c61-4316-9db3-5b35379570ab, 
consistentId=afc388a3-aa34-4553-9b62-1ea36657feb0, addrs=ArrayList 
[192.168.1.6], sockAddrs=HashSet 
[/192.168.1.6:47500], discPort=47500, order=2, 
intOrder=2, lastExchangeTime=1604224232725, loc=true, 
ver=2.8.1#20200521-sha1:86422096, isClient=false]]
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDisc

Failed to Resolve NodeTopology - ignite 2.8.1

2020-11-02 Thread Mahesh Renduchintala
Hi,

We saw all Ignite nodes crash this morning. Below are the error logs.
Why would "Failed to resolve node topology" occur?
What can cause this?
If there were a network disturbance, should I not get some sort of segmentation 
error instead?




[14:30:10,095][INFO][grid-timeout-worker-#43][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=1ec1e3f9, uptime=1 day, 03:54:03.143]
^-- H/N/C [hosts=11, nodes=11, CPUs=46]
^-- CPU [cur=0.93%, avg=7.03%, GC=0%]
^-- PageMemory [pages=11582153]
^-- Heap [used=25392MB, free=48.34%, comm=49152MB]
^-- Off-heap [used=45772MB, free=30.47%, comm=65736MB]
^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
^--   default region [used=45771MB, free=30.16%, comm=65536MB]
^--   metastoreMemPlc region [used=0MB, free=99.03%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=53690MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=53689MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=2, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
[14:30:10,976][SEVERE][sys-stripe-17-#18][GridCacheIoManager] Failed processing 
message [senderId=8f0d3c00-7b18-456c-9066-3852abca7254, 
msg=GridNearTxPrepareRequest 
[futId=196a4f48571-8d97cb24-fd81-45b2-a60f-60c26db22c90, miniId=1, 
topVer=AffinityTopologyVersion [topVer=158, minorTopVer=2], 
subjId=8f0d3c00-7b18-456c-9066-3852abca7254, taskNameHash=0, txLbl=null, 
flags=[firstClientReq][implicitSingle], super=GridDistributedTxPrepareRequest 
[threadId=2651, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, 
writeVer=GridCacheVersion [topVer=215702708, order=1604374451397, 
nodeOrder=21], timeout=42, reads=null, writes=ArrayList [IgniteTxEntry 
[txKey=null, val=CacheObjectImpl [val=null, hasValBytes=true][op=CREATE, val=], 
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], 
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, 
explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], 
filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, 
nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, 
partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, 
plc=2, txState=null, flags=onePhase|last, super=GridDistributedBaseMessage 
[ver=GridCacheVersion [topVer=215702708, order=1604374451397, nodeOrder=21], 
committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage 
[cacheId=0, super=GridCacheMessage [msgId=2767960, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=158, minorTopVer=2], 
err=null, skipPrepare=false]]
class org.apache.ignite.IgniteException: Failed to resolve nodes topology 
[cacheGrp=DataStructure_DisHashMap, topVer=AffinityTopologyVersion [topVer=158, 
minorTopVer=2], history=[AffinityTopologyVersion [topVer=218, minorTopVer=0], 
AffinityTopologyVersion [topVer=219, minorTopVer=0], AffinityTopologyVersion 
[topVer=220, minorTopVer=0], AffinityTopologyVersion [topVer=221, 
minorTopVer=0], AffinityTopologyVersion [topVer=222, minorTopVer=0], 
AffinityTopologyVersion [topVer=223, minorTopVer=0], AffinityTopologyVersion 
[topVer=224, minorTopVer=0], AffinityTopologyVersion [topVer=225, 
minorTopVer=0], AffinityTopologyVersion [topVer=226, minorTopVer=0], 
AffinityTopologyVersion [topVer=227, minorTopVer=0]], snap=Snapshot 
[topVer=AffinityTopologyVersion [topVer=227, minorTopVer=0]], 
locNode=TcpDiscoveryNode [id=1ec1e3f9-4c61-4316-9db3-5b35379570ab, 
consistentId=afc388a3-aa34-4553-9b62-1ea36657feb0, addrs=ArrayList 
[192.168.1.6], sockAddrs=HashSet [/192.168.1.6:47500], discPort=47500, order=2, 
intOrder=2, lastExchangeTime=1604224232725, loc=true, 
ver=2.8.1#20200521-sha1:86422096, isClient=false]]
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:1999)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:1881)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.needRemap(IgniteTxHandler.java:744)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:458)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:374)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:176)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:161)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:123)
at 

ReadFromBackup, Primary_SYNC, Backups

2020-11-02 Thread Mahesh Renduchintala
Hi

I have a large SQL table (12 million records) with cacheMode PARTITIONED.
This table is distributed over two server nodes.


-1-

When running a large SELECT from a thick client node, could data be fetched 
from the backup partitions instead of the primary partitions?

Below is the configuration.

We are seeing some performance improvement since we set readFromBackup = false, 
along with a few other changes.

-2-
In a two-server system, if readFromBackup = false for a cache and one server 
fails, would the second server stop serving client requests, given that some of 
the partition data it holds exists only in its backup partitions?


-3-
Is it possible that readFromBackup = true combined with PRIMARY_SYNC write 
synchronization mode could return inconsistent data for a cache with cacheMode 
REPLICATED on a 2-server cluster?


-4-
If I set backups = 10 in a 2-server system, would that mean there are 10 backup 
copies?

I am guessing Ignite would keep a single backup copy on each server, not 5 and 5 
on each server, and that a new backup copy is created for that cache on any new 
node joining the cluster.
Is this the right understanding?





regards
mahesh



2.9.0 migration - Thick Client error

2020-10-23 Thread Mahesh Renduchintala
Hi,

We are migrating from 2.8.1 to 2.9.0 and are seeing the below error with a thick 
client. Please suggest how to work around it.

regards
mahesh



WARNING: Failed to resolve default logging config file: 
config/java.util.logging.properties
[12:33:47]__  
[12:33:47]   /  _/ ___/ |/ /  _/_  __/ __/
[12:33:47]  _/ // (7 7// /  / / / _/
[12:33:47] /___/\___/_/|_/___/ /_/ /___/
[12:33:47]
[12:33:47] ver. 2.9.0#20201015-sha1:70742da8
[12:33:47] 2020 Copyright(C) Apache Software Foundation
[12:33:47]
[12:33:47] Ignite documentation: http://ignite.apache.org
[12:33:47]
[12:33:47] Quiet mode.
[12:33:47]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[12:33:47]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
"-v" to ignite.{sh|bat}
[12:33:47]
[12:33:47] OS: Linux 3.10.0-1127.19.1.el7.x86_64 amd64
[12:33:47] VM information: OpenJDK Runtime Environment 1.8.0_262-b10 Oracle 
Corporation OpenJDK 64-Bit Server VM 25.262-b10
[12:33:47] Configured plugins:
[12:33:47]   ^-- ml-inference-plugin 1.0.0
[12:33:47]   ^-- null
[12:33:47]
[12:33:47] Configured failure handler: 
[hnd=net.aline.cloudedh.base.database.IgniteDatabase$$Lambda$2/792791759@15975490]
[12:33:47] Security status [authentication=off, sandbox=off, tls/ssl=off]
Oct 23, 2020 12:33:47 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Exception during start processors, node will be stopped and close 
connections
java.lang.NoSuchMethodError: 
org.apache.ignite.internal.managers.systemview.GridSystemViewManager.registerWalker(Ljava/lang/Class;Lorg/apache/ignite/spi/systemview/view/SystemViewRowAttributeWalker;)V
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.start(IgniteH2Indexing.java:2103)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.start(GridQueryProcessor.java:276)
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1953)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1235)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:634)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:560)
at org.apache.ignite.Ignition.start(Ignition.java:328)
at 
net.aline.cloudedh.base.database.IgniteDatabase.thickClientConnect(IgniteDatabase.java:275)
at 
net.aline.cloudedh.base.database.IgniteDatabase.connect(IgniteDatabase.java:139)
at net.aline.cloudedh.base.framework.L1DSA.start(L1DSA.java:15)
at 
net.aline.cloudedh.base.framework.DataLoadTaskTest.main(DataLoadTaskTest.java:18)




Ignite Thin Client - IDLE TIMEOUT

2020-09-21 Thread Mahesh Renduchintala
Hi,

Is there a way to set an idle timeout for a thin client?

I did not see such a configuration option in ClientConfiguration.
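
A minimal sketch, assuming the idle timeout is controlled on the server side 
through ClientConnectorConfiguration (which covers thin client, JDBC and ODBC 
connections); the 60-second value is only an example:

import org.apache.ignite.configuration.ClientConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ThinClientIdleTimeoutSketch {
    public static IgniteConfiguration config() {
        // Connections idle for longer than the timeout are closed by the server.
        ClientConnectorConfiguration connCfg = new ClientConnectorConfiguration()
            .setIdleTimeout(60_000);
        return new IgniteConfiguration().setClientConnectorConfiguration(connCfg);
    }
}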

regards
Mahesh





Ignite Thin Client - Compute Jobs

2020-09-21 Thread Mahesh Renduchintala
Hi,

With a thin client handle, is it possible to launch tasks on the compute grid?
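
A minimal sketch, assuming an Ignite version (2.9 or later) where the thin client 
exposes a compute facade; the task class name is hypothetical and would have to 
be deployed on the server nodes already:

import org.apache.ignite.Ignition;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientComputeSketch {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
        try (IgniteClient client = Ignition.startClient(cfg)) {
            // Executes a pre-deployed compute task by class name and prints the result.
            Object result = client.compute().execute("org.example.MyTask", "someArg");
            System.out.println(result);
        }
    }
}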

regards
mahesh



Re: Ignite 2.81. - NULL pointer exception

2020-09-02 Thread Mahesh Renduchintala
I sent the logs again. There was no specific activity.
We have a cluster of 2 servers and about 15 thick clients.
It just happened without much information. I can say it is likely that a new node 
joined, and that may have triggered this crash.




Re: Ignite 2.81. - NULL pointer exception

2020-09-02 Thread Mahesh Renduchintala
We received this null pointer exception again


[05:57:14,810][INFO][exchange-worker-#81][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=1139, minorTopVer=0], crd=true]
[05:57:15,553][INFO][exchange-worker-#81][GridCachePartitionExchangeManager] 
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=1139, minorTopVer=0], force=false, evt=NODE_FAILED, 
node=002bbc15-ae2f-4afc-b8a9-0c90c9aa25d4]
[05:57:15,920][INFO][grid-timeout-worker-#43][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=457c7279, uptime=1 day, 11:55:12.022]
^-- H/N/C [hosts=14, nodes=15, CPUs=49]
^-- CPU [cur=7.9%, avg=2.39%, GC=0.37%]
^-- PageMemory [pages=10033374]
^-- Heap [used=18402MB, free=62.56%, comm=49152MB]
^-- Off-heap [used=39652MB, free=39.77%, comm=65736MB]
^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
^--   default region [used=39651MB, free=39.5%, comm=65536MB]
^--   metastoreMemPlc region [used=1MB, free=98.96%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=39192MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=39191MB]
^--   metastoreMemPlc region [used=1MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=2, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
[05:57:16,153][INFO][exchange-worker-#81][GridCachePartitionExchangeManager] 
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=1139, minorTopVer=0], force=true, evt=DISCOVERY_CUSTOM_EVT, 
node=aa6bfd45-5c9a-4c61-84d2-588ef0af3265]
[05:57:16,193][SEVERE][sys-stripe-15-#16][] Critical system error detected. 
Will be handled accordingly to configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=CRITICAL_ERROR, 
err=java.lang.NullPointerException]]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1064)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:953)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:909)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:123)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:217)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:215)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1847)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1472)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1367)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:745)
[05:57:16,495][WARNING][sys-stripe-15-#16][CacheDiagnosticManager] Page locks 
dump:

Thread=[name=auth-#69, id=119], state=WAITING
Locked pages = []
Locked pages log: name=auth-#69 time=(1599026236194, 2020-09-02 05:57:16.194)


Thread=[name=checkpoint-runner-#109, id=163], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#109 time=(1599026236194, 2020-09-02 
05:57:16.194)


Thread=[name=checkpoint-runner-#110, id=164], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#110 time=(1599026236194, 2020-09-02 
05:57:16.194)


Thread=[name=checkpoint-runner-#111, id=165], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#111 time=(1599026236194, 2020-09-02 
05:57:16.194)


Thread=[name=checkpoint-runner-#112, id=166], state=WAITING
Locked pages = []
Locked pages log: 

Ignite 2.81. - NULL pointer exception

2020-08-05 Thread Mahesh Renduchintala
Hi,

We have a null pointer exception on one of our servers. No major activity was 
happening when the server crashed.

Please check the logs and see if there is any workaround we can use.
We are in a production environment.

regards
mahesh

[05:42:42,194][SEVERE][sys-stripe-19-#20][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.NullPointerException]]
java.lang.NullPointerException
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1064)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:953)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:909)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:123)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:217)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:215)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1847)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1472)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1367)
	at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:745)
[05:42:42,216][INFO][grid-timeout-worker-#43][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=76a9edc3, uptime=14:26:42.572]
^-- H/N/C [hosts=13, nodes=13, CPUs=50]
^-- CPU [cur=4.37%, avg=2.77%, GC=0%]
^-- PageMemory [pages=7420567]
^-- Heap [used=19580MB, free=60.16%, comm=49152MB]
^-- Off-heap [used=29326MB, free=55.46%, comm=65736MB]
^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^--   default region [used=29326MB, free=55.25%, comm=65536MB]
^--   metastoreMemPlc region [used=0MB, free=99.75%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=42982MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=42981MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
[05:42:42,460][WARNING][sys-stripe-19-#20][CacheDiagnosticManager] Page locks dump:

Thread=[name=checkpoint-runner-#125, id=195], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#125 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=checkpoint-runner-#126, id=196], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#126 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=checkpoint-runner-#127, id=197], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#127 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=checkpoint-runner-#128, id=198], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#128 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=client-connector-#513, id=680], state=WAITING
Locked pages = []
Locked pages log: name=client-connector-#513 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=client-connector-#515, id=682], state=WAITING
Locked pages = []
Locked pages log: name=client-connector-#515 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=client-connector-#517, id=684], state=WAITING
Locked pages = []
Locked pages log: name=client-connector-#517 time=(1596606162197, 2020-08-05 05:42:42.197)


Thread=[name=client-connector-#518, 

ignite 2.8.1: Failed to resolve node topology

2020-07-09 Thread Mahesh Renduchintala
 Hi,

We have a crash in our environment with the below error.
Any insight into what might have gone wrong?

regards
Mahesh


^-- Heap [used=17211MB, free=64.98%, comm=49152MB]
^-- Off-heap [used=48448MB, free=26.41%, comm=65736MB]
^--   sysMemPlc region [used=0MB, free=99.98%, comm=100MB]
^--   default region [used=48447MB, free=26.07%, comm=65536MB]
^--   metastoreMemPlc region [used=0MB, free=99.21%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=48647MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=48646MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=4, qSize=0]
^-- System thread pool [active=11, idle=12, qSize=0]
[18:44:06,313][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=83, 
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0], 
err=null, rebalanced=true, wasRebalanced=true]
[18:44:06,316][WARNING][sys-stripe-21-#22][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=205532475, order=1594223109470, nodeOrder=51], dhtTxId=null, 
node=35c37b55-ec30-4457-b861-403bcfc20c12, commit=false]
[18:44:06,317][WARNING][sys-stripe-18-#19][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=205532475, order=1594223109273, nodeOrder=51], dhtTxId=null, 
node=35c37b55-ec30-4457-b861-403bcfc20c12, commit=false]
[18:44:06,317][WARNING][sys-stripe-23-#24][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=205532475, order=1594223109436, nodeOrder=51], dhtTxId=null, 
node=35c37b55-ec30-4457-b861-403bcfc20c12, commit=false]
[18:44:06,321][WARNING][sys-stripe-21-#22][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=205532475, order=1594223109502, nodeOrder=51], dhtTxId=null, 
node=35c37b55-ec30-4457-b861-403bcfc20c12, commit=false]
[18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture] 
Completed partition exchange [localNode=d38d1293-9dd5-4b4e-9934-97ba3fbafa62, 
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
[topVer=83, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
[id=c19d4735-2b52-487f-9d6d-0574b8f15858, 
consistentId=L1APISERVICE_7aa60a92371, addrs=ArrayList [10.244.0.93, 
127.0.0.1], sockAddrs=HashSet [/10.244.0.93:0, /127.0.0.1:0], discPort=0, 
order=68, intOrder=39, lastExchangeTime=1594131247817, loc=false, 
ver=2.8.1#20200521-sha1:86422096, isClient=true], done=true, newCrdFut=null], 
topVer=AffinityTopologyVersion [topVer=83, minorTopVer=0]]
[18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture] 
Exchange timings [startVer=AffinityTopologyVersion [topVer=83, minorTopVer=0], 
resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0], stage="Waiting in 
exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), 
stage="Determine exchange type" (16 ms), stage="Exchange done" (71053 ms), 
stage="Total time" (71069 ms)]
[18:44:06,327][INFO][exchange-worker-#70][GridDhtPartitionsExchangeFuture] 
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=83, 
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=83, minorTopVer=0]]
[18:44:06,327][INFO][exchange-worker-#70][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=83, minorTopVer=0], crd=false]
[18:44:06,345][INFO][db-checkpoint-thread-#112][GridCacheDatabaseSharedManager] 
Checkpoint started [checkpointId=92718d25-2db4-4691-886c-ec26b8b6ecba, 
startPtr=FileWALPointer [idx=389, fileOff=2151853, len=17162371], 
checkpointBeforeLockTime=869ms, checkpointLockWait=86163ms, 
checkpointListenersExecuteTime=784ms, checkpointLockHoldTime=859ms, 
walCpRecordFsyncDuration=20ms, writeCheckpointEntryDuration=1ms, 
splitAndSortCpPagesDuration=24ms,  pages=28028, reason='timeout']
[18:44:06,388][SEVERE][sys-stripe-3-#4][] Critical system error detected. Will 
be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Failed to 
resolve nodes topology [cacheGrp=SQL_PUBLIC_MILLION, 
topVer=AffinityTopologyVersion [topVer=82, minorTopVer=0], 
history=[AffinityTopologyVersion [topVer=84, minorTopVer=0], 
AffinityTopologyVersion [topVer=85, minorTopVer=0], AffinityTopologyVersion 
[topVer=86, minorTopVer=0], AffinityTopologyVersion [topVer=87, minorTopVer=0], 
AffinityTopologyVersion [topVer=88, 

Atomic Sequence/Auto Increment via IgniteClient - Thin Client - Ignite 2.8.1

2020-06-17 Thread Mahesh Renduchintala
Hello,

Is there a way to access Ignite atomic sequences via the thin client APIs?
We have a situation where we have thick clients (Java microservices) and thin 
clients (Spark), and both need to generate IDs for inserting data into tables.
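
For reference, the thick-client API in question looks like the sketch below (the 
sequence name is assumed); whether an equivalent exists for thin clients is 
exactly the question:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicSequence;

public class IdSequenceSketch {
    public static long nextId(Ignite ignite) {
        // Creates the distributed sequence on first use, then increments it.
        IgniteAtomicSequence seq = ignite.atomicSequence("globalIdSeq", 0, true);
        return seq.incrementAndGet();
    }
}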

regards
Mahesh





Re: Ignite 2.8.1: Spark 2.4.4. No class found error

2020-06-03 Thread Mahesh Renduchintala
This issue is solved. I was using a Spark 2.4.4 cluster but the Spark 2.3 binaries.



Re: Ignite Spark - Error - Did you call Ignition.start(..)?

2020-06-03 Thread Mahesh Renduchintala
I am seeing this problem even on Ignite 2.8.1 with Spark 2.4.4.
Do the Spark worker and the Ignite node need to be on the same server?


import org.apache.ignite.Ignition
import org.apache.spark.sql.ignite.IgniteSparkSession

val CONFIG = "resources/node-config-spark.xml"
val TABLE_NAME = "table_access_master"

//Creating Ignite-specific implementation of Spark session.

val igniteSession = IgniteSparkSession.builder()
  .appName("Spark test")
  .master("spark://192.168.1.25:7077")
  .config("spark.executor.instances", "1")
  .config("spark.cores.max", 2)
  .config("spark.submit.deployMode", "client")
  .config("spark.executor.memory", "1g")
  .config("spark.driver.cores", 2)
  .config("spark.executor.cores", 2)
  .config("spark.driver.memory", "2g")
  .config("spark.driver.extraClassPath", "/opt/ignite/libs/*" +

":/opt/ignite/libs/optional/ignite-spark/*:/opt/ignite/libs/optional/ignite-log4j/*"
 +
":/opt/ignite/libs/optional/ignite-yarn/*:/opt/ignite/libs/ignite-spring/" +
":/root/spark-2.4.4-bin-hadoop2.7/jars/*")
  .config("spark.executor.extraClassPath", "/opt/ignite/libs/*" +

":/opt/ignite/libs/optional/ignite-spark/*:/opt/ignite/libs/optional/ignite-log4j/*"
 +
":/opt/ignite/libs/optional/ignite-yarn/*:/opt/ignite/libs/ignite-spring/")

  .igniteConfig(CONFIG)
  .getOrCreate()

//Showing existing tables.
igniteSession.catalog.listTables().show()
igniteSession.catalog.listColumns(TABLE_NAME).show()

var df = igniteSession.sql("select * from table_access_master limit 3")
df.show(2)
igniteSession.close()

Ignition.stopAll(true)




Ignite 2.8.1: Spark 2.4.4. No class found error

2020-06-03 Thread Mahesh Renduchintala
Hi,

I am running a program that connects to a remote Spark cluster. I get the 
following error. Any insight into the error?

Spark 2.4.4
Scala 2.11.12
Ignite 2.8.1


Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.spark.sql.catalyst.expressions.AttributeReference.withQualifier(Lscala/Option;)Lorg/apache/spark/sql/catalyst/expressions/AttributeReference;
at 
org.apache.spark.sql.ignite.IgniteOptimization$$anonfun$pushDownOperators$1$$anonfun$applyOrElse$2.apply(IgniteOptimization.scala:87)
at 
org.apache.spark.sql.ignite.IgniteOptimization$$anonfun$pushDownOperators$1$$anonfun$applyOrElse$2.apply(IgniteOptimization.scala:87)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at 
org.apache.spark.sql.ignite.IgniteOptimization$$anonfun$pushDownOperators$1.applyOrElse(IgniteOptimization.scala:87)
at 
org.apache.spark.sql.ignite.IgniteOptimization$$anonfun$pushDownOperators$1.applyOrElse(IgniteOptimization.scala:63)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:281)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:281)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:280)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:278)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformUp(AnalysisHelper.scala:158)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformUp(LogicalPlan.scala:29)
at 

Ignite Spark - Error - Did you call Ignition.start(..)?

2020-05-28 Thread Mahesh Renduchintala
Hi,

I have very simple code to try out Ignite Spark.


import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.ignite.IgniteSparkSession;

public class SparkTestJava {
    private static final String CONFIG = "examples/config/example-ignite.xml";
    private static final String CACHE_NAME = "testCache";
    private static final String TableName = "table_access_master";

    /** */
    public static void main(String[] args) throws AnalysisException {
        IgniteSparkSession igniteSession = IgniteSparkSession.builder()
            .appName("Spark test")
            .master("spark://192.168.1.25:7077")
            .config("spark.executor.instances", "1")
            .config("spark.cores.max", 2)
            .config("spark.submit.deployMode", "cluster")
            .config("spark.executor.memory", "3g")
            .config("spark.driver.cores", 2)
            .config("spark.executor.cores", 2)
            .config("spark.driver.memory", "1g")
            .igniteConfig("resources/node-config-spark.xml")
            .getOrCreate();

        System.out.println("List of available tables:");
        igniteSession.catalog().listTables().show();
        igniteSession.catalog().listColumns(TableName).show();

        // Selecting data through the Spark SQL engine.
        Dataset<Row> df = igniteSession.sql("SELECT * FROM " + TableName + " LIMIT 10");
        df.printSchema();
        df.show();
    }
}


I see the following Error

20/05/28 15:14:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
192.168.1.25, executor 0): class org.apache.ignite.IgniteIllegalStateException: 
Ignite instance with provided name doesn't exist. Did you call 
Ignition.start(..) to start an Ignite instance? [name=null]
at org.apache.ignite.internal.IgnitionEx.grid(IgnitionEx.java:1390)
at org.apache.ignite.internal.IgnitionEx.grid(IgnitionEx.java:1258)
at org.apache.ignite.Ignition.ignite(Ignition.java:489)
at org.apache.ignite.spark.impl.package$.ignite(package.scala:84)
at 
org.apache.ignite.spark.impl.IgniteRelationProvider$$anonfun$configProvider$1$2.apply(IgniteRelationProvider.scala:226)
at 
org.apache.ignite.spark.impl.IgniteRelationProvider$$anonfun$configProvider$1$2.apply(IgniteRelationProvider.scala:223)
at org.apache.ignite.spark.Once.apply(IgniteContext.scala:222)
at org.apache.ignite.spark.IgniteContext.ignite(IgniteContext.scala:144)
at 
org.apache.ignite.spark.impl.IgniteSQLDataFrameRDD.compute(IgniteSQLDataFrameRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

is there something I am missing?

regards
Mahesh






Re: UserVersion for Remote Deployment

2020-05-14 Thread Mahesh Renduchintala
Typos corrected:


a) Is it a correct understanding that, as long as the userVersion of the client 
matches that of the remote node for that class, the class will NOT be redeployed 
from any other client nodes?


UserVersion for Remote Deployment

2020-05-14 Thread Mahesh Renduchintala
Hi,

We have a class (SqlQuery) that migrates from the client node to the remote node 
for execution.

Is the below way of setting the userVersion for this class on the client-node 
side correct? If not, what is the right way?

See below.

ignite.xml


<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/util
       http://www.springframework.org/schema/util/spring-util.xsd">

Ignite class loader




net.abc.xyz.SqlQuery




The other question we have is, in shared deployment mode,

a) is it a correct understanding that as long as UserVersion matches with the 
remote node for that class, the class will NOT be deployed only once to the 
remote node from many other client nodes?


b) is it a correct understanding that if we change the userVersion (here say 
from 1.1 to 1.2),  the class will redeploy once again?

regards
Mahesh




Re: Ignite 2.7.6: Memory Leak with Direct buffers in TCPCommunication SPI

2020-02-29 Thread Mahesh Renduchintala
Please see the attached jhist.
In this condition, one of the nodes consumed about 18 GB.




Cache ExpirationPolicy

2020-02-28 Thread Mahesh Renduchintala
Hi

I have a cache template defined as below in the default_config.xml.
I was expecting that tables created from it would automatically get deleted from 
off-heap memory and from the backups.
This is because of the expiration policy set as below.
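
The template XML did not come through in the archive; as a rough Java sketch of 
the kind of template being described (the template name and the TTL value are 
assumed):

import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.CacheConfiguration;

public class ExpiryTemplateSketch {
    public static void registerTemplate(Ignite ignite) {
        // Template name ends with '*' so it can be referenced from SQL:
        //   CREATE TABLE ... WITH "template=expiring";
        CacheConfiguration<Object, Object> tpl = new CacheConfiguration<>("expiring*");
        tpl.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 5)));
        tpl.setEagerTtl(true); // expired entries are removed in the background
        ignite.addCacheConfiguration(tpl);
    }
}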


However, I observe that these tables are not getting deleted. What am I doing 
wrong?

Ignite 2.7.6: Memory Leak with Direct buffers in TCPCommunication SPI

2020-02-25 Thread Mahesh Renduchintala
Hi,

We have been searching for the cause of a memory leak in Ignite server nodes for 
many weeks now.
The memory leak exists, and below is the scenario.

Scenario

  *   Setup
     *   Our Ignite servers have about 50GB of data. Two servers were baselined.
     *   There are about 10 client nodes connected.
     *   Off-heap memory - 64GB per node, 48GB heap (Xms/Xmx parameters).
  *   Test
     *   1. Start all 10 client nodes.
     *   2. Wait for the 10 nodes to be connected.
     *   3. Sleep for 30 mins.
     *   4. Stop all 10 client nodes (docker rm -f X).
     *   Loop steps 1 to 4 infinitely.

Observations

  *   With every iteration, about 1GB of memory disappears from the heap of the servers.
  *   After about 100 iterations, the server nodes crash reporting OOM.

Workaround

  *   The memory leak does not occur when directBuffer is set to false as below, 
meaning the TCP communication buffers are kept on-heap only.
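
The configuration XML did not come through in the archive; programmatically, the 
workaround described above corresponds to something like this sketch:

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class OnHeapCommBuffersSketch {
    public static IgniteConfiguration config() {
        // Use on-heap NIO buffers for TCP communication instead of direct
        // (off-heap) buffers.
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setDirectBuffer(false);
        return new IgniteConfiguration().setCommunicationSpi(commSpi);
    }
}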
Hope you can reproduce this problem at your end.
This occurs with any default configuration of Ignite servers, which is why I am 
not sending mine.

regards
Mahesh



 *   The only work around we found is to use on


Ignite 2.7.6 : Discovery Threads reporting OOPS

2020-01-28 Thread Mahesh Renduchintala
Hi,

We have Ignite data nodes reporting out of memory after many days.
What could be the possible reasons?


[16:45:04,018][SEVERE][tcp-disco-client-message-worker-#2404][TcpDiscoverySpi] 
Runtime error caught during grid runnable execution: GridWorker 
[name=tcp-disco-client-message-worker, igniteInstanceName=null, finished=false, 
heartbeatTs=1580229848371, hashCode=32546845, interrupted=true, 
runner=tcp-disco-client-message-worker-#2404]
java.lang.OutOfMemoryError: Java heap space
[16:45:04,018][SEVERE][tcp-disco-client-message-worker-#2404][TcpDiscoverySpi] 
Runtime error caught during grid runnable execution: IgniteSpiThread 
[name=tcp-disco-client-message-worker-#2404]
java.lang.OutOfMemoryError: Java heap space

However, the heap memory looks pretty good.

^-- Node [id=84a46677, uptime=11 days, 14:01:34.176]
^-- H/N/C [hosts=5, nodes=5, CPUs=31]
^-- CPU [cur=100%, avg=5.59%, GC=154.43%]
^-- PageMemory [pages=1696651]
^-- Heap [used=12800MB, free=50%, comm=25600MB]
^-- Off-heap [used=6705MB, free=79.72%, comm=33068MB]
^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^--   default region [used=6704MB, free=79.54%, comm=32768MB]
^--   metastoreMemPlc region [used=0MB, free=99.45%, comm=100MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=22613MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=22613MB]
^--   metastoreMemPlc region [used=unknown]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=2, qSize=0]
^-- System thread pool [active=1, idle=3, qSize=0]


Mahesh
Aline Consulting




Re: NearCache for SQL tables

2019-12-26 Thread Mahesh Renduchintala
Dennis,

We use Ignite for supply-chain-management use cases, and almost all of them are 
SQL schemas.

In replicated mode, for large tables and for a single query, the data reads 
happen from the RAM and disk of a single node.
We measured a 4x increase in query (read) latencies in replicated mode compared 
to partitioned mode. This is possibly due to the disk coming into play.

I was thinking a near cache at the client node might improve read latencies.

We find replicated mode much better suited than partitioned mode due to our 
high-availability needs. We also reported several issues with JOINs in partitioned 
mode, and this is a show-stopper for us in production deployments. Replicated 
mode does not have these issues with joins and is a simple solution for backup 
and restore.

regards
Mahesh


NearCache for SQL tables

2019-12-26 Thread Mahesh Renduchintala
Hi,

Is there a way to create a near cache on CLIENT nodes for SQL tables?
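
For context, a near cache is normally attached programmatically on the client 
node, roughly as in the sketch below; the cache name of an SQL table is assumed 
to follow the SQL_PUBLIC_<TABLE> convention, and whether SQL SELECTs would 
actually be served from such a near cache is the open question:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheSketch {
    public static IgniteCache<Object, Object> nearCacheFor(Ignite client) {
        // On a client node, attach a near cache to an existing server-side cache.
        NearCacheConfiguration<Object, Object> nearCfg = new NearCacheConfiguration<>();
        return client.getOrCreateNearCache("SQL_PUBLIC_TABLE_ACCESS_MASTER", nearCfg);
    }
}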


regards
Mahesh



Fetching Server DataStorageMetric

2019-12-16 Thread Mahesh Renduchintala
Hi,

I need to fetch the DataStorageMetrics of the server nodes.
I tried the following API, but it seems to give the DataStorageMetrics of the 
local node (a thick client) only.

DataStorageMetrics pm = ignite.dataStorageMetrics();

How do I programmatically fetch the DataStorageMetrics of the server node(s)?
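
One approach is to broadcast a closure to the server nodes and have each return 
its own local metrics; a minimal sketch (the two metrics returned are arbitrary 
examples):

import java.util.Collection;
import org.apache.ignite.DataStorageMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteCallable;

public class ServerStorageMetricsSketch {
    public static Collection<String> fetch(Ignite client) {
        // Runs on every server node; each node reads its own local metrics and
        // returns them as a string for brevity.
        return client.compute(client.cluster().forServers()).broadcast(
            (IgniteCallable<String>) () -> {
                DataStorageMetrics m = Ignition.localIgnite().dataStorageMetrics();
                return "walLoggingRate=" + m.getWalLoggingRate()
                    + ", lastCheckpointDuration=" + m.getLastCheckpointDuration();
            });
    }
}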


regards
Mahesh


Re: GridCachePartitionExchangeManager Null pointer exception

2019-10-05 Thread Mahesh Renduchintala
Pavel, I don't have the logs for the client node. It has happened twice in our 
cluster in 45 days so far; it is difficult to reproduce.
But the logs show a null pointer exception on the server nodes... first one server 
node (192.168.1.6) went down, and then the other.

In IGNITE-12255, it is noted that an assertion could be seen on the coordinator, 
but this is a null pointer exception.
I agree the race condition described in 12255 seems similar to the logs I 
attached, but it just does not explain the null pointer exception.


The race is the following:

Client node (with some configured caches) joins to a cluster sending 
SingleMessage to coordinator during client PME. This SingleMessage contains 
affinity fetch requests for all cluster caches. When SingleMessage is in-flight 
server nodes finish client PME and also process and finish cache destroy PME. 
When a cache is destroyed affinity for that cache is cleared. When 
SingleMessage delivered to coordinator it doesn’t have affinity for a requested 
cache because the cache is already destroyed. It leads to assertion error on 
the coordinator and unpredictable behavior on the client node.




Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-10-04 Thread Mahesh Renduchintala
https://issues.apache.org/jira/browse/IGNITE-12255
Upon reviewing IGNITE-12255, the description of that issue shows an exception 
occurring on the thick-client side.
However, the logs that I attached show a null pointer exception on ALL the 
server nodes, leading to a complete cluster crash.
Isn't the issue I am reporting here different from 12255?


Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-10-03 Thread Mahesh Renduchintala
Shivakumar's system configuration and mine could be different, but I feel we are 
seeing the same issue here.

Deleting tables via a single thick client causes other thick clients to go out 
of memory. This OOM issue was reported here:
http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-Ignite-client-memory-leak-td28938.html
That thread has the server and client configs and the client JVM heap dump 
attached. Please go through it.

To reproduce this problem:
- Take Ignite 2.7.6.
- Allocate about 1GB of heap (-Xmx) for each of the thick clients and connect 
them to an Ignite cluster.
- Let the Ignite cluster have about 500 dummy tables. Keep deleting them.
- Eventually, you will see the thick clients failing with OOM.

Now coming to your questions:

1. Does the same occur if IgniteCache.destroy() is called instead of DROP TABLE?
All the caches we destroy are SQL caches, so we use DROP TABLE. 
IgniteCache.destroy() gives an exception:
Exception in thread "main" class org.apache.ignite.IgniteException: Only cache 
created with cache API may be removed with direct call to destroyCache 
[cacheName=SQL_PUBLIC_PERSON1000]

2. Does the same occur if SQL is not enabled for a cache?
We did not check this; it is not a use case we have. We primarily use SQL caches.

3. It would be nice to see the IgniteConfiguration and CacheConfiguration 
causing problems.
They are attached in the other thread referenced above.

4. Need to figure out why almost all pages are dirty. It might be a clue.
This is probably the scenario Shivakumar sent. In my case, all the data is in 
memory; we have about 100GB in memory, and the data regions together are about 
128GB.

I don't want to confuse this thread, as Shivakumar's scenario could be different.
I don't mind discussing this on the other thread I opened (the memory-leak thread 
referenced above).
The bottom line is: deleting tables from one thick client is causing other thick 
clients to go OOM. This can be seen on 2.7.6 too.




Re: GridCachePartitionExchangeManager Null pointer exception

2019-10-03 Thread Mahesh Renduchintala
Hello Pavel,

OK. I am a little unclear on the workaround you suggested in your previous 
comment:
As a workaround, I can suggest not explicitly declaring caches in the client 
configuration. During the join process, the client node will receive all 
configured caches from the server nodes.

In my scenario,
a) there are absolutely no caches declared on my thick client side.
b) The cache templates are declared on the server nodes, and the caches are 
created via SQL issued from the thick client side.

How do I implement the workaround you suggested?
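
For reference, a sketch of roughly what we do today (all names are 
placeholders; the cache template "myServerSideTemplate" is assumed to be 
declared only in the server configuration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientSideTableCreation {
    public static void main(String[] args) {
        // No cfg.setCacheConfiguration(...) call: nothing is declared
        // on the thick client itself.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite client = Ignition.start(cfg)) {
            client.getOrCreateCache("ddl-entry-point")
                  .query(new SqlFieldsQuery(
                      "CREATE TABLE PERSON (ID INT PRIMARY KEY, NAME VARCHAR) " +
                      "WITH \"template=myServerSideTemplate\""))
                  .getAll();
        }
    }
}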

regards
Mahesh



Re: GridCachePartitionExchangeManager Null pointer exception

2019-10-03 Thread Mahesh Renduchintala
Pavel, thanks for your analysis. The two logs that I attached are those of two 
server data nodes (neither is configured in thick-client mode).
The logs do show one server data node losing connection and trying to connect 
back to the other node (192.168.1.6)...

On second thoughts, the below still makes sense.
https://issues.apache.org/jira/browse/IGNITE-10010

Please check.


Re: Changing password of USER "ignite"

2019-10-03 Thread Mahesh Renduchintala
What we found out is that we are able to change the password with DBeaver after 
connecting to the DB using
username: ignite
password: ignite

However, the same does not work via GridGain, which pops up the following 
error:
Error: Operation not allowed: authorized context is empty


Re: GridCachePartitionExchangeManager Null pointer exception

2019-10-02 Thread Mahesh Renduchintala
This seems to be a new bug, unrelated to IGNITE-10010.
Both nodes were fully operational when the null pointer exception happened. The 
logs show that, and both nodes crashed.

Can you give some insights into this and the possible scenarios that could have 
led to it?
Is there any potential workaround?



Re: Changing password of USER "ignite"

2019-10-01 Thread Mahesh Renduchintala

-1-
Step a)
Log in to GridGain and enter the username "ignite" and password "ignite" to 
connect to your cluster.

Step b)
Go to the notebook tab and execute the SQL.
   The SQL used in the notebook was: ALTER USER "ignite" WITH PASSWORD 
'abc123'

GridGain pops up:
Error: Operation not allowed: authorized context is empty.

-2-

Sql = "ALTER USER "ignite" WITH PASSWORD 'abc123'"

results = superCache.query(sql).getAll();

We get a print
Error: Operation not allowed: authorized context is empty.
Very easy to produce.

regards
mahesh


Re: Ignite 2.7.0: Ignite client:: memory leak

2019-09-30 Thread Mahesh Renduchintala
We observed the same memory leak issue on 2.7.6 as well. Deleting tables 
continuously from a thick client causes out of memory exceptions in other thick 
clients. Can you please recheck? We badly need a workaround for this issue.

This is probably related to another issue that is discussed on another thread
http://apache-ignite-users.70518.x6.nabble.com/Nodes-are-restarting-when-i-try-to-drop-a-table-created-with-persistence-enabled-td27897.html



Re: Changing password of USER "ignite"

2019-09-30 Thread Mahesh Renduchintala
We followed all that; the ignite username and ignite password for the DB work 
fine. Now we want to change the password to something else. This is when we get:
Error: Operation not allowed: authorized context is empty.

The SQL used was: ALTER USER "ignite" WITH PASSWORD 'abc123'


Changing password of USER "ignite"

2019-09-30 Thread Mahesh Renduchintala
Hi,

We are looking to start adding security with some basic authentication.
To begin with, we want to change the USER "ignite" password to something else.

When using the SQL - ALTER USER "ignite" WITH PASSWORD 'abc123';
we get the below error.

Error: Operation not allowed: authorized context is empty.


Is there any example or something to get this basic operation right?
I understand that for advanced security we need to implement custom security 
plugins, but for the above simple thing, is there a quick solution?
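
For context, a minimal sketch of what we are trying to do, assuming 
authentication is enabled on the servers and the default ignite/ignite 
credentials are still valid (host and new password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ChangeIgnitePassword {
    public static void main(String[] args) throws Exception {
        // Authenticated JDBC thin connection using the current credentials.
        String url = "jdbc:ignite:thin://127.0.0.1/?user=ignite&password=ignite";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // The user name is case sensitive, hence the double quotes.
            stmt.executeUpdate("ALTER USER \"ignite\" WITH PASSWORD 'abc123'");
        }
    }
}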

regards
mahesh


Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-09-29 Thread Mahesh Renduchintala
We noted the same on 2.7.6 as well. Deleting tables continuously from a thick 
client causes out-of-memory exceptions in other thick clients.
The fix regarding grid partition message exchanges that went into 2.7.6 does 
not seem to work.


Ignition Start - Timeout if connection is unsuccessful

2019-09-11 Thread Mahesh Renduchintala
Hello

We are currently using Ignition.start to get the handle for the thick client.

>> ignite = Ignition.start(cfg);

As I understand it, this API blocks until the connection is successfully 
established.

However, in some scenarios where the thick client is unable to connect 
properly, it would be preferable to have a timeout option as specified below.
>> ignite = Ignition.start(cfg, timeout);

Is this already available today? If not, can you take it as an enhancement 
request for 2.8?
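
In the meantime, a rough workaround sketch on our side (this is not an Ignite 
API; on timeout, the thread still blocked in Ignition.start is simply 
abandoned) could look like this:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartWithTimeout {
    // Returns the Ignite handle, or throws if the node is not up within timeoutMs.
    public static Ignite startWithTimeout(IgniteConfiguration cfg, long timeoutMs)
        throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<Ignite> fut = exec.submit(() -> Ignition.start(cfg));
            return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Caveat: the background thread keeps trying to connect and is abandoned.
            throw new IllegalStateException(
                "Thick client did not connect within " + timeoutMs + " ms", e);
        } finally {
            exec.shutdown();
        }
    }
}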

The reason I ask is that, in some scenarios, when a thick client comes up for 
the very first time, we see the thick client attempting to connect to the 
Ignite servers almost in an infinite loop.
I previously raised this infinite-loop connection issue here:
http://apache-ignite-users.70518.x6.nabble.com/client-reconnect-working-td28570.html

regards
mahesh






Re: Ignite 2.7.0: Ignite client:: memory leak

2019-08-11 Thread Mahesh Renduchintala
Dennis, Thanks for the update. Will wait for the release.


Re: Ignite 2.7.0: Ignite client:: memory leak

2019-08-02 Thread Mahesh Renduchintala
The clients we use have memory ranging from 4GB to 8GB. OOM was produced on all 
these clients: some sooner, some a little later, but it was always seen.

The workaround has been stable for more than 48 hours now.



Re: Ignite 2.7.0: Ignite client:: memory leak

2019-08-01 Thread Mahesh Renduchintala
Denis,
Thanks. Meanwhile, we made some progress... the workaround seems to be to use 
these two flags in the thick client config:
-DIGNITE_SKIP_PARTITION_SIZE_VALIDATION -DIGNITE_EXCHANGE_HISTORY_SIZE=8
So far, we haven't seen clients going OOM for about 24 hours (still watching).
Based on the logs, you can see there is a partition map exchange among the 
clients, and it feels like messages continuously get stored on the clients, 
causing this problem.
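
For completeness, how we apply these flags: we pass them as -D JVM options when 
launching the thick client. The programmatic sketch below is only an assumption 
(not verified) that the properties are also honored when set before 
Ignition.start:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientWithExchangeFlags {
    public static void main(String[] args) {
        // Equivalent to the -D JVM options above, assuming Ignite reads these
        // system properties when the node starts rather than at JVM launch.
        System.setProperty("IGNITE_SKIP_PARTITION_SIZE_VALIDATION", "true");
        System.setProperty("IGNITE_EXCHANGE_HISTORY_SIZE", "8");

        Ignite client = Ignition.start(new IgniteConfiguration().setClientMode(true));
        System.out.println("Client started: " + client.name());
    }
}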



Re: Ignite 2.7.0: unresponsive: Found long running cache future -

2019-07-30 Thread Mahesh Renduchintala
Denis,


It is quite difficult to capture all clients' logs from our production 
environment.

I gave you the server log.


I can devise a better test if you can explain this error.

What does "long running cache futures" mean?

regards
mahesh


Re: Ignite 2.7.0: Ignite client:: memory leak

2019-07-30 Thread Mahesh Renduchintala
In fact, in the logs you can see that whenever the print below comes up, memory 
jumps up by 100-200MB.

>>Full map updating for 873 groups performed in 16 ms


Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=4c8b23b4, uptime=00:19:13.959]
^-- H/N/C [hosts=8, nodes=10, CPUs=48]
^-- CPU [cur=8%, avg=7.55%, GC=0%]
^-- PageMemory [pages=0]
^-- Heap [used=1307MB, free=35.9%, comm=2039MB]
^-- Off-heap [used=0MB, free=-1%, comm=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=4, qSize=0]
^-- System thread pool [active=0, idle=2, qSize=0]
2019-07-30 14:11:48.485  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Received full message, will finish 
exchange [node=9e2951dc-e8ad-44e6-9495-83b0e5337511, 
resVer=AffinityTopologyVersion [topVer=1361, minorTopVer=0]]
2019-07-30 14:11:48.485  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Received full message, need merge 
[curFut=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
resVer=AffinityTopologyVersion [topVer=1361, minorTopVer=0]]
2019-07-30 14:11:48.485  INFO 26 --- [   sys-#167] 
.i.p.c.GridCachePartitionExchangeManager : Merge exchange future on finish 
[curFut=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
mergedFut=AffinityTopologyVersion [topVer=1358, minorTopVer=0], 
evt=NODE_JOINED, evtNode=864571bd-7235-4fe0-9e52-f3a78f35dbb2, 
evtNodeClient=false]
2019-07-30 14:11:48.485  INFO 26 --- [   sys-#167] 
.i.p.c.GridCachePartitionExchangeManager : Merge exchange future on finish 
[curFut=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
mergedFut=AffinityTopologyVersion [topVer=1359, minorTopVer=0], 
evt=NODE_FAILED, evtNode=20eef25d-b7ec-4340-9da8-1a5a35678ba5, 
evtNodeClient=false]
2019-07-30 14:11:48.485  INFO 26 --- [   sys-#167] 
.i.p.c.GridCachePartitionExchangeManager : Merge exchange future on finish 
[curFut=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
mergedFut=AffinityTopologyVersion [topVer=1360, minorTopVer=0], 
evt=NODE_JOINED, evtNode=9c318eb2-dd21-457c-8d1f-e6d4677e1a55, 
evtNodeClient=true]
2019-07-30 14:11:48.486  INFO 26 --- [   sys-#167] 
.i.p.c.GridCachePartitionExchangeManager : Merge exchange future on finish 
[curFut=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
mergedFut=AffinityTopologyVersion [topVer=1361, minorTopVer=0], 
evt=NODE_FAILED, evtNode=864571bd-7235-4fe0-9e52-f3a78f35dbb2, 
evtNodeClient=false]
2019-07-30 14:11:48.861  INFO 26 --- [   sys-#167] 
o.a.i.i.p.c.CacheAffinitySharedManager   : Affinity applying from full message 
performed in 375 ms.
2019-07-30 14:11:48.864  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Affinity changes applied in 379 ms.
2019-07-30 14:11:48.880  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Full map updating for 873 groups 
performed in 16 ms.
2019-07-30 14:11:48.880  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Finish exchange future 
[startVer=AffinityTopologyVersion [topVer=1357, minorTopVer=0], 
resVer=AffinityTopologyVersion [topVer=1361, minorTopVer=0], err=null]
2019-07-30 14:11:48.927  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Detecting lost partitions performed 
in 47 ms.
2019-07-30 14:11:49.280  INFO 26 --- [   sys-#167] 
.c.d.d.p.GridDhtPartitionsExchangeFuture : Completed partition exchange 
[localNode=4c8b23b4-ce12-4dbb-a7ea-9279711f4008, 
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
[topVer=1357, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode 
[id=20eef25d-b7ec-4340-9da8-1a5a35678ba5, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 
192.168.1.139, 192.168.1.181], sockAddrs=[/192.168.1.181:47500, 
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.1.139:47500], 
discPort=47500, order=1357, intOrder=696, lastExchangeTime=1564495322589, 
loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], 
topVer=AffinityTopologyVersion [topVer=1361, minorTopVer=0], 
durationFromInit=4411]
2019-07-30 14:11:49.289  INFO 26 --- [ange-worker-#43] 
.i.p.c.GridCachePartitionExchangeManager : Skipping rebalancing (no affinity 
changes) [top=AffinityTopologyVersion [topVer=1361, minorTopVer=0], 
rebTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], evt=NODE_JOINED, 
evtNode=20eef25d-b7ec-4340-9da8-1a5a35678ba5, client=true]
2019-07-30 14:11:50.127  INFO 26 --- [eout-worker-#23] 
org.apache.ignite.internal.IgniteKernal  :
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=4c8b23b4, uptime=00:19:16.964]
^-- H/N/C [hosts=8, nodes=10, CPUs=48]
^-- CPU [cur=50.33%, avg=7.59%, GC=22.8%]
^-- PageMemory [pages=0]
^-- Heap [used=1537MB, free=24.64%, comm=2039MB]
^-- Off-heap [used=0MB, free=-1%, comm=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool 

Re: Ignite 2.7.0 : server node: null pointer exception

2019-07-26 Thread Mahesh Renduchintala
Will try the configuration on the server and report back.



Slava, Denis,

Can you also take a look at the thread below? I certainly believe this is again 
a problem with the Ignite discovery SPI or some such. I have attached all logs 
and configuration.


http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-thick-client-has-all-system-threads-busy-indefinitely-td28880.html


Re: Ignite 2.7.0 : server node: null pointer exception

2019-07-25 Thread Mahesh Renduchintala
IGNITE_DISCOVERY_HISTORY_SIZE=700

Does this go on the server side or the thick client side?


Re: Ignite 2.7.0 : server node: null pointer exception

2019-07-24 Thread Mahesh Renduchintala
The clients come in and get disconnected from the cluster for many reasons - 
some intentionally and some due to a poor network.
We can't have Ignite nodes crashing with a null pointer exception.




Re: Ignite 2.7.0 : thick client has all system threads busy indefinitely

2019-07-22 Thread Mahesh Renduchintala
Please see the client and server configs.


<>


Ignite node crash: IndexOutOfBoundsException

2019-07-19 Thread Mahesh Renduchintala
Hi


We have an IndexOutOfBoundsException and the Ignite JVM stopped.

Can you please check if it is a known bug?


regards

mahesh
[12:42:43,456][SEVERE][exchange-worker-#63][CacheAffinitySharedManager] Failed to initialize cache. Will try to rollback cache start routine. [cacheName=SQL_PUBLIC_FSDZ]
class org.apache.ignite.IgniteCheckedException: Failed to find value class in the node classpath (use default marshaller to enable binary objects) : SQL_PUBLIC_FSDZ_1cff78c1_3fd3_4955_afe7_70a13c743bbc
	at org.apache.ignite.internal.processors.query.QueryUtils.typeForQueryEntity(QueryUtils.java:454)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStart0(GridQueryProcessor.java:706)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStart(GridQueryProcessor.java:866)
	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCache(GridCacheProcessor.java:1330)
	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:2165)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:898)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:798)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1231)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:738)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:745)
[12:42:43,622][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (rebalancing will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6347, minorTopVer=2], discoEvt=DiscoveryCustomEvent [customMsg=DynamicCacheChangeBatch [id=6cbe2790c61-0a87ef9b-d326-4a61-944b-33378215c4b1, reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_FSDZ, hasCfg=true, nodeId=f48eb58b-381b-429a-9133-b6b4dfa41567, clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_FSDZ], stopCaches=null, startGrps=[SQL_PUBLIC_FSDZ], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false], affTopVer=AffinityTopologyVersion [topVer=6347, minorTopVer=2], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f48eb58b-381b-429a-9133-b6b4dfa41567, addrs=[0:0:0:0:0:0:0:1%lo, 10.244.1.29, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /10.244.1.29:0, /127.0.0.1:0], discPort=0, order=5872, intOrder=2946, lastExchangeTime=1563526460403, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true], topVer=6347, nodeId8=43ace095, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1563540162933]], nodeId=f48eb58b, evt=DISCOVERY_CUSTOM_EVT]
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:653)
	at java.util.ArrayList.get(ArrayList.java:429)
	at org.apache.ignite.internal.processors.cache.CacheGroupContext.singleCacheContext(CacheGroupContext.java:387)
	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.(GridDhtLocalPartition.java:200)
	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.getOrCreatePartition(GridDhtPartitionTopologyImpl.java:853)
	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.initPartitions(GridDhtPartitionTopologyImpl.java:406)
	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:585)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1470)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:745)

Re: ignite cluster lock up

2019-07-06 Thread Mahesh Renduchintala
We are now testing by increasing the failureDetectionTimeout value.
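A sketch of the change under test (the exact value is illustrative):

import org.apache.ignite.configuration.IgniteConfiguration;

public class FailureDetectionTuning {
    public static IgniteConfiguration tune(IgniteConfiguration cfg) {
        // Default is 10_000 ms; we are testing larger values so that long
        // GC pauses are not treated as node failures.
        return cfg.setFailureDetectionTimeout(60_000);
    }
}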


Even if a full GC is running, why are the Ignite system threads blocked?

Why aren't the Ignite system threads free to accept new connections?

Why exactly would rebooting a few of the previously connected nodes reset 
everything?


There could be something else as well.


Re: ignite cluster lock up

2019-07-05 Thread Mahesh Renduchintala
The long JVM pauses are probably due to the long time taken by GC...

The -Xmx parameter is 64GB for me.

Should I be using more aggressive parameters to free up runtime heap quicker on 
the server node?


I am using the recommended JVM options from the Ignite website:

https://apacheignite.readme.io/docs/jvm-and-system-tuning#garbage-collection-tuning





Re: ignite cluster lock up

2019-07-04 Thread Mahesh Renduchintala
Attached are the config files of the server and the client.



From: Mahesh Renduchintala
Sent: Friday, July 5, 2019 12:37 AM
To: user@ignite.apache.org
Subject: ignite cluster lock up


Hi,


We have 10 clients (thick) connected to an Ignite cluster (2 nodes, 16 threads 
each, plenty of RAM).

These clients are expected to stay connected indefinitely.

New clients (thick) keep coming in, do a few queries, and then go away.

All of this works fine for some time - a few hours.


Then what we notice is that suddenly Ignite gets into a lockup mode.

New clients do not get connected. Old clients (those 10 mentioned above) cannot 
fetch data, etc.


The only way to get out of this lockup is to reboot those 10 clients one after 
the other.

When a random client in that list of 10 is rebooted, the lockup goes away and 
everything works fine.


Attached are the logs.








<>