Re: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.spr

2020-04-15 Thread Rajan Ahlawat
I tried IPv4 on the client side, but with no success.
On the server side we can't change that setting and restart.
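
For reference, forcing IPv4 on a client JVM amounts to something like the sketch below (java.net.preferIPv4Stack is the standard JDK property; it is most reliable as a command-line option, and the bootstrap class name here is only illustrative):

// Minimal sketch (assumption): prefer the IPv4 stack on the client JVM.
// The reliable way is the JVM option on the client start command:
//   java -Djava.net.preferIPv4Stack=true -jar client-app.jar
// Setting it in code only helps if it runs before any networking classes are loaded.
public final class ClientBootstrap {
    public static void main(String[] args) {
        System.setProperty("java.net.preferIPv4Stack", "true");
        // ... build the IgniteConfiguration and call Ignition.start(cfg) afterwards
    }
}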

On Thu, Apr 16, 2020 at 12:21 AM Evgenii Zhuravlev
 wrote:
>
> I see that you use both IPv4 and IPv6 for some nodes; there is a known issue 
> with this. I would recommend restricting Ignite to IPv4 via the 
> -Djava.net.preferIPv4Stack=true JVM parameter for all nodes in the cluster, 
> including clients. I've seen communication issues caused by this before.
>
> Evgenii
>
> ср, 15 апр. 2020 г. в 11:31, Rajan Ahlawat :
>>
>> Client logs and stack_trace are shared.
>> The client just keeps trying to connect and the server keeps throwing socket timeouts.
>> The stack trace I gave is what I get when I try to connect to this
>> problematic Ignite server.
>>
>> About the default settings: in our environment we only have the
>> default timeouts. We tried increasing all these timeouts on the
>> client side, but with no success.
>> On the server side we can't tweak these timeout values right now, unless
>> we are sure of the fix.
>>
>>
>> On Wed, Apr 15, 2020 at 8:06 PM Evgenii Zhuravlev
>>  wrote:
>> >
>> > Hi,
>> >
>> > Please provide logs not only from the server node, but from the client node 
>> > too. You mentioned that only one client has this problem, so please 
>> > provide the full log from this node.
>> >
>> > Also, you said that you set non-default timeouts for the clients while there 
>> > are still default values on the server node - I wouldn't recommend doing 
>> > this; timeouts should be the same for all nodes in the cluster.
>> >
>> > Evgenii
>> >
>> > ср, 15 апр. 2020 г. в 03:04, Rajan Ahlawat :
>> >>
>> >> Shared the file with email id:
>> >> e.zhuravlev...@gmail.com
>> >>
>> >> We have a single instance of Ignite. The file contains all logs for Mar
>> >> 30, 2019; line 6429 is the first occurrence of the incident.
>> >>
>> >> On Tue, Apr 14, 2020 at 8:27 PM Evgenii Zhuravlev
>> >>  wrote:
>> >> >
>> >> > Can you provide full log files from all nodes? It's impossible to find 
>> >> > the root cause from this.
>> >> >
>> >> > Evgenii
>> >> >
>> >> > вт, 14 апр. 2020 г. в 07:49, Rajan Ahlawat :
>> >> >>
>> >> >> The server starts with the following configuration:
>> >> >>
>> >> >> ignite_application-1-2020-03-17.log:14:[2020-03-17T08:23:33,664][INFO
>> >> >> ][main][IgniteKernal%igniteStart] IgniteConfiguration
>> >> >> [igniteInstanceName=igniteStart, pubPoolSize=32, svcPoolSize=32,
>> >> >> callbackPoolSize=32, stripedPoolSize=32, sysPoolSize=30,
>> >> >> mgmtPoolSize=4, igfsPoolSize=32, dataStreamerPoolSize=32,
>> >> >> utilityCachePoolSize=32, utilityCacheKeepAliveTime=6,
>> >> >> p2pPoolSize=2, qryPoolSize=32,
>> >> >> igniteHome=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin,
>> >> >> igniteWorkDir=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin/work,
>> >> >> mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e,
>> >> >> nodeId=53396cb7-1b66-43da-bf10-ebb5f7cc9693,
>> >> >> marsh=org.apache.ignite.internal.binary.BinaryMarshaller@42b3b079,
>> >> >> marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
>> >> >> sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
>> >> >> metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
>> >> >> discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
>> >> >> marsh=null, reconCnt=100, reconDelay=1, maxAckTimeout=60,
>> >> >> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
>> >> >> segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true,
>> >> >> allResolversPassReq=true, segChkFreq=1,
>> >> >> commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null,
>> >> >> enableForcibleNodeKill=false, enableTroubleshootingLog=false,
>> >> >> srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@6692b6c6,
>> >> >> locAddr=null, locHost=null, locPort=47100, locPortRange=100,
>> >> >> shmemPort=-1, directBuf=true, directSndBuf=false,
>> >> >> idleConnTimeout=60, connTimeout=5000, maxConnTimeout=60,
>> 

Re: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.spr

2020-04-15 Thread Rajan Ahlawat
Client logs and stack_trace are shared.
The client just keeps trying to connect and the server keeps throwing socket timeouts.
The stack trace I gave is what I get when I try to connect to this
problematic Ignite server.

About the default settings: in our environment we only have the
default timeouts. We tried increasing all these timeouts on the
client side, but with no success.
On the server side we can't tweak these timeout values right now, unless
we are sure of the fix.
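
To make the "same timeouts on all nodes" suggestion concrete, here is a minimal sketch of a shared configuration applied to both server and client nodes; the numeric values are placeholders, not recommendations, and only clientMode differs between the two sides:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class CommonTimeoutConfig {
    // One configuration used by every node, so discovery timeouts never diverge.
    public static IgniteConfiguration create(boolean clientMode) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(clientMode);
        cfg.setFailureDetectionTimeout(10_000);        // placeholder value
        cfg.setClientFailureDetectionTimeout(30_000);  // placeholder value

        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setSocketTimeout(5_000);                 // placeholder value
        disco.setAckTimeout(5_000);                    // placeholder value
        disco.setNetworkTimeout(5_000);                // placeholder value
        cfg.setDiscoverySpi(disco);

        return cfg;
    }

    public static void main(String[] args) {
        // false on server nodes, true on client nodes; everything else stays identical.
        Ignite ignite = Ignition.start(create(false));
    }
}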


On Wed, Apr 15, 2020 at 8:06 PM Evgenii Zhuravlev
 wrote:
>
> Hi,
>
> Please provide logs not only from the server node, but from the client node 
> too. You mentioned that only one client has this problem, so please provide 
> the full log from this node.
>
> Also, you said that you set non-default timeouts for the clients while there are 
> still default values on the server node - I wouldn't recommend doing this; 
> timeouts should be the same for all nodes in the cluster.
>
> Evgenii
>
> ср, 15 апр. 2020 г. в 03:04, Rajan Ahlawat :
>>
>> Shared the file with email id:
>> e.zhuravlev...@gmail.com
>>
>> We have a single instance of Ignite. The file contains all logs for Mar
>> 30, 2019; line 6429 is the first occurrence of the incident.
>>
>> On Tue, Apr 14, 2020 at 8:27 PM Evgenii Zhuravlev
>>  wrote:
>> >
>> > Can you provide full log files from all nodes? It's impossible to find the 
>> > root cause from this.
>> >
>> > Evgenii
>> >
>> > вт, 14 апр. 2020 г. в 07:49, Rajan Ahlawat :
>> >>
>> >> The server starts with the following configuration:
>> >>
>> >> ignite_application-1-2020-03-17.log:14:[2020-03-17T08:23:33,664][INFO
>> >> ][main][IgniteKernal%igniteStart] IgniteConfiguration
>> >> [igniteInstanceName=igniteStart, pubPoolSize=32, svcPoolSize=32,
>> >> callbackPoolSize=32, stripedPoolSize=32, sysPoolSize=30,
>> >> mgmtPoolSize=4, igfsPoolSize=32, dataStreamerPoolSize=32,
>> >> utilityCachePoolSize=32, utilityCacheKeepAliveTime=6,
>> >> p2pPoolSize=2, qryPoolSize=32,
>> >> igniteHome=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin,
>> >> igniteWorkDir=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin/work,
>> >> mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e,
>> >> nodeId=53396cb7-1b66-43da-bf10-ebb5f7cc9693,
>> >> marsh=org.apache.ignite.internal.binary.BinaryMarshaller@42b3b079,
>> >> marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
>> >> sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
>> >> metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
>> >> discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
>> >> marsh=null, reconCnt=100, reconDelay=1, maxAckTimeout=60,
>> >> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
>> >> segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true,
>> >> allResolversPassReq=true, segChkFreq=1,
>> >> commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null,
>> >> enableForcibleNodeKill=false, enableTroubleshootingLog=false,
>> >> srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@6692b6c6,
>> >> locAddr=null, locHost=null, locPort=47100, locPortRange=100,
>> >> shmemPort=-1, directBuf=true, directSndBuf=false,
>> >> idleConnTimeout=60, connTimeout=5000, maxConnTimeout=60,
>> >> reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024,
>> >> slowClientQueueLimit=1000, nioSrvr=null, shmemSrv=null,
>> >> usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true,
>> >> filterReachableAddresses=false, ackSndThreshold=32,
>> >> unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null,
>> >> boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=16,
>> >> selectorSpins=0, addrRslvr=null,
>> >> ctxInitLatch=java.util.concurrent.CountDownLatch@1cd629b3[Count = 1],
>> >> stopping=false,
>> >> metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@589da3f3],
>> >> evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@39d76cb5,
>> >> colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null],
>> >> indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@1cb346ea,
>> >> addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1,
>> >> txCfg=org.apache.ignite.configuration.TransactionConfiguration@4c012563,
>> >

Re: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.spr

2020-04-15 Thread Rajan Ahlawat
Shared the file with email id:
e.zhuravlev...@gmail.com

We have a single instance of Ignite. The file contains all logs for Mar
30, 2019; line 6429 is the first occurrence of the incident.

On Tue, Apr 14, 2020 at 8:27 PM Evgenii Zhuravlev
 wrote:
>
> Can you provide full log files from all nodes? It's impossible to find the 
> root cause from this.
>
> Evgenii
>
> вт, 14 апр. 2020 г. в 07:49, Rajan Ahlawat :
>>
>> The server starts with the following configuration:
>>
>> ignite_application-1-2020-03-17.log:14:[2020-03-17T08:23:33,664][INFO
>> ][main][IgniteKernal%igniteStart] IgniteConfiguration
>> [igniteInstanceName=igniteStart, pubPoolSize=32, svcPoolSize=32,
>> callbackPoolSize=32, stripedPoolSize=32, sysPoolSize=30,
>> mgmtPoolSize=4, igfsPoolSize=32, dataStreamerPoolSize=32,
>> utilityCachePoolSize=32, utilityCacheKeepAliveTime=6,
>> p2pPoolSize=2, qryPoolSize=32,
>> igniteHome=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin,
>> igniteWorkDir=/home/patrochandan01/ignite/apache-ignite-fabric-2.6.0-bin/work,
>> mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e,
>> nodeId=53396cb7-1b66-43da-bf10-ebb5f7cc9693,
>> marsh=org.apache.ignite.internal.binary.BinaryMarshaller@42b3b079,
>> marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
>> sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
>> metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
>> discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
>> marsh=null, reconCnt=100, reconDelay=1, maxAckTimeout=60,
>> forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
>> segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true,
>> allResolversPassReq=true, segChkFreq=1,
>> commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null,
>> enableForcibleNodeKill=false, enableTroubleshootingLog=false,
>> srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@6692b6c6,
>> locAddr=null, locHost=null, locPort=47100, locPortRange=100,
>> shmemPort=-1, directBuf=true, directSndBuf=false,
>> idleConnTimeout=60, connTimeout=5000, maxConnTimeout=60,
>> reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024,
>> slowClientQueueLimit=1000, nioSrvr=null, shmemSrv=null,
>> usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true,
>> filterReachableAddresses=false, ackSndThreshold=32,
>> unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null,
>> boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=16,
>> selectorSpins=0, addrRslvr=null,
>> ctxInitLatch=java.util.concurrent.CountDownLatch@1cd629b3[Count = 1],
>> stopping=false,
>> metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@589da3f3],
>> evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@39d76cb5,
>> colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null],
>> indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@1cb346ea,
>> addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1,
>> txCfg=org.apache.ignite.configuration.TransactionConfiguration@4c012563,
>> cacheSanityCheckEnabled=true, discoStartupDelay=6,
>> deployMode=SHARED, p2pMissedCacheSize=100, locHost=null,
>> timeSrvPortBase=31100, timeSrvPortRange=100,
>> failureDetectionTimeout=1, clientFailureDetectionTimeout=3,
>> metricsLogFreq=6, hadoopCfg=null,
>> connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@14a50707,
>> odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration
>> [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null,
>> grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null,
>> binaryCfg=null, memCfg=null, pstCfg=null,
>> dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040,
>> sysCacheMaxSize=104857600, pageSize=0, concLvl=25,
>> dfltDataRegConf=DataRegionConfiguration [name=Default_Region,
>> maxSize=20971520, initSize=15728640, swapPath=null,
>> pageEvictionMode=RANDOM_2_LRU, evictionThreshold=0.9,
>> emptyPagesPoolSize=100, metricsEnabled=false,
>> metricsSubIntervalCount=5, metricsRateTimeInterval=6,
>> persistenceEnabled=false, checkpointPageBufSize=0], storagePath=null,
>> checkpointFreq=18, lockWaitTime=1, checkpointThreads=4,
>> checkpointWriteOrder=SEQUENTIAL, walHistSize=20, walSegments=10,
>> walSegmentSize=67108864, walPath=db/wal,
>> walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY,
>> walTlbSize=131072, walBuffSize=0, walFlushFreq=2000,
>> walFsyncDelay=1000, walRecordIterBuffSize=67108864,
>> alwaysWri

Re: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.spr

2020-04-14 Thread Rajan Ahlawat

In the server configuration we didn't define any socketTimeout, so the
server might be the one throwing the socket timeout, not the client. But it
occurs only for this one particular client and this server. Other web
applications are able to connect to the same server in our production environment.

Thanks

On Mon, Apr 13, 2020 at 8:09 PM Evgenii Zhuravlev
 wrote:
>
> Hi,
>
> Can you share full logs from all nodes? I mean log files, not the console 
> output.
>
> Evgenii
>
> вс, 12 апр. 2020 г. в 20:30, Rajan Ahlawat :
>>
>> ?
>>
>> On Thu, Apr 9, 2020 at 3:11 AM Rajan Ahlawat  wrote:
>> >
>> > -- Forwarded message -
>> > From: Rajan Ahlawat 
>> > Date: Thu, Apr 9, 2020 at 3:09 AM
>> > Subject: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed
>> > to reconnect to cluster (will retry): class
>> > o.a.i.IgniteCheckedException: Failed to deserialize object with given
>> > class loader: org.springframework.boot.loader.LaunchedURLClassLoader
>> > To: 
>> >
>> >
>> > Hi
>> >
>> > We suddenly started getting the following exception on the client side after
>> > the node running the application was restarted:
>> >
>> > org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to
>> > reconnect to cluster (will retry): class o.a.i.IgniteCheckedException:
>> > Failed to deserialize object with given class loader:
>> > org.springframework.boot.loader.LaunchedURLClassLoader
>> >
>> > I see similar bug was raised here for version 2.7.0:
>> > https://issues.apache.org/jira/browse/IGNITE-11730
>> >
>> > We are currently using version 2.6.0
>> > Following is our tcpDiscoveryApi configurations:
>> >
>> > private void setDiscoverySpiConfig(IgniteConfiguration cfg) {
>> > TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
>> >
>> > setIpFinder(discoverySpi);
>> > 
>> > discoverySpi.setNetworkTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
>> > 
>> > discoverySpi.setSocketTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
>> > 
>> > discoverySpi.setJoinTimeout(platformCachingConfiguration.getIgnite().getJoinTimeout());
>> > 
>> > discoverySpi.setClientReconnectDisabled(platformCachingConfiguration.getIgnite().isClientReconnectDisabled());
>> > 
>> > discoverySpi.setReconnectCount(platformCachingConfiguration.getIgnite().getReconnectCount());
>> > 
>> > discoverySpi.setReconnectDelay(platformCachingConfiguration.getIgnite().getReconnectDelay());
>> >
>> > cfg.setDiscoverySpi(discoverySpi);
>> > }
>> >
>> > Its IPfinder config is
>> >
>> > private void setTcpIpFinder(TcpDiscoverySpi discoverySpi) {
>> > TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
>> >
>> > 
>> > ipFinder.setAddresses(platformCachingConfiguration.getIgnite().getNodes());
>> > discoverySpi.setIpFinder(ipFinder);
>> > }
>> >
>> > We have tried every combination of timeouts; right now the timeouts are
>> > set to very high values.
>> >
>> > (1) Are we hitting the same bug reported for version 2.7.0? The bug
>> > description says it occurs on the server side, but we are getting the exact
>> > same stack trace in ClientImpl.java on the client side.
>> > (2) Assuming it is the same issue, is there a way to disable the data bag
>> > compression check, since upgrading both the client and the server versions
>> > would not be possible immediately?
>> >
>> > Thanks in advance.


Re: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.spr

2020-04-12 Thread Rajan Ahlawat
?

On Thu, Apr 9, 2020 at 3:11 AM Rajan Ahlawat  wrote:
>
> -- Forwarded message -
> From: Rajan Ahlawat 
> Date: Thu, Apr 9, 2020 at 3:09 AM
> Subject: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed
> to reconnect to cluster (will retry): class
> o.a.i.IgniteCheckedException: Failed to deserialize object with given
> class loader: org.springframework.boot.loader.LaunchedURLClassLoader
> To: 
>
>
> Hi
>
> We suddenly started getting the following exception on the client side after
> the node running the application was restarted:
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to
> reconnect to cluster (will retry): class o.a.i.IgniteCheckedException:
> Failed to deserialize object with given class loader:
> org.springframework.boot.loader.LaunchedURLClassLoader
>
> I see similar bug was raised here for version 2.7.0:
> https://issues.apache.org/jira/browse/IGNITE-11730
>
> We are currently using version 2.6.0.
> Following is our TcpDiscoverySpi configuration:
>
> private void setDiscoverySpiConfig(IgniteConfiguration cfg) {
> TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
>
> setIpFinder(discoverySpi);
> 
> discoverySpi.setNetworkTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
> 
> discoverySpi.setSocketTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
> 
> discoverySpi.setJoinTimeout(platformCachingConfiguration.getIgnite().getJoinTimeout());
> 
> discoverySpi.setClientReconnectDisabled(platformCachingConfiguration.getIgnite().isClientReconnectDisabled());
> 
> discoverySpi.setReconnectCount(platformCachingConfiguration.getIgnite().getReconnectCount());
> 
> discoverySpi.setReconnectDelay(platformCachingConfiguration.getIgnite().getReconnectDelay());
>
> cfg.setDiscoverySpi(discoverySpi);
> }
>
> Its IPfinder config is
>
> private void setTcpIpFinder(TcpDiscoverySpi discoverySpi) {
> TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
>
> 
> ipFinder.setAddresses(platformCachingConfiguration.getIgnite().getNodes());
> discoverySpi.setIpFinder(ipFinder);
> }
>
> We have tried every combination of timeouts; right now the timeouts are
> set to very high values.
>
> (1) Are we hitting the same bug reported for version 2.7.0? The bug
> description says it occurs on the server side, but we are getting the exact
> same stack trace in ClientImpl.java on the client side.
> (2) Assuming it is the same issue, is there a way to disable the data bag
> compression check, since upgrading both the client and the server versions
> would not be possible immediately?
>
> Thanks in advance.


Fwd: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to reconnect to cluster (will retry): class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: org.sp

2020-04-08 Thread Rajan Ahlawat
-- Forwarded message -
From: Rajan Ahlawat 
Date: Thu, Apr 9, 2020 at 3:09 AM
Subject: org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed
to reconnect to cluster (will retry): class
o.a.i.IgniteCheckedException: Failed to deserialize object with given
class loader: org.springframework.boot.loader.LaunchedURLClassLoader
To: 


Hi

We suddenly started getting the following exception on the client side after
the node running the application was restarted:

org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to
reconnect to cluster (will retry): class o.a.i.IgniteCheckedException:
Failed to deserialize object with given class loader:
org.springframework.boot.loader.LaunchedURLClassLoader

I see similar bug was raised here for version 2.7.0:
https://issues.apache.org/jira/browse/IGNITE-11730

We are currently using version 2.6.0.
Following is our TcpDiscoverySpi configuration:

private void setDiscoverySpiConfig(IgniteConfiguration cfg) {
    TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();

    setIpFinder(discoverySpi);
    discoverySpi.setNetworkTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
    discoverySpi.setSocketTimeout(platformCachingConfiguration.getIgnite().getSocketTimeout());
    discoverySpi.setJoinTimeout(platformCachingConfiguration.getIgnite().getJoinTimeout());
    discoverySpi.setClientReconnectDisabled(platformCachingConfiguration.getIgnite().isClientReconnectDisabled());
    discoverySpi.setReconnectCount(platformCachingConfiguration.getIgnite().getReconnectCount());
    discoverySpi.setReconnectDelay(platformCachingConfiguration.getIgnite().getReconnectDelay());

    cfg.setDiscoverySpi(discoverySpi);
}

Its IP finder config is:

private void setTcpIpFinder(TcpDiscoverySpi discoverySpi) {
    TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

    ipFinder.setAddresses(platformCachingConfiguration.getIgnite().getNodes());
    discoverySpi.setIpFinder(ipFinder);
}
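
For context, a minimal sketch of how a configuration like this is typically applied on the client side (an assumption, not our actual wiring; Ignition and IgniteConfiguration are the standard Ignite API):

// Hedged sketch: how the helper above might be wired into client startup.
private Ignite startClient() {
    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setClientMode(true);           // this application joins the cluster as a client node
    setDiscoverySpiConfig(cfg);        // helper shown above sets TcpDiscoverySpi + IP finder
    return Ignition.start(cfg);        // the reconnect loop from the error runs inside this node
}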

We have tried every combination of timeouts; right now the timeouts are
set to very high values.

(1) Are we hitting the same bug reported for version 2.7.0? The bug
description says it occurs on the server side, but we are getting the exact
same stack trace in ClientImpl.java on the client side.
(2) Assuming it is the same issue, is there a way to disable the data bag
compression check, since upgrading both the client and the server versions
would not be possible immediately?

Thanks in advance.


Re: Ignite partitioned mode not scaling

2020-01-03 Thread Rajan Ahlawat
We are using the following Ignite client dependencies:

org.apache.ignite:ignite-core:2.6.0
org.apache.ignite:ignite-spring-data:2.6.0

The benchmark source code is pretty simple; it does the following:

An Executors.newFixedThreadPool(threadPoolSize), throttled by a rate limiter,
executes the worker threads.
Each thread makes a get query against three IgniteRepository-backed caches,
something like this:
memberCacheRepository.getMemberCacheObjectByMemberUuid(memberUuid)
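
A rough sketch of that driver loop, just to make the shape concrete (threadPoolSize, targetQps, pickRandomUuid() and recordLatency() are illustrative names, and RateLimiter here is Guava's; this is not the real benchmark code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.google.common.util.concurrent.RateLimiter; // Guava

void runBenchmark(int threadPoolSize, double targetQps, int totalRequests) {
    ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
    RateLimiter rateLimiter = RateLimiter.create(targetQps); // e.g. 2600 permits per second

    for (int i = 0; i < totalRequests; i++) {
        rateLimiter.acquire();                      // throttle submissions to the target QPS
        pool.submit(() -> {
            String memberUuid = pickRandomUuid();   // random key from the ~1M preloaded records
            long startNanos = System.nanoTime();
            memberCacheRepository.getMemberCacheObjectByMemberUuid(memberUuid);
            recordLatency(System.nanoTime() - startNanos);  // later aggregated into the p95 numbers
        });
    }
    pool.shutdown();
}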

The caches are created during Spring Boot application startup via a
CacheConfiguration like this:

CacheConfiguration createSqlCacheConfig(String cacheName, String dataRegion) {
    CacheConfiguration sqlCacheConfig = new CacheConfiguration(cacheName);
    sqlCacheConfig.setBackups(0);
    sqlCacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
    sqlCacheConfig.setCacheMode(CacheMode.PARTITIONED);
    sqlCacheConfig.setDataRegionName(dataRegion);
    return sqlCacheConfig;
}
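
The resulting configs would then be registered on the IgniteConfiguration roughly like this (a sketch; the cache names are illustrative, and Default_Region matches the data region name from the server config above):

// Hedged sketch: registering caches built by the helper above.
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setCacheConfiguration(
    createSqlCacheConfig("MemberCache", "Default_Region"),
    createSqlCacheConfig("SomeOtherCache", "Default_Region"));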

I am sorry, but I won't be able to share the complete code; please let me
know what specific information is required.



On Fri, Jan 3, 2020 at 2:45 PM Mikhail Cherkasov 
wrote:

> What type of client do you use? Is it the JDBC thin driver?
>
> It would be best if you can share the benchmark source code, so we can see what
> queries you use, what flags you set on the queries, etc.
>
> On Thu, Jan 2, 2020 at 10:07 PM Rajan Ahlawat 
> wrote:
>
>> If QPS > 2000, I am using multiple hosts for the application that is shooting
>> requests at the cache.
>> If the benchmark were the bottleneck, we shouldn't see a drop from 2600 to 2200
>> when we go from a 1-node to a 3-node cluster.
>>
>> On Fri, Jan 3, 2020 at 11:24 AM Rajan Ahlawat 
>> wrote:
>>
>>> Hi Mikhail
>>>
>>> could you please share the benchmark code with us?
>>> I am first filling up around a million records in the cache. Then, through
>>> direct cache service classes, I fetch those records randomly.
>>>
>>> do you run queries against the same amount of records each time?
>>> Yes. 2600 QPS means it picks 2600 records randomly per second and runs
>>> get queries over the SQL caches of different tables.
>>>
>>> what host machines do you use for your nodes? when you say that you have
>>> 5 nodes, does it mean that you use 5 dedicated machines, one for each node?
>>> Yes, these are five dedicated Linux machines.
>>>
>>> Also, it might be that the benchmark itself is the bottleneck, so your
>>> system can handle more QPS, but you need to run a benchmark from several
>>> machines. Please try to use at least 2 hosts for the benchmark application
>>> and check if there are any changes in QPS.
>>> As you can see in the table, I have tried different combinations of
>>> nodes, and with each increase in nodes, the QPS of requests served under
>>> 50 ms goes down.
>>>
>>>
>>> On Fri, Jan 3, 2020 at 1:29 AM Mikhail Cherkasov <
>>> mcherka...@gridgain.com> wrote:
>>>
>>>> Hi Rajan,
>>>>
>>>> could you please share the benchmark code with us?
>>>> do you run queries against the same amount of records each time?
>>>> what host machines do you use for your nodes? when you say that you
>>>> have 5 nodes, does it mean that you use 5 dedicated machines, one for each node?
>>>> Also, it might be that the benchmark itself is the bottleneck, so your
>>>> system can handle more QPS, but you need to run a benchmark from several
>>>> machines. Please try to use at least 2 hosts for the benchmark application
>>>> and check if there are any changes in QPS.
>>>>
>>>> Thanks,
>>>> Mike.
>>>>
>>>> On Thu, Jan 2, 2020 at 2:49 AM Rajan Ahlawat 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> -- Forwarded message -
>>>>> From: Rajan Ahlawat 
>>>>> Date: Thu, Jan 2, 2020 at 4:05 PM
>>>>> Subject: Ignite partitioned mode not scaling
>>>>> To: 
>>>>>
>>>>>
>>>>> We are moving from replicated (1-node cluster) to multinode
>>>>> partitioned cluster.
>>>>> So assumption was that max QPS we can reach would be more if no. of
>>>>> nodes are added to cluster.
>>>>> We compared under 50ms QPS stats of partitioned mode with increasing
>>>>> no. of nodes in cluster, and found that performance actually degraded.
>>>>> We are using ignite key value as well as sql cache, where most of the
>>>>> data in sql cache, no persistence is being used.
>>>>>
>>>>> please let us know what we are doing wrong or what can be done to make
>>>>> it scalable.
>>>>> here are the results of perf tests :
>>>>>
>>>>> *50ms in 95 percentile comparison of partitioned-mode*
>>>>>
>>>>> Response time in ms
>>>>> cache mode (partitioned) | QPS  | read from sql table | read from sql table with join | read from sql table
>>>>> 1-node                   | 2600 | 48                  | 46                            | 47
>>>>> 3-node                   | 2190 | 50                  | 48                            | 49
>>>>> 3-node-1-backup          | 2200 | 55                  | 53                            | 54
>>>>> 5-node                   | 2000 | 54                  | 52                            | 53
>>>>> 5-node-2-backup          | 1990 | 51                  | 49                            | 50
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Mikhail.
>>>>
>>>
>
> --
> Thanks,
> Mikhail.
>


Re: Ignite partitioned mode not scaling

2020-01-02 Thread Rajan Ahlawat
If QPS > 2000, I am using multiple hosts for the application that is shooting
requests at the cache.
If the benchmark were the bottleneck, we shouldn't see a drop from 2600 to 2200
when we go from a 1-node to a 3-node cluster.

On Fri, Jan 3, 2020 at 11:24 AM Rajan Ahlawat 
wrote:

> Hi Mikhail
>
> could you please share the benchmark code with us?
> I am first filling up around a million records in the cache. Then, through
> direct cache service classes, I fetch those records randomly.
>
> do you run queries against the same amount of records each time?
> Yes. 2600 QPS means it picks 2600 records randomly per second and runs
> get queries over the SQL caches of different tables.
>
> what host machines do you use for your nodes? when you say that you have 5
> nodes, does it mean that you use 5 dedicated machines, one for each node?
> Yes, these are five dedicated Linux machines.
>
> Also, it might be that the benchmark itself is the bottleneck, so your
> system can handle more QPS, but you need to run a benchmark from several
> machines. Please try to use at least 2 hosts for the benchmark application
> and check if there are any changes in QPS.
> As you can see in the table, I have tried different combinations of
> nodes, and with each increase in nodes, the QPS of requests served under
> 50 ms goes down.
>
>
> On Fri, Jan 3, 2020 at 1:29 AM Mikhail Cherkasov 
> wrote:
>
>> Hi Rajan,
>>
>> could you please share the benchmark code with us?
>> do you run queries against the same amount of records each time?
>> what host machines do you use for your nodes? when you say that you have
>> 5 nodes, does it mean that you use 5 dedicated machines, one for each node?
>> Also, it might be that the benchmark itself is the bottleneck, so your
>> system can handle more QPS, but you need to run a benchmark from several
>> machines. Please try to use at least 2 hosts for the benchmark application
>> and check if there are any changes in QPS.
>>
>> Thanks,
>> Mike.
>>
>> On Thu, Jan 2, 2020 at 2:49 AM Rajan Ahlawat 
>> wrote:
>>
>>>
>>>
>>> -- Forwarded message -
>>> From: Rajan Ahlawat 
>>> Date: Thu, Jan 2, 2020 at 4:05 PM
>>> Subject: Ignite partitioned mode not scaling
>>> To: 
>>>
>>>
>>> We are moving from replicated (1-node cluster) to multinode partitioned
>>> cluster.
>>> So assumption was that max QPS we can reach would be more if no. of
>>> nodes are added to cluster.
>>> We compared under 50ms QPS stats of partitioned mode with increasing no.
>>> of nodes in cluster, and found that performance actually degraded.
>>> We are using ignite key value as well as sql cache, where most of the
>>> data in sql cache, no persistence is being used.
>>>
>>> please let us know what we are doing wrong or what can be done to make
>>> it scalable.
>>> here are the results of perf tests :
>>>
>>> *50ms in 95 percentile comparison of partitioned-mode*
>>>
>>> Response time in ms
>>> cache mode (partitioned) | QPS  | read from sql table | read from sql table with join | read from sql table
>>> 1-node                   | 2600 | 48                  | 46                            | 47
>>> 3-node                   | 2190 | 50                  | 48                            | 49
>>> 3-node-1-backup          | 2200 | 55                  | 53                            | 54
>>> 5-node                   | 2000 | 54                  | 52                            | 53
>>> 5-node-2-backup          | 1990 | 51                  | 49                            | 50
>>>
>>
>>
>> --
>> Thanks,
>> Mikhail.
>>
>


Re: Ignite partitioned mode not scaling

2020-01-02 Thread Rajan Ahlawat
Hi Mikhail

could you please share the benchmark code with us?
I am first filling up around a million records in the cache. Then, through
direct cache service classes, I fetch those records randomly.

do you run queries against the same amount of records each time?
Yes. 2600 QPS means it picks 2600 records randomly per second and runs
get queries over the SQL caches of different tables.

what host machines do you use for your nodes? when you say that you have 5
nodes, does it mean that you use 5 dedicated machines, one for each node?
Yes, these are five dedicated Linux machines.

Also, it might be that the benchmark itself is the bottleneck, so your
system can handle more QPS, but you need to run a benchmark from several
machines. Please try to use at least 2 hosts for the benchmark application
and check if there are any changes in QPS.
As you can see in the table, I have tried different combinations of
nodes, and with each increase in nodes, the QPS of requests served under
50 ms goes down.


On Fri, Jan 3, 2020 at 1:29 AM Mikhail Cherkasov 
wrote:

> Hi Rajan,
>
> could you please share the benchmark code with us?
> do you run queries against the same amount of records each time?
> what host machines do you use for your nodes? when you say that you have 5
> nodes, does it mean that you use 5 dedicated machines, one for each node?
> Also, it might be that the benchmark itself is the bottleneck, so your
> system can handle more QPS, but you need to run a benchmark from several
> machines. Please try to use at least 2 hosts for the benchmark application
> and check if there are any changes in QPS.
>
> Thanks,
> Mike.
>
> On Thu, Jan 2, 2020 at 2:49 AM Rajan Ahlawat 
> wrote:
>
>>
>>
>> -- Forwarded message -
>> From: Rajan Ahlawat 
>> Date: Thu, Jan 2, 2020 at 4:05 PM
>> Subject: Ignite partitioned mode not scaling
>> To: 
>>
>>
>> We are moving from replicated (1-node cluster) to multinode partitioned
>> cluster.
>> So assumption was that max QPS we can reach would be more if no. of nodes
>> are added to cluster.
>> We compared under 50ms QPS stats of partitioned mode with increasing no.
>> of nodes in cluster, and found that performance actually degraded.
>> We are using ignite key value as well as sql cache, where most of the
>> data in sql cache, no persistence is being used.
>>
>> please let us know what we are doing wrong or what can be done to make it
>> scalable.
>> here are the results of perf tests :
>>
>> *50ms in 95 percentile comparison of partitioned-mode*
>>
>> Response time in ms
>> cache mode (partitioned) | QPS  | read from sql table | read from sql table with join | read from sql table
>> 1-node                   | 2600 | 48                  | 46                            | 47
>> 3-node                   | 2190 | 50                  | 48                            | 49
>> 3-node-1-backup          | 2200 | 55                  | 53                            | 54
>> 5-node                   | 2000 | 54                  | 52                            | 53
>> 5-node-2-backup          | 1990 | 51                  | 49                            | 50
>>
>
>
> --
> Thanks,
> Mikhail.
>


Fwd: Ignite partitioned mode not scaling

2020-01-02 Thread Rajan Ahlawat
-- Forwarded message -
From: Rajan Ahlawat 
Date: Thu, Jan 2, 2020 at 4:05 PM
Subject: Ignite partitioned mode not scaling
To: 


We are moving from a replicated (1-node) cluster to a multi-node partitioned
cluster.
So the assumption was that the maximum QPS we can reach would grow as nodes
are added to the cluster.
We compared the under-50ms QPS stats of partitioned mode with an increasing number of
nodes in the cluster, and found that performance actually degraded.
We are using Ignite key-value as well as SQL caches, with most of the data
in SQL caches; no persistence is being used.
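
To illustrate the two access paths being measured (a minimal sketch; the cache name, value type and field are made-up examples, not our real model):

// Hedged sketch of a key-value read and a SQL read over the same cache.
IgniteCache<String, BinaryObject> cache = ignite.cache("MemberCache").withKeepBinary();

BinaryObject member = cache.get(memberUuid);                       // key-value read

SqlFieldsQuery qry = new SqlFieldsQuery(
    "select * from MemberCacheObject where memberUuid = ?").setArgs(memberUuid);
List<List<?>> rows = cache.query(qry).getAll();                    // SQL read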

Please let us know what we are doing wrong or what can be done to make it
scale.
Here are the results of the perf tests:

*50ms in 95 percentile comparison of partitioned-mode*

Response time in ms
cache mode (partitioned) | QPS  | read from sql table | read from sql table with join | read from sql table
1-node                   | 2600 | 48                  | 46                            | 47
3-node                   | 2190 | 50                  | 48                            | 49
3-node-1-backup          | 2200 | 55                  | 53                            | 54
5-node                   | 2000 | 54                  | 52                            | 53
5-node-2-backup          | 1990 | 51                  | 49                            | 50