Re: Failed to clean IP finder up.
Cool, thanks! Any idea when 2.8 is planned to be released?

Regards,
Marco

On Wed, 2 Oct 2019 at 11:31, Stephen Darlington <stephen.darling...@gridgain.com> wrote:
> Looks like it missed being part of the 2.7.x release by a month or two. It
> will be resolved when 2.8.0 comes out.
>
> Regards,
> Stephen
>
> On 2 Oct 2019, at 09:59, Marco Bernagozzi wrote:
>
> I'm getting this error when the nodes are shutting down.
> What are the possible causes for this?
> A bug was marked for this a year ago or so, but was marked as resolved, it
> seems?
>
> http://apache-ignite-issues.70530.x6.nabble.com/jira-Updated-IGNITE-9826-Ignite-node-with-TcpDiscoveryS3IpFinder-can-hang-while-stopping-td75612.html
>
> [...]
Failed to clean IP finder up.
I'm getting this error when the nodes are shutting down. What are the possible causes for this? A bug was marked for this a year ago or so, but was marked as resolved, it seems?

http://apache-ignite-issues.70530.x6.nabble.com/jira-Updated-IGNITE-9826-Ignite-node-with-TcpDiscoveryS3IpFinder-can-hang-while-stopping-td75612.html

Ignite cache is empty. Shutting down...
2019-10-02 08:44:37 [main] INFO org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestProtocol.info(117) - Command protocol successfully stopped: TCP binary
2019-10-02 08:44:37 [main] INFO org.eclipse.jetty.server.AbstractConnector.doStop(332) - Stopped ServerConnector@70101687{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2019-10-02 08:44:37 [main] INFO org.apache.ignite.internal.processors.rest.protocols.http.jetty.GridJettyRestProtocol.info(117) - Command protocol successfully stopped: Jetty REST
2019-10-02 08:44:37 [grid-timeout-worker-#79] INFO org.apache.ignite.internal.IgniteKernal.info(117) -
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=c51b8462, uptime=00:15:00.073]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=72]
    ^-- CPU [cur=-100%, avg=-99.73%, GC=0%]
    ^-- PageMemory [pages=1226]
    ^-- Heap [used=112MB, free=99.59%, comm=516MB]
    ^-- Off-heap [used=4MB, free=99.97%, comm=336MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   default region [used=4MB, free=99.97%, comm=256MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
2019-10-02 08:44:37 [tcp-disco-sock-reader-#7] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished serving remote node connection [rmtAddr=/10.0.11.180:49151, rmtPort=49151
2019-10-02 08:44:37 [tcp-disco-sock-reader-#4] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished serving remote node connection [rmtAddr=/10.0.31.134:42413, rmtPort=42413
2019-10-02 08:44:37 [tcp-disco-sock-reader-#6] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished serving remote node connection [rmtAddr=/10.0.21.167:60763, rmtPort=60763
2019-10-02 08:44:37 [tcp-disco-ip-finder-cleaner-#5] ERROR org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.error(137) - Failed to clean IP finder up.
org.apache.ignite.spi.IgniteSpiException: Failed to list objects in the bucket: ignite-configurations-production
    at org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:192)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1998)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1973)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
    at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
    at
Re: Failed to read magic header (too few bytes received)
Update 2: Digging more in the logging, the issue seems to be:

[tcp-disco-ip-finder-cleaner-#5] ERROR org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to clean IP finder up.
class org.apache.ignite.spi.IgniteSpiException: Failed to list objects in the bucket: ignite-configurations-production
    at org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:192)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1998)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1973)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
    at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:791)
    at org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:146)
    ... 4 more
Caused by: com.amazonaws.AbortedException:
    at com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
    at java.base/java.io.BufferedReader.read1(BufferedReader.java:210)
    at java.base/java.io.BufferedReader.read(BufferedReader.java:287)
    at java.base/java.io.Reader.read(Reader.java:229)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:186)
    ... 24 more

[tcp-disco-msg-worker-#2] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node [newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1], sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500], discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]

On Tue, 24 Sep 2019 at 15:59, Marco Bernagozzi wrote:
> It was on aws, and I don't have the IP log of all the instances I had up.
> My best guess it's that it was just one of the slave instances.
> I have two sets of machines, masters and slaves. They are all servers. The
> masters create caches and distribute caches to a set of slaves using a node
> filter.
> Here are the options I'm using to run it
>
> CMD ["java", &
Re: Failed to read magic header (too few bytes received)
It was on aws, and I don't have the IP log of all the instances I had up. My best guess is that it was just one of the slave instances.

I have two sets of machines, masters and slaves. They are all servers. The masters create caches and distribute caches to a set of slaves using a node filter.

Here are the options I'm using to run it:

CMD ["java", "-jar",
     "-XX:+AlwaysPreTouch", "-XX:+UseG1GC", "-XX:+DisableExplicitGC", "-XX:+ScavengeBeforeFullGC",
     "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED",
     "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED",
     "--add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
     "--add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED",
     "--add-exports=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED",
     "--illegal-access=permit",
     "-Djdk.tls.client.protocols=TLSv1.2",
     "-Djava.net.preferIPv4Stack=true",
     "-DIGNITE_QUIET=false",
     "algotworker.jar", "server", "config.yml"]

In my logs, "10.0.11.210" appears first in the slave nodes:

[main] INFO org.apache.ignite.internal.IgniteKernal - Non-loopback local IPs: 10.0.11.210, 169.254.1.1, 172.17.0.1, 172.18.0.1
[tcp-disco-msg-worker-#2] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node [newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1], sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500], discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] WARN org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node has connection to it's previous, trying previous again. [next=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1], sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500], discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] INFO org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node [newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1], sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500], discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] WARN org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node has connection to it's previous, trying previous again. [next=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1], sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500], discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]

I didn't see this before. What does this mean? I guess this is the issue, right?

On Tue, 24 Sep 2019 at 15:16, Stephen Darlington <stephen.darling...@gridgain.com> wrote:
> What’s on IP address 10.0.11.210? It’s sending Ignite something that it
> doesn’t understand. Maybe it’s not another copy of Ignite? Could it be a
> firewall setting truncating the message? Or perhaps the remote node has a
> different configuration, for example mixing up communication and discovery
> ports?
>
> Regards,
> Stephen
>
> On 24 Sep 2019, at 13:00, Marco Bernagozzi wrote:
>
> Hi.
> I get this error sometimes, it seems to be quite random. Any idea what
> this might be caused by?
> Since every time it's thrown a hundred times or so, I had to temporarily
> suppress all the errors from that class. Is there a way to fix this or at
> least to make it be thrown just once?
>
> My settings:
>
> AwsConfiguration awsConfiguration = AwsConfigurationSingleton.getInstance();
> BasicAWSCredentials creds = new BasicAWSCredentials(
>     awsConfiguration.getAwsAccessKey(),
>     awsConfiguration.getAwsSecretKey()
> );
> TcpDiscoveryS3
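One way to test Stephen's last hypothesis (a peer mixing up communication and discovery ports) is to pin both SPIs to fixed, distinct ports on every node, so a node can never accidentally speak the discovery protocol to the communication socket. A minimal sketch, not from this thread's configuration; the port numbers are the Ignite defaults and are only illustrative:

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class PinnedPortsConfig {

    /** Builds a configuration with discovery and communication pinned to separate ports. */
    public static IgniteConfiguration build() {
        // Discovery SPI: fixed port, no fallback range, so every node
        // knows exactly which port carries the discovery protocol.
        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setLocalPort(47500);   // illustrative: the default discovery port
        disco.setLocalPortRange(0);  // do not probe other ports

        // Communication SPI: a different fixed port for data messages.
        TcpCommunicationSpi comm = new TcpCommunicationSpi();
        comm.setLocalPort(47100);    // illustrative: the default communication port

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(disco);
        cfg.setCommunicationSpi(comm);
        return cfg;
    }
}
```

With every node on the same pinned pair of ports, a "Failed to read magic header" from a cluster member would point at a non-Ignite process or truncation in transit rather than a port mix-up.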
Failed to read magic header (too few bytes received)
Hi.
I get this error sometimes, and it seems to be quite random. Any idea what this might be caused by? Since every time it's thrown a hundred times or so, I had to temporarily suppress all the errors from that class. Is there a way to fix this, or at least to make it be thrown just once?

My settings:

AwsConfiguration awsConfiguration = AwsConfigurationSingleton.getInstance();
BasicAWSCredentials creds = new BasicAWSCredentials(
    awsConfiguration.getAwsAccessKey(),
    awsConfiguration.getAwsSecretKey()
);
TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
ipFinder.setAwsCredentials(creds);
ipFinder.setBucketEndpoint("s3.eu-west-1.amazonaws.com");
ipFinder.setBucketName(awsConfiguration.getIgniteBucket());

TcpDiscoverySpi spi = new TcpDiscoverySpi();
spi.setIpFinder(ipFinder);

IgniteConfiguration cfg = new IgniteConfiguration();
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(false);
cfg.setDataStorageConfiguration(storageCfg);

IgniteLogger log = new Slf4jLogger();
cfg.setGridLogger(log);

TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setMessageQueueLimit(1);
cfg.setCommunicationSpi(commSpi);
cfg.setDiscoverySpi(spi);

cfg.setBinaryConfiguration(new BinaryConfiguration());
cfg.getBinaryConfiguration().setCompactFooter(false);
cfg.setFailureDetectionTimeout(60 * 1000);

The log:

INFO [2019-09-24 10:04:21,402] org.apache.ignite.internal.IgniteKernal:
>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.7.5#20190603-sha1:be4f2a15
>>> 2018 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org

INFO [2019-09-24 10:04:21,402] org.apache.ignite.internal.IgniteKernal: Config URL: n/a
INFO [2019-09-24 10:04:21,413] org.apache.ignite.internal.IgniteKernal: IgniteConfiguration [igniteInstanceName=null,
    pubPoolSize=36, svcPoolSize=36, callbackPoolSize=36, stripedPoolSize=36, sysPoolSize=36, mgmtPoolSize=4,
    igfsPoolSize=36, dataStreamerPoolSize=36, utilityCachePoolSize=36, utilityCacheKeepAliveTime=6, p2pPoolSize=2,
    qryPoolSize=36, igniteHome=/opt/ignite/apache-ignite-2.7.5-bin, igniteWorkDir=/opt/ignite/apache-ignite-2.7.5-bin/work,
    mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@47c81abf, nodeId=09156c05-f869-4826-814e-1f47fcefeeb4,
    marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000, sndRetryDelay=1000,
    sndRetryCnt=3, metricsHistSize=1, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
    discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000,
    maxAckTimeout=60, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=STOP,
    segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=1,
    commSpi=TcpCommunicationSpi [connectGate=null,
    connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@104d9de4,
    enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100,
    locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=60, connTimeout=5000,
    maxConnTimeout=60, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1, slowClientQueueLimit=0,
    nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true,
    filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=-1,
    boundTcpShmemPort=-1, selectorsCnt=18, selectorSpins=0, addrRslvr=null,
    ctxInitLatch=java.util.concurrent.CountDownLatch@2b43b6cc[Count = 1], stopping=false],
    evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@3221402f, colSpi=NoopCollisionSpi [],
    deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@415fad70,
    addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@7947ad4c, clientMode=false,
    rebalanceThreadPoolSize=1, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ,
    dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, pessimisticTxLogSize=0,
    pessimisticTxLogLinger=1, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false],
    cacheSanityCheckEnabled=true, discoStartupDelay=6, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null,
    timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=6, sysWorkerBlockedTimeout=null,
    clientFailureDetectionTimeout=3, metricsLogFreq=6, hadoopCfg=null,
    connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false,
    sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=60, idleQryCurCheckFreq=6, sndQueueLimit=0, selectorCnt=4,
    idleTimeout=7000, sslEnabled=false, sslClientAuth=false,
Re: Cache spreading to new nodes
Hi,

Sorry, tearing down the project to make a runnable proved to be a much bigger project than expected. I eventually managed, and the outcome is:

I used to call:

List<String> cacheNames = new ArrayList<>();
ignite.cacheNames().forEach(n -> {
    if (!n.equals("settingsCache")) {
        ignite.cache(n).localEntries(CachePeekMode.ALL).iterator()
            .forEachRemaining(a -> cacheNames.add(a.getKey().toString()));
    }
});

to check the local caches, which apparently creates a local copy of the cache in the machine (!?). Now, I replaced it with:

List<String> cacheNames = new ArrayList<>();
UUID localId = ignite.cluster().localNode().id();
ignite.cacheNames().forEach(cache -> {
    if (!cache.equals("settingsCache")) {
        boolean containsCache = ignite.cluster().forCacheNodes(cache).nodes().stream()
            .anyMatch(n -> n.id().equals(localId));
        if (containsCache) {
            cacheNames.add(cache);
        }
    }
});

And the issue disappeared. Is this intended behaviour? Because it looks weird to me.

To reply to: "I think, it’s better not to set it, because otherwise if you don’t trigger the rebalance, then only one node will store the cache." With the configuration I posted, the cache is spread out to the machines that I use in the setNodeFilter().

Yes, I believe you're correct about the NodeFilter. It should be pointless to have now, right? That was me experimenting and trying to figure out why the cache was spreading to new nodes.

fetchNodes() fetches the ids of the local node and the k most empty nodes (where k is given as an input for each cache). I check how full a node is based on the code right above, in which I check how many caches a node has.

Yes, I read that I should have set the attributes. However, now it feels like an unnecessary step? What would that improve, in my case?

And yes, it makes sense now! Thanks for the clarification. I thought that the rebalancing was rebalancing something in an uncontrolled way, but it turns out everything was due to my ignite.cache(n).localEntries(CachePeekMode.ALL) creating a local cache.

I have just one question: you called it "backup filter". Is the nodeFilter a filter for only backup nodes, or was that a typo? I thought it was a filter for all the nodes for a cache.

On Wed, 14 Aug 2019 at 17:58, Denis Mekhanikov wrote:
> Marco,
>
> Rebalance mode set to NONE means that your cache won’t be rebalanced at
> all unless you trigger it manually.
> I think, it’s better not to set it, because otherwise if you don’t trigger
> the rebalance, then only one node will store the cache.
>
> Also the backup filter specified in the affinity function doesn’t seem
> correct to me. It’s always true, since your node filter accepts only those
> nodes that are in the nodesForOptimization list.
>
> What does the fetchNodes() method do?
> The recommended way to implement node filters is to check custom node
> attributes using an AttributeNodeFilter
> <https://static.javadoc.io/org.apache.ignite/ignite-core/2.7.5/org/apache/ignite/util/AttributeNodeFilter.html>.
>
> Partition map exchange is a process that happens after every topology
> change. Nodes exchange information about partition distribution of caches.
> So, you can’t prevent it from happening.
> The message that you see is a symptom, not a cause.
>
> Denis
>
> On 13 Aug 2019, at 09:50, Marco Bernagozzi wrote:
>
> Hi, I did some more digging and discovered that the issue seems to be:
>
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:
> Completed partition exchange
>
> Is there any way to disable or limit the partition exchange?
>
> Best,
> Marco
>
> On Mon, 12 Aug 2019 at 16:59, Andrei Aleksandrov wrote:
> Hi,
>
> Could you share the whole reproducer with all configurations and required
> methods?
>
> BR,
> Andrei
>
> 8/12/2019 4:48 PM, Marco Bernagozzi wrote:
>
> I have a set of nodes, and I want to be able to set a cache in specific
> nodes. It works, but whenever I turn on a new node the cache is
> automatically spread to that node, which then causes errors like:
> Failed over job to a new node (I guess that there was a computation going
> on in a node that shouldn't have computed that, and was shut down in the
> meantime).
>
> I don't know if I'm doing something wrong here or I'm missing something.
> As I understand it, NodeFilter and Affinity are equivalent in my case
> (Affinity is a node filter which also creates rules on where the cache
> can spread from a given node?). With rebalance mo
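The AttributeNodeFilter approach Denis recommends in the quoted reply can be sketched roughly as follows. The attribute name "cache.role" and its value are illustrative assumptions, not taken from the thread; the point is that the filter matches on a declared node attribute instead of capturing a list of node ids in a lambda:

```java
import java.util.Collections;

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.util.AttributeNodeFilter;

public class AttributeFilterSketch {

    /** Node side: every node that should host the cache declares the attribute. */
    public static IgniteConfiguration nodeConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // "cache.role" / "optimization" are hypothetical names for this sketch.
        cfg.setUserAttributes(Collections.singletonMap("cache.role", "optimization"));
        return cfg;
    }

    /** Cache side: the filter matches nodes by that attribute. */
    public static CacheConfiguration<String, Object> cacheConfig() {
        CacheConfiguration<String, Object> graphCfg = new CacheConfiguration<>("graphCache");
        // Unlike an id-capturing lambda, this filter keeps working when
        // nodes restart and get new ids, as long as the attribute is set.
        graphCfg.setNodeFilter(new AttributeNodeFilter("cache.role", "optimization"));
        return graphCfg;
    }
}
```

Compared with `u -> nodesForOptimization.contains(u.id())`, this removes the dependency on a snapshot of node ids, which is one reason a newly started node can unexpectedly pass (or fail) the filter.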
Re: Cache spreading to new nodes
Hi, I did some more digging and discovered that the issue seems to be:

org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture: Completed partition exchange

Is there any way to disable or limit the partition exchange?

Best,
Marco

On Mon, 12 Aug 2019 at 16:59, Andrei Aleksandrov wrote:
> Hi,
>
> Could you share the whole reproducer with all configurations and required
> methods?
>
> BR,
> Andrei
>
> 8/12/2019 4:48 PM, Marco Bernagozzi wrote:
>
> I have a set of nodes, and I want to be able to set a cache in specific
> nodes. It works, but whenever I turn on a new node the cache is
> automatically spread to that node, which then causes errors like:
> Failed over job to a new node (I guess that there was a computation going
> on in a node that shouldn't have computed that, and was shut down in the
> meantime).
>
> I don't know if I'm doing something wrong here or I'm missing something.
> As I understand it, NodeFilter and Affinity are equivalent in my case
> (Affinity is a node filter which also creates rules on where the cache
> can spread from a given node?). With rebalance mode set to NONE, shouldn't
> the cache be spread on the "nodesForOptimization" nodes, according to
> either the node filter or the affinityFunction?
>
> Here's my code:
>
> List<UUID> nodesForOptimization = fetchNodes();
>
> CacheConfiguration graphCfg = new CacheConfiguration<>(graphCacheName);
> graphCfg = graphCfg.setCacheMode(CacheMode.REPLICATED)
>     .setBackups(nodesForOptimization.size() - 1)
>     .setAtomicityMode(CacheAtomicityMode.ATOMIC)
>     .setRebalanceMode(CacheRebalanceMode.NONE)
>     .setStoreKeepBinary(true)
>     .setCopyOnRead(false)
>     .setOnheapCacheEnabled(false)
>     .setNodeFilter(u -> nodesForOptimization.contains(u.id()))
>     .setAffinity(
>         new RendezvousAffinityFunction(
>             1024,
>             (c1, c2) -> nodesForOptimization.contains(c1.id()) &&
>                 nodesForOptimization.contains(c2.id())
>         )
>     )
>     .setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
Cache spreading to new nodes
I have a set of nodes, and I want to be able to set a cache in specific nodes. It works, but whenever I turn on a new node the cache is automatically spread to that node, which then causes errors like: Failed over job to a new node (I guess that there was a computation going on in a node that shouldn't have computed that, and was shut down in the meantime).

I don't know if I'm doing something wrong here or I'm missing something. As I understand it, NodeFilter and Affinity are equivalent in my case (Affinity is a node filter which also creates rules on where the cache can spread from a given node?). With rebalance mode set to NONE, shouldn't the cache be spread on the "nodesForOptimization" nodes, according to either the node filter or the affinityFunction?

Here's my code:

List<UUID> nodesForOptimization = fetchNodes();

CacheConfiguration graphCfg = new CacheConfiguration<>(graphCacheName);
graphCfg = graphCfg.setCacheMode(CacheMode.REPLICATED)
    .setBackups(nodesForOptimization.size() - 1)
    .setAtomicityMode(CacheAtomicityMode.ATOMIC)
    .setRebalanceMode(CacheRebalanceMode.NONE)
    .setStoreKeepBinary(true)
    .setCopyOnRead(false)
    .setOnheapCacheEnabled(false)
    .setNodeFilter(u -> nodesForOptimization.contains(u.id()))
    .setAffinity(
        new RendezvousAffinityFunction(
            1024,
            (c1, c2) -> nodesForOptimization.contains(c1.id()) &&
                nodesForOptimization.contains(c2.id())
        )
    )
    .setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);