Re: Failed to clean IP finder up.

2019-10-02 Thread Marco Bernagozzi
Cool, thanks!
Any idea when 2.8 is planned to be released?

Regards,
Marco

On Wed, 2 Oct 2019 at 11:31, Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> Looks like it missed being part of the 2.7.x release by a month or two. It
> will be resolved when 2.8.0 comes out.
>
> Regards,
> Stephen
>
> On 2 Oct 2019, at 09:59, Marco Bernagozzi 
> wrote:
>
> I'm getting this error when the nodes are shutting down.
> What are the possible causes for this?
> A bug was filed for this about a year ago, but it seems it was marked as
> resolved?
>
> http://apache-ignite-issues.70530.x6.nabble.com/jira-Updated-IGNITE-9826-Ignite-node-with-TcpDiscoveryS3IpFinder-can-hang-while-stopping-td75612.html
>
>
> Ignite cache is empty. Shutting down...
> 2019-10-02 08:44:37 [main] INFO
> org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestProtocol.info(117)
> - Command protocol successfully stopped: TCP binary
> 2019-10-02 08:44:37 [main] INFO
> org.eclipse.jetty.server.AbstractConnector.doStop(332) - Stopped
> ServerConnector@70101687{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
> 2019-10-02 08:44:37 [main] INFO
> org.apache.ignite.internal.processors.rest.protocols.http.jetty.GridJettyRestProtocol.info(117)
> - Command protocol successfully stopped: Jetty REST
> 2019-10-02 08:44:37 [grid-timeout-worker-#79] INFO
> org.apache.ignite.internal.IgniteKernal.info(117) -
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=c51b8462, uptime=00:15:00.073]
> ^-- H/N/C [hosts=2, nodes=2, CPUs=72]
> ^-- CPU [cur=-100%, avg=-99.73%, GC=0%]
> ^-- PageMemory [pages=1226]
> ^-- Heap [used=112MB, free=99.59%, comm=516MB]
> ^-- Off-heap [used=4MB, free=99.97%, comm=336MB]
> ^-- sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
> ^-- default region [used=4MB, free=99.97%, comm=256MB]
> ^-- TxLog region [used=0MB, free=100%, comm=40MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=6, qSize=0]
> 2019-10-02 08:44:37 [tcp-disco-sock-reader-#7] INFO
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) -
> Finished serving remote node connection [rmtAddr=/10.0.11.180:49151,
> rmtPort=49151
> 2019-10-02 08:44:37 [tcp-disco-sock-reader-#4] INFO
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) -
> Finished serving remote node connection [rmtAddr=/10.0.31.134:42413,
> rmtPort=42413
> 2019-10-02 08:44:37 [tcp-disco-sock-reader-#6] INFO
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) -
> Finished serving remote node connection [rmtAddr=/10.0.21.167:60763,
> rmtPort=60763
> 2019-10-02 08:44:37 [tcp-disco-ip-finder-cleaner-#5] ERROR
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.error(137) - Failed to
> clean IP finder up.
> org.apache.ignite.spi.IgniteSpiException: Failed to list objects in the
> bucket: ignite-configurations-production
> at
> org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:192)
> at
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1998)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1973)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> Caused by: com.amazonaws.SdkClientException: Failed to sanitize XML
> document destined for handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> at
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
> at
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
> at
> com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
> at
> com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
> at
> com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)

Failed to clean IP finder up.

2019-10-02 Thread Marco Bernagozzi
I'm getting this error when the nodes are shutting down.
What are the possible causes for this?
A bug was filed for this about a year ago, but it seems it was marked as
resolved?
http://apache-ignite-issues.70530.x6.nabble.com/jira-Updated-IGNITE-9826-Ignite-node-with-TcpDiscoveryS3IpFinder-can-hang-while-stopping-td75612.html


Ignite cache is empty. Shutting down...
2019-10-02 08:44:37 [main] INFO
org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestProtocol.info(117)
- Command protocol successfully stopped: TCP binary
2019-10-02 08:44:37 [main] INFO
org.eclipse.jetty.server.AbstractConnector.doStop(332) - Stopped
ServerConnector@70101687{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2019-10-02 08:44:37 [main] INFO
org.apache.ignite.internal.processors.rest.protocols.http.jetty.GridJettyRestProtocol.info(117)
- Command protocol successfully stopped: Jetty REST
2019-10-02 08:44:37 [grid-timeout-worker-#79] INFO
org.apache.ignite.internal.IgniteKernal.info(117) -
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=c51b8462, uptime=00:15:00.073]
^-- H/N/C [hosts=2, nodes=2, CPUs=72]
^-- CPU [cur=-100%, avg=-99.73%, GC=0%]
^-- PageMemory [pages=1226]
^-- Heap [used=112MB, free=99.59%, comm=516MB]
^-- Off-heap [used=4MB, free=99.97%, comm=336MB]
^-- sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
^-- default region [used=4MB, free=99.97%, comm=256MB]
^-- TxLog region [used=0MB, free=100%, comm=40MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
2019-10-02 08:44:37 [tcp-disco-sock-reader-#7] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished
serving remote node connection [rmtAddr=/10.0.11.180:49151, rmtPort=49151
2019-10-02 08:44:37 [tcp-disco-sock-reader-#4] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished
serving remote node connection [rmtAddr=/10.0.31.134:42413, rmtPort=42413
2019-10-02 08:44:37 [tcp-disco-sock-reader-#6] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.info(117) - Finished
serving remote node connection [rmtAddr=/10.0.21.167:60763, rmtPort=60763
2019-10-02 08:44:37 [tcp-disco-ip-finder-cleaner-#5] ERROR
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.error(137) - Failed to
clean IP finder up.
org.apache.ignite.spi.IgniteSpiException: Failed to list objects in the
bucket: ignite-configurations-production
at
org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:192)
at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1998)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1973)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: com.amazonaws.SdkClientException: Failed to sanitize XML
document destined for handler class
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
at
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
at

Re: Failed to read magic header (too few bytes received)

2019-09-25 Thread Marco Bernagozzi
Update 2:

Digging more in the logging, the issue seems to be:

[tcp-disco-ip-finder-cleaner-#5] ERROR
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to clean IP
finder up.
class org.apache.ignite.spi.IgniteSpiException: Failed to list objects in
the bucket: ignite-configurations-production
at
org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:192)
at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1900)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1998)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1973)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: com.amazonaws.SdkClientException: Failed to sanitize XML
document destined for handler class
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
at
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
at
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:791)
at
org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder.getRegisteredAddresses(TcpDiscoveryS3IpFinder.java:146)
... 4 more
Caused by: com.amazonaws.AbortedException:
at
com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at
com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
at java.base/java.io.BufferedReader.read1(BufferedReader.java:210)
at java.base/java.io.BufferedReader.read(BufferedReader.java:287)
at java.base/java.io.Reader.read(Reader.java:229)
at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:186)
... 24 more
[tcp-disco-msg-worker-#2] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node
[newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937,
addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1],
sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /
127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /
10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500],
discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381,
loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]

On Tue, 24 Sep 2019 at 15:59, Marco Bernagozzi 
wrote:

> It was on aws, and I don't have the IP log of all the instances I had up.
> My best guess is that it was just one of the slave instances.
> I have two sets of machines, masters and slaves. They are all servers. The
> masters create caches and distribute caches to a set of slaves using a node
> filter.
> Here are the options I'm using to run it
>
> CMD ["java", &

Re: Failed to read magic header (too few bytes received)

2019-09-24 Thread Marco Bernagozzi
It was on aws, and I don't have the IP log of all the instances I had up.
My best guess is that it was just one of the slave instances.
I have two sets of machines, masters and slaves. They are all servers. The
masters create caches and distribute caches to a set of slaves using a node
filter.
Here are the options I'm using to run it

CMD ["java", "-jar", "-XX:+AlwaysPreTouch", "-XX:+UseG1GC",
"-XX:+DisableExplicitGC", "-XX:+ScavengeBeforeFullGC",
"--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED",
"--add-exports=java.base/sun.nio.ch=ALL-UNNAMED",
"--add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
"--add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED",
"--add-exports=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED","--illegal-access=permit",
"-Djdk.tls.client.protocols=TLSv1.2", "-Djava.net.preferIPv4Stack=true",
"-DIGNITE_QUIET=false", "algotworker.jar", "server", "config.yml"]

In my logs, "10.0.11.210" appears first in the slave nodes:
[main] INFO org.apache.ignite.internal.IgniteKernal - Non-loopback local
IPs: 10.0.11.210, 169.254.1.1, 172.17.0.1, 172.18.0.1
[tcp-disco-msg-worker-#2] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node
[newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937,
addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1],
sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /
127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /
10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500],
discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381,
loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] WARN
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node has
connection to it's previous, trying previous again. [next=TcpDiscoveryNode
[id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1,
172.17.0.1, 172.18.0.1],
sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /
127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /
10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500],
discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381,
loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] INFO
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node
[newNext=TcpDiscoveryNode [id=bc657c40-27dd-4190-af04-e53068176937,
addrs=[10.0.11.210, 127.0.0.1, 172.17.0.1, 172.18.0.1],
sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /
127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /
10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500],
discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381,
loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
[tcp-disco-msg-worker-#2] WARN
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - New next node has
connection to it's previous, trying previous again. [next=TcpDiscoveryNode
[id=bc657c40-27dd-4190-af04-e53068176937, addrs=[10.0.11.210, 127.0.0.1,
172.17.0.1, 172.18.0.1],
sockAddrs=[ip-172-17-0-1.eu-west-1.compute.internal/172.17.0.1:47500, /
127.0.0.1:47500, production-algo-spot-instance-ASG/10.0.31.153:47500, /
10.0.11.210:47500, ip-172-18-0-1.eu-west-1.compute.internal/172.18.0.1:47500],
discPort=47500, order=8, intOrder=7, lastExchangeTime=1569318655381,
loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]

I didn't see this before. What does this mean? I guess this is the issue,
right?


On Tue, 24 Sep 2019 at 15:16, Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> What’s on IP address 10.0.11.210? It’s sending Ignite something that it
> doesn’t understand. Maybe it’s not another copy of Ignite? Could it be a
> firewall setting truncating the message? Or perhaps the remote node has a
> different configuration, for example mixing up communication and discovery
> ports?
>
> Regards,
> Stephen
>
> On 24 Sep 2019, at 13:00, Marco Bernagozzi 
> wrote:
>
> Hi.
> I get this error sometimes; it seems to be quite random. Any idea what
> this might be caused by?
> Since it's thrown a hundred times or so each time, I had to temporarily
> suppress all the errors from that class. Is there a way to fix this, or at
> least to make it be thrown just once?
>
> My settings:
>
> AwsConfiguration awsConfiguration =
> AwsConfigurationSingleton.getInstance();
> BasicAWSCredentials creds = new BasicAWSCredentials(
> awsConfiguration.getAwsAccessKey(),
> awsConfiguration.getAwsSecretKey()
> );
> TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3

Failed to read magic header (too few bytes received)

2019-09-24 Thread Marco Bernagozzi
Hi.
I get this error sometimes; it seems to be quite random. Any idea what this
might be caused by?
Since it's thrown a hundred times or so each time, I had to temporarily
suppress all the errors from that class. Is there a way to fix this, or at
least to make it be thrown just once?

My settings:

AwsConfiguration awsConfiguration = AwsConfigurationSingleton.getInstance();
BasicAWSCredentials creds = new BasicAWSCredentials(
awsConfiguration.getAwsAccessKey(),
awsConfiguration.getAwsSecretKey()
);
TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
ipFinder.setAwsCredentials(creds);
ipFinder.setBucketEndpoint("s3.eu-west-1.amazonaws.com");
ipFinder.setBucketName(awsConfiguration.getIgniteBucket());

TcpDiscoverySpi spi = new TcpDiscoverySpi();
spi.setIpFinder(ipFinder);
IgniteConfiguration cfg = new IgniteConfiguration();
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(false);
cfg.setDataStorageConfiguration(storageCfg);
IgniteLogger log = new Slf4jLogger();
cfg.setGridLogger(log);
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setMessageQueueLimit(1);
cfg.setCommunicationSpi(commSpi);
cfg.setDiscoverySpi(spi);
cfg.setBinaryConfiguration(new BinaryConfiguration());
cfg.getBinaryConfiguration().setCompactFooter(false);
cfg.setFailureDetectionTimeout(60 * 1000);
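
[Editorial note: the following sketch is not part of the original message. It shows one way a node built from a configuration like the above would typically be started and stopped; an orderly stop matters here, since the "Failed to clean IP finder up" error elsewhere in this archive surfaces while the node is shutting down. All SPI wiring is elided and assumed to match the snippet above.]

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeRunner {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // ... set the TcpDiscoverySpi with TcpDiscoveryS3IpFinder, the
        // TcpCommunicationSpi, data storage, etc., as in the config above ...

        // Ignite implements AutoCloseable, so try-with-resources gives an
        // orderly local node stop when the block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            // Work with the node here.
        }
    }
}
```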

The log:

INFO [2019-09-24 10:04:21,402] org.apache.ignite.internal.IgniteKernal:
>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.7.5#20190603-sha1:be4f2a15
>>> 2018 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org
INFO [2019-09-24 10:04:21,402] org.apache.ignite.internal.IgniteKernal:
Config URL: n/a
INFO [2019-09-24 10:04:21,413] org.apache.ignite.internal.IgniteKernal:
IgniteConfiguration [igniteInstanceName=null, pubPoolSize=36,
svcPoolSize=36, callbackPoolSize=36, stripedPoolSize=36, sysPoolSize=36,
mgmtPoolSize=4, igfsPoolSize=36, dataStreamerPoolSize=36,
utilityCachePoolSize=36, utilityCacheKeepAliveTime=6, p2pPoolSize=2,
qryPoolSize=36, igniteHome=/opt/ignite/apache-ignite-2.7.5-bin,
igniteWorkDir=/opt/ignite/apache-ignite-2.7.5-bin/work,
mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@47c81abf,
nodeId=09156c05-f869-4826-814e-1f47fcefeeb4, marsh=BinaryMarshaller [],
marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=60,
forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true,
allResolversPassReq=true, segChkFreq=1, commSpi=TcpCommunicationSpi
[connectGate=null,
connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@104d9de4,
enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null,
locHost=null, locPort=47100, locPortRange=100, shmemPort=-1,
directBuf=true, directSndBuf=false, idleConnTimeout=60,
connTimeout=5000, maxConnTimeout=60, reconCnt=10, sockSndBuf=32768,
sockRcvBuf=32768, msgQueueLimit=1, slowClientQueueLimit=0,
nioSrvr=null, shmemSrv=null, usePairedConnections=false,
connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false,
ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000,
boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=18, selectorSpins=0,
addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@2b43b6cc[Count
= 1], stopping=false],
evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@3221402f,
colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [],
indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@415fad70,
addrRslvr=null,
encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@7947ad4c,
clientMode=false, rebalanceThreadPoolSize=1, txCfg=TransactionConfiguration
[txSerEnabled=false, dfltIsolation=REPEATABLE_READ,
dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0,
txTimeoutOnPartitionMapExchange=0, pessimisticTxLogSize=0,
pessimisticTxLogLinger=1, tmLookupClsName=null, txManagerFactory=null,
useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=6,
deployMode=SHARED, p2pMissedCacheSize=100, locHost=null,
timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=6,
sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=3,
metricsLogFreq=6, hadoopCfg=null, connectorCfg=ConnectorConfiguration
[jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false,
sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=60,
idleQryCurCheckFreq=6, sndQueueLimit=0, selectorCnt=4,
idleTimeout=7000, sslEnabled=false, sslClientAuth=false,

Re: Cache spreading to new nodes

2019-08-15 Thread Marco Bernagozzi
Hi,
Sorry, paring the project down to a runnable reproducer proved to be a much
bigger job than expected. I eventually managed, and the outcome is:
I used to call:
List<String> cacheNames = new ArrayList<>();
ignite.cacheNames().forEach(
n -> {
if (!n.equals("settingsCache")) {

ignite.cache(n).localEntries(CachePeekMode.ALL).iterator().forEachRemaining(a
-> cacheNames.add(a.getKey().toString()));
}
}
);
to check the local caches, which apparently creates a local copy of the
cache in the machine (!?).
Now, I replaced it with:
List<String> cacheNames = new ArrayList<>();
UUID localId = ignite.cluster().localNode().id();
ignite.cacheNames().forEach(
cache -> {
if (!cache.equals("settingsCache")) {
boolean containsCache =
ignite.cluster().forCacheNodes(cache).nodes().stream()
.anyMatch(n -> n.id().equals(localId));
if (containsCache) {
cacheNames.add(cache);
}
}
}
);

And the issue disappeared. Is this intended behaviour? It looks weird to me.

To reply to:
"I think, it’s better not to set it, because otherwise if you don’t trigger
the rebalance, then only one node will store the cache."
With the configuration I posted, the cache is spread out to the machines
that I use in setNodeFilter().

Yes, I believe you're correct about the NodeFilter. It should be pointless
to have now, right? That was me experimenting, trying to figure out why the
cache was spreading to new nodes.

fetchNodes() fetches the IDs of the local node and the k most empty nodes
(where k is given as an input for each cache). I check how full a node is
based on the code right above, in which I count how many caches a node has.

Yes, I read that I should have set the attributes. However, now it feels
like an unnecessary step? What would that improve, in my case?

And yes, it makes sense now! Thanks for the clarification. I thought the
rebalancing was moving data around in an uncontrolled way, but it turns out
everything was due to my ignite.cache(n).localEntries(CachePeekMode.ALL)
call creating a local cache.

I have just one question: you called it "backup filter". Is the nodeFilter
a filter for only backup nodes or was that a typo? I thought it was a
filter for all the nodes for a cache.

On Wed, 14 Aug 2019 at 17:58, Denis Mekhanikov 
wrote:

> Marco,
>
> Rebalance mode set to NONE means that your cache won’t be rebalanced at
> all unless you trigger it manually.
> I think, it’s better not to set it, because otherwise if you don’t trigger
> the rebalance, then only one node will store the cache.
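
[Editorial sketch, not from the thread: with CacheRebalanceMode.NONE, rebalancing of a cache can be requested explicitly. The cache name "graphCache" is illustrative; the method comes from the Ignite 2.x public cache API.]

```java
// Manually trigger rebalancing for a cache whose rebalance mode is NONE.
IgniteCache<Object, Object> cache = ignite.cache("graphCache");

// IgniteCache#rebalance() returns an IgniteFuture; get() blocks until the
// requested rebalancing round completes on this node.
cache.rebalance().get();
```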
>
> Also the backup filter specified in the affinity function doesn’t seem
> correct to me. It’s always true, since your node filter accepts only those
> nodes, that are in the nodesForOptimization list.
>
> What does fetchNodes() method do?
> The recommended way to implement node filters is to check custom node’s
> attributes using an AttributeNodeFilter
> <https://static.javadoc.io/org.apache.ignite/ignite-core/2.7.5/org/apache/ignite/util/AttributeNodeFilter.html>
> .
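
[Editorial sketch of the AttributeNodeFilter approach recommended above; the attribute name and value ("cache.group" / "optimization") are made up for illustration. Nodes are tagged with a custom user attribute, and the cache's node filter matches on that attribute instead of on node IDs, so restarts and newly joined nodes behave predictably.]

```java
import java.util.Collections;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.util.AttributeNodeFilter;

// On the nodes that should host the cache: tag them with a custom attribute.
IgniteConfiguration nodeCfg = new IgniteConfiguration()
    .setUserAttributes(Collections.singletonMap("cache.group", "optimization"));

// On the cache: deploy only to nodes carrying that attribute value.
CacheConfiguration<String, Object> cacheCfg = new CacheConfiguration<>("graphCache");
cacheCfg.setNodeFilter(new AttributeNodeFilter("cache.group", "optimization"));
```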
>
> Partition map exchange is a process that happens after every topology
> change. Nodes exchange information about partitions distribution of caches.
> So, you can’t prevent it from happening.
> The message, that you see is a symptom and not a cause.
>
> Denis
>
>
> On 13 Aug 2019, at 09:50, Marco Bernagozzi 
> wrote:
>
> Hi, I did some more digging and discovered that the issue seems to be:
>
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:
> Completed partition exchange
>
> Is there any way to disable or limit the partition exchange?
>
> Best,
> Marco
>
> On Mon, 12 Aug 2019 at 16:59, Andrei Aleksandrov 
> wrote:
> Hi,
>
> Could you share the whole reproducer with all configurations and required
> methods?
>
> BR,
> Andrei
>
> 8/12/2019 4:48 PM, Marco Bernagozzi wrote:
>
> I have a set of nodes, and I want to be able to set a cache in specific
> nodes. It works, but whenever I turn on a new node the cache is
> automatically spread to that node, which then causes errors like:
> Failed over job to a new node (I guess that there was a computation going
> on in a node that shouldn't have computed that, and was shut down in the
> meantime).
>
> I don't know if I'm doing something wrong here or I'm missing something.
> As I understand it, NodeFilter and Affinity are equivalent in my case
> (Affinity is a node filter which also creates rules on where the cache can
> spread from a given node?). With rebalance mo

Re: Cache spreading to new nodes

2019-08-13 Thread Marco Bernagozzi
Hi, I did some more digging and discovered that the issue seems to be:

org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:
Completed partition exchange

Is there any way to disable or limit the partition exchange?

Best,
Marco

On Mon, 12 Aug 2019 at 16:59, Andrei Aleksandrov 
wrote:

> Hi,
>
> Could you share the whole reproducer with all configurations and required
> methods?
>
> BR,
> Andrei
> 8/12/2019 4:48 PM, Marco Bernagozzi wrote:
>
> I have a set of nodes, and I want to be able to set a cache in specific
> nodes. It works, but whenever I turn on a new node the cache is
> automatically spread to that node, which then causes errors like:
> Failed over job to a new node (I guess that there was a computation going
> on in a node that shouldn't have computed that, and was shut down in the
> meantime).
>
> I don't know if I'm doing something wrong here or I'm missing something.
> As I understand it, NodeFilter and Affinity are equivalent in my case
> (Affinity is a node filter which also creates rules on where the cache can
> spread from a given node?). With rebalance mode set to NONE, shouldn't the
> cache be spread on the "nodesForOptimization" nodes, according to either
> the node filter or the affinityFunction?
>
> Here's my code:
>
> List<UUID> nodesForOptimization = fetchNodes();
>
> CacheConfiguration graphCfg = new
> CacheConfiguration<>(graphCacheName);
> graphCfg = graphCfg.setCacheMode(CacheMode.REPLICATED)
> .setBackups(nodesForOptimization.size() - 1)
> .setAtomicityMode(CacheAtomicityMode.ATOMIC)
> .setRebalanceMode(CacheRebalanceMode.NONE)
> .setStoreKeepBinary(true)
> .setCopyOnRead(false)
> .setOnheapCacheEnabled(false)
> .setNodeFilter(u -> nodesForOptimization.contains(u.id()))
> .setAffinity(
> new RendezvousAffinityFunction(
> 1024,
> (c1, c2) -> nodesForOptimization.contains(c1.id()) &&
> nodesForOptimization.contains(c2.id())
> )
> )
>
> .setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>
>


Cache spreading to new nodes

2019-08-12 Thread Marco Bernagozzi
I have a set of nodes, and I want to be able to set a cache in specific
nodes. It works, but whenever I turn on a new node the cache is
automatically spread to that node, which then causes errors like:
Failed over job to a new node (I guess that there was a computation going
on in a node that shouldn't have computed that, and was shut down in the
meantime).

I don't know if I'm doing something wrong here or I'm missing something.
As I understand it, NodeFilter and Affinity are equivalent in my case
(Affinity is a node filter which also creates rules on where the cache can
spread from a given node?). With rebalance mode set to NONE, shouldn't the
cache be spread on the "nodesForOptimization" nodes, according to either
the node filter or the affinityFunction?

Here's my code:

List<UUID> nodesForOptimization = fetchNodes();

CacheConfiguration graphCfg = new
CacheConfiguration<>(graphCacheName);
graphCfg = graphCfg.setCacheMode(CacheMode.REPLICATED)
.setBackups(nodesForOptimization.size() - 1)
.setAtomicityMode(CacheAtomicityMode.ATOMIC)
.setRebalanceMode(CacheRebalanceMode.NONE)
.setStoreKeepBinary(true)
.setCopyOnRead(false)
.setOnheapCacheEnabled(false)
.setNodeFilter(u -> nodesForOptimization.contains(u.id()))
.setAffinity(
new RendezvousAffinityFunction(
1024,
(c1, c2) -> nodesForOptimization.contains(c1.id()) &&
nodesForOptimization.contains(c2.id())
)
)

.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);