Re: Ignite client hangs forever when performing cache operation with 6 servers, works with 3 servers

2021-06-23 Thread Ilya Kasnacheev
Hello!

Unfortunately, these links are dead (404).

If it's still relevant, please consider re-uploading.

Regards,
-- 
Ilya Kasnacheev


Mon, Jun 14, 2021, 21:59, mapeters:

> Problem: An Ignite client hangs forever when performing a cache operation. We
> have 6 Ignite servers running, and the problem goes away when we reduce the
> cluster to 3. What effect does expanding or shrinking the server cluster have
> that could cause this?
>
> See the attachments for a sample stack trace of the hanging client thread, a
> server config snippet, a client config snippet, and a cache key snippet. From
> looking through the logs, there essentially seem to be various TCP
> communication errors, such as the attached client and server errors. We tried
> increasing the (client) failure detection timeout values as suggested by the
> server error message, but that just made system startup hang for a long time
> (close to an hour).
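For reference, the failure detection timeouts mentioned above are ordinary `IgniteConfiguration` properties. A minimal sketch of raising them in the Spring XML config (the 30000/60000 ms values here are illustrative, not the poster's actual settings):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Applies to server nodes; the default is 10000 ms. -->
    <property name="failureDetectionTimeout" value="30000"/>
    <!-- Applies to client nodes; the default is 30000 ms. -->
    <property name="clientFailureDetectionTimeout" value="60000"/>
</bean>
```

Note that these timeouts also govern how long discovery waits before giving up on an unresponsive node, so raising them aggressively can stretch out startup and topology changes, consistent with the slow startup described above.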
>
> Usage:
>
> We have a large number of data objects (64k-400M) stored within HDF5 files
> and process hundreds of millions of records a day, with total data throughput
> ranging from 500GB to 10TB a day. We use Ignite as an in-memory distributed
> cache in front of the process that interacts with the HDF5 files.
>
> Configuration:
>
> 1. The Ignite version is 2.9.
> 2. The configuration is a 6-node Ignite cluster using a partitioned cache.
> 3. Ignite’s native persistence is disabled; we wrote a cache store
> implementation to persist the cache entries to the backing HDF5 files.
> 4. Ignite is configured in a write-behind / read-through manner.
> 5. There are four primary caches, split up by data type to reduce the amount
> of traffic on any one cache. The caches are all configured identically except
> for their write-behind properties and the data types they hold, which helps
> manage how much data is in a specific cache.
> 6. The cache key is a compound object of the path to the file and a group /
> locator string within the file.
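A compound key like the one described in item 6 might look like the sketch below. This is a hypothetical illustration, not the attached DataStoreKey.java: the class name matches the attachment but the field names are assumptions. The point it demonstrates is that a compound key needs consistent equals/hashCode over all of its fields, since in a partitioned cache the key's hash determines which of the 6 (or 3) server nodes owns each entry.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of a compound cache key: the path to the backing
 * HDF5 file plus a group/locator string within that file. Partition
 * assignment in a partitioned Ignite cache is derived from the key, so
 * equals/hashCode must cover both fields consistently.
 */
public final class DataStoreKey {
    private final String filePath;     // path to the backing HDF5 file
    private final String groupLocator; // group/locator string within the file

    public DataStoreKey(String filePath, String groupLocator) {
        this.filePath = filePath;
        this.groupLocator = groupLocator;
    }

    public String filePath() { return filePath; }

    public String groupLocator() { return groupLocator; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DataStoreKey)) return false;
        DataStoreKey k = (DataStoreKey) o;
        return filePath.equals(k.filePath) && groupLocator.equals(k.groupLocator);
    }

    @Override public int hashCode() {
        return Objects.hash(filePath, groupLocator);
    }
}
```

Two keys built from the same path and locator compare equal and hash identically, so lookups from any client land on the same partition regardless of which node performs the mapping.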
>
> Hardware:
>
> 1. At our failure site, there are 6 physical systems running Red Hat
> Hyperconverged Infrastructure.
> 2. Each physical node has a pinned VM running Apache Ignite. The VM has
> 128GB of memory; Ignite is configured with 16GB of heap memory and 64GB of
> off-heap cache.
> 3. There are 6 other VMs, each running 3 processes that all store to
> Ignite.
> 4. There is a single VM that fronts the HDF5 files that Ignite talks to for
> persistent storage.
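For context, a 64GB off-heap cache like the one in item 2 is typically configured in Ignite 2.x through a data region. A minimal sketch, assuming the default data region is used (the size expression is illustrative, not taken from the attached configs):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <!-- 64 GB of off-heap memory; native persistence stays disabled. -->
                    <property name="maxSize" value="#{64L * 1024 * 1024 * 1024}"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```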
>
> hangingStackTrace.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/hangingStackTrace.txt>
>
> serverConfig.xml
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/serverConfig.xml>
>
> clientConfig.xml
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/clientConfig.xml>
>
> DataStoreKey.java
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/DataStoreKey.java>
>
> serverErrors.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/serverErrors.txt>
>
> clientErrors.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t3178/clientErrors.txt>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

