[jira] [Commented] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-25 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541884#comment-17541884
 ] 

Juan Ramos commented on GEODE-10312:


{quote}If changing the URL is permissible, then I think this is done, if not, I 
will work on finding a way to configure it correctly.
{quote}
I don't think this is permissible, as it implies breaking backward 
compatibility, doesn't it?

> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Assignee: Patrick Johnson
>Priority: Major
>  Labels: blocks-1.15.0, pull-request-available
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> ---
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
>   at 
> org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
>   at 
> org.springframework.beans.factory.support.ConstructorR

[jira] [Commented] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-17 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538218#comment-17538218
 ] 

Juan Ramos commented on GEODE-10312:


Another side effect of the change is that the automatically generated 
{{swagger}} docs are not "sorted" anymore. Not a deal breaker, sure, but it 
might cause headaches for customers interacting with the REST API through 
automatically generated clients (for example, via {{swagger-codegen-cli}}). 
I've done some local tests, and adding the configuration property 
[writer-with-order-by-keys|https://springdoc.org/index.html#springdoc-openapi-core-properties],
 both to 
[geode-web-api/src/main/resources/swagger.properties|https://github.com/apache/geode/blob/develop/geode-web-api/src/main/resources/swagger.properties]
 and 
[geode-web-management/src/main/resources/swagger-management.properties|https://github.com/apache/geode/blob/develop/geode-web-management/src/main/resources/swagger-management.properties],
 seems to fix this.
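
As a rough sketch of what that would look like (an assumption on my part, since the exact file contents aren't shown here and I'm presuming both files follow the standard {{springdoc}} property naming), the change amounts to one extra line in each file:
{noformat}
# Hypothetical excerpt; add to geode-web-api/src/main/resources/swagger.properties
# and to geode-web-management/src/main/resources/swagger-management.properties
springdoc.writer-with-order-by-keys=true
{noformat}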


> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: blocks-1.15.0
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> ---
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' 

[jira] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-17 Thread Juan Ramos (Jira)


[ https://issues.apache.org/jira/browse/GEODE-10312 ]


Juan Ramos deleted comment on GEODE-10312:


was (Author: jujoramos):
Another side effect of the change is that the automatically generated 
{{swagger}} docs are not "sorted" anymore. Not a deal breaker, sure, but it 
might cause headaches for customers interacting with the REST API through 
automatically generated clients (for example, via {{swagger-codegen-cli}}). 
I've done some local tests, and adding the configuration property 
[writer-with-order-by-keys|https://springdoc.org/index.html#springdoc-openapi-core-properties],
 both to 
[geode-web-api/src/main/resources/swagger.properties|https://github.com/apache/geode/blob/develop/geode-web-api/src/main/resources/swagger.properties]
 and 
[geode-web-management/src/main/resources/swagger-management.properties|https://github.com/apache/geode/blob/develop/geode-web-management/src/main/resources/swagger-management.properties],
 seems to fix this.


> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: blocks-1.15.0
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> ---
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCr

[jira] [Commented] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-17 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538208#comment-17538208
 ] 

Juan Ramos commented on GEODE-10312:


Another side effect of the change is that the automatically generated 
{{swagger}} docs are not "sorted" anymore. Not a deal breaker, sure, but it 
might cause headaches for customers interacting with the REST API through 
automatically generated clients (for example, via {{swagger-codegen-cli}}). 
I've done some local tests, and adding the configuration property 
[writer-with-order-by-keys|https://springdoc.org/index.html#springdoc-openapi-core-properties],
 both to 
[geode-web-api/src/main/resources/swagger.properties|https://github.com/apache/geode/blob/develop/geode-web-api/src/main/resources/swagger.properties]
 and 
[geode-web-management/src/main/resources/swagger-management.properties|https://github.com/apache/geode/blob/develop/geode-web-management/src/main/resources/swagger-management.properties],
 seems to fix this.


> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: blocks-1.15.0
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> ---
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' 

[jira] [Updated] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-16 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-10312:
---
Labels: blocks-1.15.0 needsTriage  (was: needsTriage)

> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: blocks-1.15.0, needsTriage
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> ---
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
>   at 
> org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
>   at 
> org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:541)
>   at 
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
>   at 
> org.springframework.beans.factory.su

[jira] [Updated] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-16 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-10312:
---
Description: 
The issue was introduced by GEODE-10282. As part of commit 
[41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
 the {{SwaggerConfig}} classes used to start and configure the internal 
{{geode-web-management}} and {{geode-web-api}} services now use the 
{{@SpringBootApplication}} annotation. This annotation automatically enables 
other spring annotations (like {{@EnableAutoConfiguration}} and 
{{@ComponentScan}}) which, in turn, might cause critical issues during startup 
as {{spring}} tries to automatically configure several services based on 
classes and interfaces found within the member's class path.
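
As a rough sketch of the direction the ticket title points at (illustrative only; this is not the actual Geode {{SwaggerConfig}} source, and it assumes a springdoc/swagger-core v3 setup), the class would become a plain {{@Configuration}} so that only the beans it declares are registered, and nothing on the member's class path can trigger auto-configuration:
{noformat}
package example.swagger; // hypothetical package, not the Geode source tree

import io.swagger.v3.oas.models.OpenAPI;
import io.swagger.v3.oas.models.info.Info;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// A plain @Configuration contributes only the beans declared below; unlike
// @SpringBootApplication it does not imply @EnableAutoConfiguration or
// @ComponentScan, so spring-jdbc on the class path cannot trigger the
// DataSource auto-configuration shown in the attached logs.
@Configuration
public class SwaggerConfig {

  @Bean
  public OpenAPI api() {
    return new OpenAPI().info(new Info().title("Geode REST API"));
  }
}
{noformat}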

---

I'm attaching a small scenario that reproduces the problem; the 
{{reproduce.sh}} script simply starts a locator making sure that the 
{{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
after 
[41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
 the logs will contain the following:
{noformat}
[info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
/management

[info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing Servlet 
'management'

[info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
request with 
[org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
 
org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
 org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132, 
org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
 
org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
 org.springframework.security.web.session.SessionManagementFilter@78907a46, 
org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]

[warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
encountered during context initialization - cancelling refresh attempt: 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'dataSource' defined in class path resource 
[org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
 Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 
'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
 Invocation of init method failed; nested exception is 
java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException

[error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
initialization failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'dataSource' defined in class path resource 
[org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
 Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 
'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
 Invocation of init method failed; nested exception is 
java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
at 
org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
at 
org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:541)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at 
org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
at 
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at 

[jira] [Created] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-16 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-10312:
--

 Summary: Remove SpringBootApplication In SwaggerConfig
 Key: GEODE-10312
 URL: https://issues.apache.org/jira/browse/GEODE-10312
 Project: Geode
  Issue Type: Bug
  Components: locator, rest (admin), rest (dev)
Affects Versions: 1.15.0
Reporter: Juan Ramos
 Attachments: GEODE-10312.zip

The issue was introduced by GEODE-10282. As part of commit 
[41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
 the {{SwaggerConfig}} classes used to start and configure the internal 
{{geode-web-management}} and {{geode-web-api}} services now use the 
{{@SpringBootApplication}} annotation. This annotation automatically enables 
other spring annotations (like {{@EnableAutoConfiguration}} and 
{{@ComponentScan}}) which, in turn, might cause critical issues during startup 
as {{spring}} tries to automatically configure several services based on 
classes and interfaces found within the member's class path.
I'm attaching a small scenario that reproduces the problem; the 
{{reproduce.sh}} script simply starts a locator making sure that the 
{{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
after 
[41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
 the logs will contain the following:
{noformat}
[info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
/management

[info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing Servlet 
'management'

[info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
request with 
[org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
 
org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
 org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132, 
org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
 
org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
 org.springframework.security.web.session.SessionManagementFilter@78907a46, 
org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]

[warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
encountered during context initialization - cancelling refresh attempt: 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'dataSource' defined in class path resource 
[org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
 Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 
'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
 Invocation of init method failed; nested exception is 
java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException

[error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
initialization failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'dataSource' defined in class path resource 
[org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
 Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 
'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
 Invocation of init method failed; nested exception is 
java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
at 
org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
at 
org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:541)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at 
org.springframework.be

[jira] [Updated] (GEODE-10312) Remove SpringBootApplication In SwaggerConfig

2022-05-16 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-10312:
---
Attachment: GEODE-10312.zip

> Remove SpringBootApplication In SwaggerConfig
> -
>
> Key: GEODE-10312
> URL: https://issues.apache.org/jira/browse/GEODE-10312
> Project: Geode
>  Issue Type: Bug
>  Components: locator, rest (admin), rest (dev)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: needsTriage
> Attachments: GEODE-10312.zip
>
>
> The issue was introduced by GEODE-10282. As part of commit 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4],
>  the {{SwaggerConfig}} classes used to start and configure the internal 
> {{geode-web-management}} and {{geode-web-api}} services now use the 
> {{@SpringBootApplication}} annotation. This annotation automatically enables 
> other spring annotations (like {{@EnableAutoConfiguration}} and 
> {{@ComponentScan}}) which, in turn, might cause critical issues during 
> startup as {{spring}} tries to automatically configure several services based 
> on classes and interfaces found within the member's class path.
> I'm attaching a small scenario that reproduces the problem; the 
> {{reproduce.sh}} script simply starts a locator making sure that the 
> {{spring-jdbc-5.3.20.jar}} is part of the class path. When using any commit 
> after 
> [41305de1405c2125142e6b337c3f1704f736fca4|https://github.com/apache/geode/commit/41305de1405c2125142e6b337c3f1704f736fca4]
>  the logs will contain the following:
> {noformat}
> [info 2022/05/16 15:54:38.997 IST locator0  tid=0x1] Adding webapp 
> /management
> [info 2022/05/16 15:54:39.610 IST locator0  tid=0x1] Initializing 
> Servlet 'management'
> [info 2022/05/16 15:54:42.124 IST locator0  tid=0x1] Will secure any 
> request with 
> [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@33ed6546,
>  
> org.springframework.security.web.context.SecurityContextPersistenceFilter@5a503cf0,
>  org.springframework.security.web.header.HeaderWriterFilter@5b04224a, 
> org.springframework.security.web.authentication.logout.LogoutFilter@17db90a7, 
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter@6f78c132,
>  
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@42f9b425,
>  
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter@54d62c35,
>  org.springframework.security.web.session.SessionManagementFilter@78907a46, 
> org.springframework.security.web.access.ExceptionTranslationFilter@eaf3dd0, 
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor@7cd6b76a]
> [warn 2022/05/16 15:54:42.975 IST locator0  tid=0x1] Exception 
> encountered during context initialization - cancelling refresh attempt: 
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
> [error 2022/05/16 15:54:42.980 IST locator0  tid=0x1] Context 
> initialization failed
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
> creating bean with name 'dataSource' defined in class path resource 
> [org/springframework/boot/autoconfigure/jdbc/DataSourceConfiguration$Hikari.class]:
>  Unsatisfied dependency expressed through method 'dataSource' parameter 0; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 
> 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties':
>  Invocation of init method failed; nested exception is 
> java.lang.NoClassDefFoundError: org/springframework/dao/DataAccessException
>   at 
> org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
>   at 
> org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:541)
>   at 
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
>   at 
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBe

[jira] [Resolved] (GEODE-10230) Support for PDX Update and Delete Endpoints in Management REST API

2022-04-12 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-10230.

Fix Version/s: 1.15.0
   Resolution: Fixed

> Support for PDX Update and Delete Endpoints in Management REST API
> --
>
> Key: GEODE-10230
> URL: https://issues.apache.org/jira/browse/GEODE-10230
> Project: Geode
>  Issue Type: Bug
>  Components: management, rest (admin)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: needsTriage, pull-request-available
> Fix For: 1.15.0
>
>
> Support for PDX Update and Delete Endpoints in Management REST API
> The cluster management REST API only exports CREATE and DELETE operations for 
> all currently supported configuration elements (region, gateway, pdx, etc.). 
> Even though several of the {{ConfigurationRealizer}}, 
> {{ConfigurationManager}} and {{ConfigurationValidator}} are already 
> implemented, the {{LocatorClusterManagementService}} always throws an 
> exception for UPDATE operations and the actual endpoints don't even exist on 
> the respective controllers.
> The above greatly limits the ability of consumers to use the management REST 
> API endpoints as the configurations can't be changed after creation time, 
> making some of them useless. As an example, a user probably doesn't know 
> beforehand the full list of domain classes that need to be serialized using 
> the PDX auto-serializer. When using only the management REST API endpoints to 
> manage a cluster, this implies that the PDX cluster configuration becomes 
> useless as soon as an extra pattern needs to be added, forcing the user to 
> entirely re-create and re-populate the cluster from scratch.
> This ticket only aims to support delete and update operations for the PDX 
> configuration using the management REST API; the rest of the configuration 
> elements will remain forbidden (old behaviour will be kept by leveraging the 
> respective {{ConfigurationValidator}}) and must be incrementally added in the 
> future if needed.





[jira] [Assigned] (GEODE-10230) Support for PDX Update and Delete Endpoints in Management REST API

2022-04-11 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-10230:
--

Assignee: Juan Ramos

> Support for PDX Update and Delete Endpoints in Management REST API
> --
>
> Key: GEODE-10230
> URL: https://issues.apache.org/jira/browse/GEODE-10230
> Project: Geode
>  Issue Type: Bug
>  Components: management, rest (admin)
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: needsTriage
>
> Support for PDX Update and Delete Endpoints in Management REST API
> The cluster management REST API only exports CREATE and DELETE operations for 
> all currently supported configuration elements (region, gateway, pdx, etc.). 
> Even though several of the {{ConfigurationRealizer}}, 
> {{ConfigurationManager}} and {{ConfigurationValidator}} are already 
> implemented, the {{LocatorClusterManagementService}} always throws an 
> exception for UPDATE operations and the actual endpoints don't even exist on 
> the respective controllers.
> The above greatly limits the ability of consumers to use the management REST 
> API endpoints as the configurations can't be changed after creation time, 
> making some of them useless. As an example, a user probably doesn't know 
> beforehand the full list of domain classes that need to be serialized using 
> the PDX auto-serializer. When using only the management REST API endpoints to 
> manage a cluster, this implies that the PDX cluster configuration becomes 
> useless as soon as an extra pattern needs to be added, forcing the user to 
> entirely re-create and re-populate the cluster from scratch.
> This ticket only aims to support delete and update operations for the PDX 
> configuration using the management REST API; the rest of the configuration 
> elements will remain forbidden (old behaviour will be kept by leveraging the 
> respective {{ConfigurationValidator}}) and must be incrementally added in the 
> future if needed.





[jira] [Created] (GEODE-10230) Support for PDX Update and Delete Endpoints in Management REST API

2022-04-11 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-10230:
--

 Summary: Support for PDX Update and Delete Endpoints in Management 
REST API
 Key: GEODE-10230
 URL: https://issues.apache.org/jira/browse/GEODE-10230
 Project: Geode
  Issue Type: Bug
  Components: management, rest (admin)
Affects Versions: 1.15.0
Reporter: Juan Ramos


Support for PDX Update and Delete Endpoints in Management REST API

The cluster management REST API only exports CREATE and DELETE operations for 
all currently supported configuration elements (region, gateway, pdx, etc.). 
Even though several of the {{ConfigurationRealizer}}, {{ConfigurationManager}} 
and {{ConfigurationValidator}} are already implemented, the 
{{LocatorClusterManagementService}} always throws an exception for UPDATE 
operations and the actual endpoints don't even exist on the respective 
controllers.
The above greatly limits the ability of consumers to use the management REST 
API endpoints as the configurations can't be changed after creation time, 
making some of them useless. As an example, a user probably doesn't know 
beforehand the full list of domain classes that need to be serialized using the PDX 
auto-serializer. When using only the management REST API endpoints to manage a 
cluster, this implies that the PDX cluster configuration becomes useless as 
soon as an extra pattern needs to be added, forcing the user to entirely 
re-create and re-populate the cluster from scratch.

This ticket only aims to support delete and update operations for the PDX 
configuration using the management REST API; the rest of the configuration 
elements will remain forbidden (old behaviour will be kept by leveraging the 
respective {{ConfigurationValidator}}) and must be incrementally added in the 
future if needed.
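
To make the intent concrete, a minimal sketch of the kind of call this should enable (the {{update}} method and the {{setReadSerialized}} setter are assumed names mirroring the existing create/delete API, not taken from the final design):
{noformat}
import org.apache.geode.management.api.ClusterManagementService;
import org.apache.geode.management.configuration.Pdx;

final class PdxUpdateSketch {

  // Hypothetical sketch only: updates the existing PDX cluster configuration
  // in place instead of re-creating and re-populating the cluster from scratch.
  static void enableReadSerialized(ClusterManagementService cms) {
    Pdx pdx = new Pdx();
    pdx.setReadSerialized(true); // assumed setter for the option being changed
    cms.update(pdx);             // assumed update operation added by this ticket
  }
}
{noformat}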






[jira] [Commented] (GEODE-9402) Automatic Reconnect Failure: Address already in use

2021-11-10 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441635#comment-17441635
 ] 

Juan Ramos commented on GEODE-9402:
---

[~burcham]: thanks for the update and analysis!
That said, I'm still inclined to consider this a bug in the product... it 
might not be a critical problem on self-healing platforms (like {{bosh}} or 
{{kubernetes}}), as there's a non-human entity automatically starting the 
server back up after a failure (as demonstrated in the logs). In non-self-healing 
environments, however, users rely on the auto-reconnect feature for the server 
to automatically start up and re-join the cluster after a Forced Cache 
Disconnection, which doesn't happen in this case as the process throws the 
{{BindException}} and exits (requiring manual intervention).
Maybe the {{ReconnectThread}} should internally retry when getting a 
{{BindException}}, or should the {{DisconnectThread}} wait until the port is 
unbound before proceeding to the reconnect phase instead?
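
To illustrate the first option, a minimal sketch of a retry-on-{{BindException}} loop (illustrative only; this is not the actual reconnect code, and the deadline and pause values are arbitrary):
{noformat}
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

final class PortRebindRetry {

  // Keeps attempting to bind the cache-server port until the disconnecting
  // instance releases it or the deadline expires; only then does it give up
  // with the original "Address already in use" failure.
  static ServerSocket bindWithRetry(int port, long timeoutMillis)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        if (System.currentTimeMillis() >= deadline) {
          throw e; // same outcome as today, but only after waiting
        }
        Thread.sleep(500); // arbitrary pause before the next attempt
      }
    }
  }
}
{noformat}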

> Automatic Reconnect Failure: Address already in use
> ---
>
> Key: GEODE-9402
> URL: https://issues.apache.org/jira/browse/GEODE-9402
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Juan Ramos
>Assignee: Bill Burcham
>Priority: Major
> Attachments: cluster_logs_gke_latest_54.zip, cluster_logs_pks_121.zip
>
>
> There are 2 locators and 4 servers during the test, once they're all up and 
> running the test drops the network connectivity between all members to 
> generate a full network partition and cause all members to shutdown and go 
> into reconnect mode. Upon reaching the mentioned state, the test 
> automatically restores the network connectivity and expects all members to 
> automatically go up again and re-form the distributed system.
>  This works fine most of the time, and we see every member successfully 
> reconnecting to the distributed system:
> {noformat}
> [info 2021/06/23 15:58:12.981 GMT gemfire-cluster-locator-0  
> tid=0x87] Reconnect completed.
> [info 2021/06/23 15:58:14.726 GMT gemfire-cluster-locator-1  
> tid=0x86] Reconnect completed.
> [info 2021/06/23 15:58:46.702 GMT gemfire-cluster-server-0  
> tid=0x94] Reconnect completed.
> [info 2021/06/23 15:58:46.485 GMT gemfire-cluster-server-1  
> tid=0x96] Reconnect completed.
> [info 2021/06/23 15:58:46.273 GMT gemfire-cluster-server-2  
> tid=0x97] Reconnect completed.
> [info 2021/06/23 15:58:46.902 GMT gemfire-cluster-server-3  
> tid=0x95] Reconnect completed.
> {noformat}
> In some rare occasions, though, one of the servers fails during the reconnect 
> phase with the following exception:
> {noformat}
> [error 2021/06/09 18:48:52.872 GMT gemfire-cluster-server-1  
> tid=0x91] Cache initialization for GemFireCache[id = 575310555; isClosing = 
> false; isShutDownAll = false; created = Wed Jun 09 18:46:49 GMT 2021; server 
> = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed 
> because:
> org.apache.geode.GemFireIOException: While starting cache server CacheServer 
> on port=40404 client subscription config policy=none client subscription 
> config capacity=1 client subscription config overflow directory=.
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.startCacheServers(CacheCreation.java:800)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:599)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:339)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4207)
>   at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1497)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1449)
>   at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:191)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2668)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2426)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1277)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1183)
>   at 
> org.apache.geode.distributed.interna

[jira] [Commented] (GEODE-9760) Partition Regions Gets Statistic Broken

2021-10-22 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432967#comment-17432967
 ] 

Juan Ramos commented on GEODE-9760:
---

Thanks [~jvarenina]!!

> Partition Regions Gets Statistic Broken
> ---
>
> Key: GEODE-9760
> URL: https://issues.apache.org/jira/browse/GEODE-9760
> Project: Geode
>  Issue Type: Bug
>  Components: statistics
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: needsTriage
> Attachments: workspace.zip
>
>
> The issue was introduced by 
> [GEODE-8876|https://issues.apache.org/jira/browse/GEODE-8876]. After commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165],
>  the {{CachePerfStats.gets}} statistic for {{PartitionRegion}} remains empty, 
> even though {{get}} operations are executed on the region.
> I'm attaching a small scenario that reproduces the problem every single time 
> when using any commit after the mentioned one.
> The {{reproduce.sh}} script:
> # Builds a small java client that creates 1 entries into the configured 
> region, and launches 5 threads to continuously execute the {{Region.get()}} 
> operation for 60 seconds.
> # Starts a cluster with 1 locator and 3 servers.
> # Creates a partition region.
> # Launches the client application and waits for it to complete.
> # Prints the amount of gets operations/sec for the relevant region (VSD can 
> be used here to inspect the values instead, the result is exactly the same).
> ---
> How to reproduce the problem:
> # Download the attached tar file.
> # Uncompress the file and update the {{GEMFIRE}} variable to use your local 
> build of the latest Geode from {{develop}}.
> # Execute the {{reproduce.sh}} script and wait for it to finish.
> Once the execution is done, the output will show the following:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-1.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=0 average=0 stddev=0 last=0
> {noformat}
> The above is clearly wrong, as the {{client}} application continuously 
> executed the {{Region.get()}} operation using 5 threads for 60 seconds.
> If you execute a reverse apply of commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165]
>  to undo the changes (use {{git show 5dde7d765c252ef20cfb16981b18e68903e32165 
> | git apply -R}}), re-build {{geode}} and execute the tests again, the output 
> will instead show the correct values:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=75 min=0 max=22500.5 average=15252.61 
> stddev=7747.94 last=0
> server-1.gfs:
>   gets operations/sec: samples=75 min=0 max=22487.54 average=15241.54 
> stddev=7705.99 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=22194.61 average=15459.16 
> stddev=7751.5 last=0
> {noformat}





[jira] [Commented] (GEODE-9760) Partition Regions Gets Statistic Broken

2021-10-22 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432887#comment-17432887
 ] 

Juan Ramos commented on GEODE-9760:
---

[~jvarenina]: can you please revert the original commit and use another 
solution to prevent this issue from happening?

> Partition Regions Gets Statistic Broken
> ---
>
> Key: GEODE-9760
> URL: https://issues.apache.org/jira/browse/GEODE-9760
> Project: Geode
>  Issue Type: Bug
>  Components: statistics
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: needsTriage
> Attachments: workspace.zip
>
>
> The issue was introduced by 
> [GEODE-8876|https://issues.apache.org/jira/browse/GEODE-8876]. After commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165],
>  the {{CachePerfStats.gets}} statistic for {{PartitionRegion}} remains empty, 
> even though {{get}} operations are executed on the region.
> I'm attaching a small scenario that reproduces the problem every single time 
> when using any commit after the mentioned one.
> The {{reproduce.sh}} script:
> # Builds a small java client that creates 1 entries into the configured 
> region, and launches 5 threads to continuously execute the {{Region.get()}} 
> operation for 60 seconds.
> # Starts a cluster with 1 locator and 3 servers.
> # Creates a partition region.
> # Launches the client application and waits for it to complete.
> # Prints the amount of gets operations/sec for the relevant region (VSD can 
> be used here to inspect the values instead, the result is exactly the same).
> ---
> How to reproduce the problem:
> # Download the attached tar file.
> # Uncompress the file and update the {{GEMFIRE}} variable to use your local 
> build of the latest Geode from {{develop}}.
> # Execute the {{reproduce.sh}} script and wait for it to finish.
> Once the execution is done, the output will show the following:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-1.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=0 average=0 stddev=0 last=0
> {noformat}
> The above is clearly wrong, as the {{client}} application continuously 
> executed the {{Region.get()}} operation using 5 threads for 60 seconds.
> If you execute a reverse apply of commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165]
>  to undo the changes (use {{git show 5dde7d765c252ef20cfb16981b18e68903e32165 
> | git apply -R}}), re-build {{geode}} and execute the tests again, the output 
> will instead show the correct values:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=75 min=0 max=22500.5 average=15252.61 
> stddev=7747.94 last=0
> server-1.gfs:
>   gets operations/sec: samples=75 min=0 max=22487.54 average=15241.54 
> stddev=7705.99 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=22194.61 average=15459.16 
> stddev=7751.5 last=0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9760) Partition Regions Gets Statistic Broken

2021-10-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9760:
--
Attachment: workspace.zip

> Partition Regions Gets Statistic Broken
> ---
>
> Key: GEODE-9760
> URL: https://issues.apache.org/jira/browse/GEODE-9760
> Project: Geode
>  Issue Type: Bug
>  Components: statistics
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: needsTriage
> Attachments: workspace.zip
>
>
> The issue was introduced by 
> [GEODE-8876|https://issues.apache.org/jira/browse/GEODE-8876]. After commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165],
>  the {{CachePerfStats.gets}} statistic for {{PartitionRegion}} remains empty, 
> even though {{get}} operations are executed on the region.
> I'm attaching a small scenario that reproduces the problem every single time 
> when using any commit after the mentioned one.
> The {{reproduce.sh}} script:
> # Builds a small Java client that creates 1 entries in the configured 
> region, and launches 5 threads to continuously execute the {{Region.get()}} 
> operation for 60 seconds.
> # Starts a cluster with 1 locator and 3 servers.
> # Creates a partition region.
> # Launches the client application and waits for it to complete.
> # Prints the number of gets operations/sec for the relevant region (VSD can 
> be used here to inspect the values instead; the result is exactly the same).
> ---
> How to reproduce the problem:
> # Download the attached tar file.
> # Uncompress the file and update the {{GEMFIRE}} variable to use your local 
> build of the latest Geode from {{develop}}.
> # Execute the {{reproduce.sh}} script and wait for it to finish.
> Once the execution is done, the output will show the following:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-1.gfs:
>   gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=0 average=0 stddev=0 last=0
> {noformat}
> The above is clearly wrong, as the {{client}} application continuously 
> executed the {{Region.get()}} operation using 5 threads for 60 seconds.
> If you execute a reverse apply of commit 
> [5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165]
>  to undo the changes (use {{git show 5dde7d765c252ef20cfb16981b18e68903e32165 
> | git apply -R}}), re-build {{geode}}, and execute the tests again, the output 
> will instead show the correct values:
> {noformat}
> server-0.gfs:
>   gets operations/sec: samples=75 min=0 max=22500.5 average=15252.61 
> stddev=7747.94 last=0
> server-1.gfs:
>   gets operations/sec: samples=75 min=0 max=22487.54 average=15241.54 
> stddev=7705.99 last=0
> server-2.gfs:
>   gets operations/sec: samples=75 min=0 max=22194.61 average=15459.16 
> stddev=7751.5 last=0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9760) Partition Regions Gets Statistic Broken

2021-10-22 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9760:
-

 Summary: Partition Regions Gets Statistic Broken
 Key: GEODE-9760
 URL: https://issues.apache.org/jira/browse/GEODE-9760
 Project: Geode
  Issue Type: Bug
  Components: statistics
Affects Versions: 1.15.0
Reporter: Juan Ramos
 Attachments: workspace.zip


The issue was introduced by 
[GEODE-8876|https://issues.apache.org/jira/browse/GEODE-8876]. After commit 
[5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165],
 the {{CachePerfStats.gets}} statistic for {{PartitionRegion}} remains empty, 
even though {{get}} operations are executed on the region.
I'm attaching a small scenario that reproduces the problem every single time 
when using any commit after the mentioned one.
The {{reproduce.sh}} script:

# Builds a small Java client that creates 1 entries in the configured 
region, and launches 5 threads to continuously execute the {{Region.get()}} 
operation for 60 seconds.
# Starts a cluster with 1 locator and 3 servers.
# Creates a partition region.
# Launches the client application and waits for it to complete.
# Prints the number of gets operations/sec for the relevant region (VSD can be 
used here to inspect the values instead; the result is exactly the same).

---

How to reproduce the problem:
# Download the attached tar file.
# Uncompress the file and update the {{GEMFIRE}} variable to use your local 
build of the latest Geode from {{develop}}.
# Execute the {{reproduce.sh}} script and wait for it to finish.

Once the execution is done, the output will show the following:
{noformat}
server-0.gfs:
  gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
server-1.gfs:
  gets operations/sec: samples=74 min=0 max=0 average=0 stddev=0 last=0
server-2.gfs:
  gets operations/sec: samples=75 min=0 max=0 average=0 stddev=0 last=0
{noformat}

The above is clearly wrong, as the {{client}} application continuously executed 
the {{Region.get()}} operation using 5 threads for 60 seconds.
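
For reference, a minimal sketch of what such a load client could look like (this is an illustration, not the attached client; the locator address, region name, and key range are assumptions):
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class GetLoadClient {
  public static void main(String[] args) throws InterruptedException {
    // Locator host/port and region name are assumptions for illustration only.
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .create();
    Region<Integer, String> region = cache
        .<Integer, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("exampleRegion");

    // Launch 5 threads that continuously execute Region.get() for 60 seconds.
    ExecutorService pool = Executors.newFixedThreadPool(5);
    long end = System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(60);
    for (int i = 0; i < 5; i++) {
      pool.submit(() -> {
        int key = 0;
        while (System.currentTimeMillis() < end) {
          region.get(key++ % 1000);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(2, TimeUnit.MINUTES);
    cache.close();
  }
}
{noformat}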

If you execute a reverse apply of commit 
[5dde7d765c252ef20cfb16981b18e68903e32165|https://github.com/apache/geode/commit/5dde7d765c252ef20cfb16981b18e68903e32165]
 to undo the changes (use {{git show 5dde7d765c252ef20cfb16981b18e68903e32165 | 
git apply -R}}), re-build {{geode}}, and execute the tests again, the output 
will instead show the correct values:
{noformat}
server-0.gfs:
  gets operations/sec: samples=75 min=0 max=22500.5 average=15252.61 
stddev=7747.94 last=0
server-1.gfs:
  gets operations/sec: samples=75 min=0 max=22487.54 average=15241.54 
stddev=7705.99 last=0
server-2.gfs:
  gets operations/sec: samples=75 min=0 max=22194.61 average=15459.16 
stddev=7751.5 last=0
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9538) NullPointerException in ServerConnection.doNormalMessage()

2021-08-24 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9538:
--
Description: 
I've hit this issue while executing some chaos testing over a GemFire cluster 
using 2 locators and 3 servers; {{SSL}} is enabled and a dummy 
{{SecurityManager}} is configured to authenticate and authorize a 
pre-configured set of well known users.
 There are 3 {{PARTITION_REDUNDANT}} regions configured, one per client, each 
with 1 redundant copy. Once the cluster is up and running, 3 clients 
continuously execute {{Region.get}} and {{Region.put}} operations on a known 
set of keys, each against its own {{Region}} (created with {{PROXY}} type), and 
another process executes the following logic in parallel (pseudocode):
{noformat}
for server in ${servers}
do
# Pause the JVM for 30 seconds to simulate a stop-the-world GC
kill -STOP "${SERVER_PID}"
sleep 30

# Unpause the JVM, wait for the member to reconnect and for regions to
# recover the configured redundancy
kill -CONT "${SERVER_PID}"
waitForReconnectcompletedInServerLog
waitForNumBucketsWithoutRedundancyToBeZeroInGfshShowRegionMetrics
done
{noformat}
The test works fine most of the time, but randomly fails due to an unexpected 
exception logged within the logs of at least one server. The exception is 
always reported from a {{ServerConnection}} thread on the server member that 
has just returned to life, as an example:
{noformat}
[info 2021/08/09 11:01:07.430 GMT system-test-gemfire-server-2  tid=0x8d] Configured redundancy of 2 copies has been 
restored to /system-test-client-7f6795dfb8-v7hh8-region

[warn 2021/08/09 11:02:19.742 GMT system-test-gemfire-server-2 
 tid=0x4d] Server connection from 
[identity(system-test-client-7f6795dfb8-pc8mv(SpringBasedClientCacheApplication:1:loner):34788:814b8d2a:SpringBasedClientCacheApplication,connection=1;
 port=50264] is being terminated because its client timeout of 1 has 
expired.

[warn 2021/08/09 11:02:19.744 GMT system-test-gemfire-server-2 
 tid=0x4d] ClientHealthMonitor: Unregistering 
client with member id 
identity(system-test-client-7f6795dfb8-pc8mv(SpringBasedClientCacheApplication:1:loner):34788:814b8d2a:SpringBasedClientCacheApplication,connection=1
 due to: Unknown reason

[info 2021/08/09 11:02:19.745 GMT system-test-gemfire-server-2  tid=0x1e] received suspect message 
from 
system-test-gemfire-locator-0(system-test-gemfire-locator-0:1:locator):41000
 for system-test-gemfire-server-2(system-test-gemfire-server-2:1):41000: 
Member isn't responding to heartbeat requests

[info 2021/08/09 11:02:19.747 GMT system-test-gemfire-server-2  tid=0x1e] Membership received a 
request to remove 
system-test-gemfire-server-2(system-test-gemfire-server-2:1):41000 from 
system-test-gemfire-locator-1(system-test-gemfire-locator-1:1:locator):41000
 reason=Member isn't responding to heartbeat requests

[warn 2021/08/09 11:02:19.748 GMT system-test-gemfire-server-2  
tid=0x38] Statistics sampling thread detected a wakeup delay of 29965 ms, 
indicating a possible resource issue. Check the GC, memory, and CPU statistics.

...

[warn 2021/08/09 11:02:19.854 GMT system-test-gemfire-server-2 
 tid=0x91] ClientHealthMonitor: 
Unregistering client with member id 
identity(system-test-client-7f6795dfb8-v7hh8(SpringBasedClientCacheApplication:1:loner):44012:ec3c8d2a:SpringBasedClientCacheApplication,connection=1
 due to: The connection has been reset while reading the header

[info 2021/08/09 11:02:19.867 GMT system-test-gemfire-server-2  tid=0x1e] saving cache server 
configuration for use with the cluster-configuration service on reconnect

[info 2021/08/09 11:02:19.867 GMT system-test-gemfire-server-2  tid=0x1e] cache server 
configuration saved

[fatal 2021/08/09 11:02:19.876 GMT system-test-gemfire-server-2 
 tid=0xa9] Uncaught exception in 
thread Thread[ServerConnection on port 40404 Thread 14,5,main]
java.lang.NullPointerException
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:865)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:1022)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1275)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:690)
at 
org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
The problem itself is really hard to reproduce; we only hit it twice in around 
200 runs. 

[jira] [Created] (GEODE-9538) NullPointerException in ServerConnection.doNormalMessage()

2021-08-24 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9538:
-

 Summary: NullPointerException in ServerConnection.doNormalMessage()
 Key: GEODE-9538
 URL: https://issues.apache.org/jira/browse/GEODE-9538
 Project: Geode
  Issue Type: Bug
  Components: membership
Reporter: Juan Ramos


I've hit this issue while executing some chaos testing over a GemFire cluster 
using 2 locators and 3 servers; {{SSL}} is enabled and a dummy 
{{SecurityManager}} is configured to authenticate and authorize a 
pre-configured set of well known users.
 There are 3 {{PARTITION_REDUNDANT}} regions configured, one per client, each 
with 1 redundant copy. Once the cluster is up and running, 3 clients 
continuously execute {{Region.get}} and {{Region.put}} operations on a known 
set of keys, each against its own {{Region}} (created with {{PROXY}} type), and 
another process executes the following logic in parallel (pseudocode):
{noformat}
for server in ${servers}
do
# Pause the JVM for 30 seconds to simulate a stop-the-world GC
kill -STOP "${SERVER_PID}"
sleep 30

# Unpause the JVM, wait for the member to reconnect and for regions to
# recover the configured redundancy
kill -CONT "${SERVER_PID}"
waitForReconnectcompletedInServerLog
waitForNumBucketsWithoutRedundancyToBeZeroInGfshShowRegionMetrics
done
{noformat}
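
For illustration, a minimal sketch of the get/put loop each client might run against its own PROXY region (the region handle, key naming, and duration are assumptions, not the actual test code):
{noformat}
import org.apache.geode.cache.Region;

public class GetPutLoop {

  /**
   * Continuously alternates Region.put() and Region.get() over a fixed,
   * well-known key set for the given duration. The region is assumed to be a
   * client PROXY region already created against the cluster.
   */
  static void run(Region<String, String> region, int keyCount, long durationMillis) {
    long end = System.currentTimeMillis() + durationMillis;
    int i = 0;
    while (System.currentTimeMillis() < end) {
      String key = "key-" + (i % keyCount);
      region.put(key, "value-" + i);
      region.get(key);
      i++;
    }
  }
}
{noformat}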
The test works fine most of the time, but randomly fails due to an unexpected 
exception logged within the logs of at least one server. The exception is 
always reported from a {{ServerConnection}} thread on the server member that 
has just returned to life, as an example:
{noformat}
[info 2021/08/09 11:01:07.430 GMT system-test-gemfire-server-2  tid=0x8d] Configured redundancy of 2 copies has been 
restored to /system-test-client-7f6795dfb8-v7hh8-region

[warn 2021/08/09 11:02:19.742 GMT system-test-gemfire-server-2 
 tid=0x4d] Server connection from 
[identity(system-test-client-7f6795dfb8-pc8mv(SpringBasedClientCacheApplication:1:loner):34788:814b8d2a:SpringBasedClientCacheApplication,connection=1;
 port=50264] is being terminated because its client timeout of 1 has 
expired.

[warn 2021/08/09 11:02:19.744 GMT system-test-gemfire-server-2 
 tid=0x4d] ClientHealthMonitor: Unregistering 
client with member id 
identity(system-test-client-7f6795dfb8-pc8mv(SpringBasedClientCacheApplication:1:loner):34788:814b8d2a:SpringBasedClientCacheApplication,connection=1
 due to: Unknown reason

[info 2021/08/09 11:02:19.745 GMT system-test-gemfire-server-2  tid=0x1e] received suspect message 
from 
system-test-gemfire-locator-0(system-test-gemfire-locator-0:1:locator):41000
 for system-test-gemfire-server-2(system-test-gemfire-server-2:1):41000: 
Member isn't responding to heartbeat requests

[info 2021/08/09 11:02:19.747 GMT system-test-gemfire-server-2  tid=0x1e] Membership received a 
request to remove 
system-test-gemfire-server-2(system-test-gemfire-server-2:1):41000 from 
system-test-gemfire-locator-1(system-test-gemfire-locator-1:1:locator):41000
 reason=Member isn't responding to heartbeat requests

[warn 2021/08/09 11:02:19.748 GMT system-test-gemfire-server-2  
tid=0x38] Statistics sampling thread detected a wakeup delay of 29965 ms, 
indicating a possible resource issue. Check the GC, memory, and CPU statistics.

...

[warn 2021/08/09 11:02:19.854 GMT system-test-gemfire-server-2 
 tid=0x91] ClientHealthMonitor: 
Unregistering client with member id 
identity(system-test-client-7f6795dfb8-v7hh8(SpringBasedClientCacheApplication:1:loner):44012:ec3c8d2a:SpringBasedClientCacheApplication,connection=1
 due to: The connection has been reset while reading the header

[info 2021/08/09 11:02:19.867 GMT system-test-gemfire-server-2  tid=0x1e] saving cache server 
configuration for use with the cluster-configuration service on reconnect

[info 2021/08/09 11:02:19.867 GMT system-test-gemfire-server-2  tid=0x1e] cache server 
configuration saved

[fatal 2021/08/09 11:02:19.876 GMT system-test-gemfire-server-2 
 tid=0xa9] Uncaught exception in 
thread Thread[ServerConnection on port 40404 Thread 14,5,main]
java.lang.NullPointerException
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:865)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:1022)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1275)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:690)
at 
org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:1

[jira] [Updated] (GEODE-9512) java.lang.IllegalStateException: Detected old version (pre 5.0.1) of GemFire or non-GemFire during handshake due to initial byte being 1

2021-08-17 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9512:
--
Description: 
I've hit this issue while executing some chaos testing over a GemFire cluster 
using 2 locators and 3 servers; {{SSL}} is enabled, and a dummy 
{{SecurityManager}} is configured which authenticates any user and always 
returns {{true}} within the {{authorize}} method. There are 3 
{{PARTITION_REDUNDANT}} regions configured, one per client, each with 1 
redundant copy. Once the cluster is up and running, 3 clients continuously 
execute {{Region.put}} operations on a well-known set of keys, each against its 
own {{Region}} (created with {{PROXY}} type), and another process executes the 
following logic in parallel (pseudocode):
{noformat}
for server in ${servers}
do
# Pause the JVM for 30 seconds to simulate a stop-the-world GC
kill -STOP "${SERVER_PID}"
sleep 30

# Unpause the JVM, wait for the member to reconnect and for regions to
# recover the configured redundancy
kill -CONT "${SERVER_PID}"
waitForReconnectcompletedInServerLog
waitForNumBucketsWithoutRedundancyToBeZeroInGfshShowRegionMetrics
done
{noformat}
The test works fine most of the time, but randomly fails due to an unexpected 
exception logged within the logs for at least one locator. The exception is 
always reported from a {{P2P message reader}} thread on the locator for a 
server member that has just returned to life, as an example:
{noformat}
 LOCATOR-0
[warn 2021/08/17 05:20:45.166 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=61 dom #1 local port=48141 remote port=46174> tid=0x6f] P2P 
message reader@354fac47 timed out during a membership check.

[fatal 2021/08/17 05:20:45.166 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=61 dom #1 local port=48141 remote port=46174> tid=0x6f] 
Error deserializing P2P handshake message
java.lang.IllegalStateException: Detected old version (pre 5.0.1) of GemFire or 
non-GemFire during handshake due to initial byte being 1
at 
org.apache.geode.internal.tcp.Connection.readHandshakeForReceiver(Connection.java:2875)
at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2825)
at 
org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1649)
at org.apache.geode.internal.tcp.Connection.run(Connection.java:1489)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)


 SERVER-2
[warn 2021/08/17 05:20:44.012 GMT system-test-gemfire-server-2  
tid=0x35] Statistics sampling thread detected a wakeup delay of 29070 ms, 
indicating a possible resource issue. Check the GC, memory, and CPU statistics.

[warn 2021/08/17 05:20:44.015 GMT system-test-gemfire-server-2  tid=0x23] Failure detection heartbeat-generation thread overslept by 
more than a full period. Asleep time: 31,175,291,931 nanoseconds. Period: 
2,500,000,000 nanoseconds.

[info 2021/08/17 05:20:44.143 GMT system-test-gemfire-server-2  tid=0x1e] saving cache server 
configuration for use with the cluster-configuration service on reconnect

[info 2021/08/17 05:20:44.143 GMT system-test-gemfire-server-2  tid=0x1e] cache server 
configuration saved

[info 2021/08/17 05:20:44.233 GMT system-test-gemfire-server-2 
 tid=0xe5] Stopping membership services

[info 2021/08/17 05:20:44.455 GMT system-test-gemfire-server-2 
 tid=0xe5] Disconnecting old DistributedSystem to prepare for 
a reconnect attempt

[info 2021/08/17 05:20:44.463 GMT system-test-gemfire-server-2 
 tid=0xe5] GemFireCache[id = 252990056; isClosing = true; 
isShutDownAll = false; created = Tue Aug 17 05:11:50 GMT 2021; server = true; 
copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.

[info 2021/08/17 05:20:44.544 GMT system-test-gemfire-server-2 
 tid=0xe5] Cache server on port 40404 is shutting down.

[info 2021/08/17 05:20:44.565 GMT system-test-gemfire-server-2  tid=0x5e] The QueueRemovalThread is done.
{noformat}
 

The full set of logs and statistics from all member of the cluster can be found 
[here|https://drive.google.com/file/d/1jU_LIut9DVlZNniAdb53IglM3iWbX49C/view].

Contrary to what the exception message states, it's worth noticing that *_all 
members within the cluster (including clients) are using the same Geode 
version._*

—

Below are some extra logs from when I was able to reproduce the issue with 
{{log-level=fine}} configured on all members:
{noformat}
 LOCATOR-0
[debug 2021/08/16 20:40:22.858 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=47 dom #1 local port=58373 remote port=57818> tid=0x87] P2P 
handshake remoteAddr is 
system-test-gemfire-server-1(system-test-gemfire-server

[jira] [Created] (GEODE-9512) java.lang.IllegalStateException: Detected old version (pre 5.0.1) of GemFire or non-GemFire during handshake due to initial byte being 1

2021-08-17 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9512:
-

 Summary: java.lang.IllegalStateException: Detected old version 
(pre 5.0.1) of GemFire or non-GemFire during handshake due to initial byte 
being 1
 Key: GEODE-9512
 URL: https://issues.apache.org/jira/browse/GEODE-9512
 Project: Geode
  Issue Type: Bug
  Components: membership
Affects Versions: 1.15.0
Reporter: Juan Ramos


I've hit this issue while executing some chaos testing over a GemFire cluster 
using 2 locators and 3 servers; {{SSL}} is enabled, and a dummy 
{{SecurityManager}} is configured which authenticates any user and always 
returns {{true}} within the {{authorize}} method. There are 3 
{{PARTITION_REDUNDANT}} regions configured, one per client, each with 1 
redundant copy. Once the cluster is up and running, 3 clients continuously 
execute {{Region.put}} operations on a well-known set of keys, each against its 
own {{Region}} (created with {{PROXY}} type), and another process executes the 
following logic in parallel (pseudocode):
{noformat}
for server in ${servers}
do
# Pause the JVM for 30 seconds to simulate a stop-the-world GC
kill -STOP "${SERVER_PID}"
sleep 30

# Unpause the JVM, wait for the member to reconnect and for regions to
# recover the configured redundancy
kill -CONT "${SERVER_PID}"
waitForReconnectcompletedInServerLog
waitForNumBucketsWithoutRedundancyToBeZeroInGfshShowRegionMetrics
done
{noformat}
The test works fine most of the time, but randomly fails due to an unexpected 
exception logged within the logs for at least one locator. The exception is 
always reported from a {{P2P message reader}} thread on the locator for a 
server member that has just returned to life, as an example:
{noformat}
 LOCATOR-0
[warn 2021/08/17 05:20:45.166 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=61 dom #1 local port=48141 remote port=46174> tid=0x6f] P2P 
message reader@354fac47 timed out during a membership check.

[fatal 2021/08/17 05:20:45.166 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=61 dom #1 local port=48141 remote port=46174> tid=0x6f] 
Error deserializing P2P handshake message
java.lang.IllegalStateException: Detected old version (pre 5.0.1) of GemFire or 
non-GemFire during handshake due to initial byte being 1
at 
org.apache.geode.internal.tcp.Connection.readHandshakeForReceiver(Connection.java:2875)
at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2825)
at 
org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1649)
at org.apache.geode.internal.tcp.Connection.run(Connection.java:1489)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)


 SERVER-2
[warn 2021/08/17 05:20:44.012 GMT system-test-gemfire-server-2  
tid=0x35] Statistics sampling thread detected a wakeup delay of 29070 ms, 
indicating a possible resource issue. Check the GC, memory, and CPU statistics.

[warn 2021/08/17 05:20:44.015 GMT system-test-gemfire-server-2  tid=0x23] Failure detection heartbeat-generation thread overslept by 
more than a full period. Asleep time: 31,175,291,931 nanoseconds. Period: 
2,500,000,000 nanoseconds.

[info 2021/08/17 05:20:44.143 GMT system-test-gemfire-server-2  tid=0x1e] saving cache server 
configuration for use with the cluster-configuration service on reconnect

[info 2021/08/17 05:20:44.143 GMT system-test-gemfire-server-2  tid=0x1e] cache server 
configuration saved

[info 2021/08/17 05:20:44.233 GMT system-test-gemfire-server-2 
 tid=0xe5] Stopping membership services

[info 2021/08/17 05:20:44.455 GMT system-test-gemfire-server-2 
 tid=0xe5] Disconnecting old DistributedSystem to prepare for 
a reconnect attempt

[info 2021/08/17 05:20:44.463 GMT system-test-gemfire-server-2 
 tid=0xe5] GemFireCache[id = 252990056; isClosing = true; 
isShutDownAll = false; created = Tue Aug 17 05:11:50 GMT 2021; server = true; 
copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.

[info 2021/08/17 05:20:44.544 GMT system-test-gemfire-server-2 
 tid=0xe5] Cache server on port 40404 is shutting down.

[info 2021/08/17 05:20:44.565 GMT system-test-gemfire-server-2  tid=0x5e] The QueueRemovalThread is done.
{noformat}
Contrary to what the exception message states, it's worth noticing that *_all 
members within the cluster (including clients) are using the same Geode 
version._*

—

Below are some extra logs from when I was able to reproduce the issue with 
{{log-level=fine}} configured on all members:
{noformat}
 LOCATOR-0
[debug 2021/08/16 20:40:22.858 GMT system-test-gemfire-locator-0 :41000 unshared 
ordered sender uid=47 dom #1 local port=58373 remot

[jira] [Resolved] (GEODE-9494) HTTP Session State Module - Security Properties

2021-08-11 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-9494.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> HTTP Session State Module - Security Properties
> ---
>
> Key: GEODE-9494
> URL: https://issues.apache.org/jira/browse/GEODE-9494
> Project: Geode
>  Issue Type: Bug
>  Components: http session
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In order to configure authentication and authorization, the geode cache must 
> be configured with either the {{security-client-auth-init}} or 
> {{security-peer-auth-init}} properties.
>  The implementation of the {{AuthInitialize}} interface is supposed to obtain 
> credentials for a client or peer and, in practice, it should be able to 
> connect to an external data source or use some extra configuration so that it 
> knows where to retrieve the actual credentials from. The 
> {{AuthInitialize.getCredentials()}} method receives all gemfire properties 
> configured with the prefix {{security-}} and it's expected to use them in 
> order to configure itself.
>  The {{AbstractCache}} class, however, prevents the user from configuring any 
> property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
> method, and this does not include those properties starting with 
> {{security-}}:
> {noformat}
>   public void setProperty(String name, String value) {
> // TODO Look at fake attributes
> if (name.equals("className")) {
>   return;
> }
> // Determine the validity of the input property
> boolean validProperty = false;
> // TODO: AbstractDistributionConfig is internal and _getAttNames is 
> designed for testing.
> for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
>   if (name.equals(gemfireProperty)) {
> validProperty = true;
> break;
>   }
> }
> ...
> }
> {noformat}
> The above, in turn, makes it almost impossible for users to correctly implement 
> {{AuthInitialize}} without leveraging system properties or hardcoded paths 
> for external configuration.
> —
> As a side note, {{security-username}} and {{security-password}} are not 
> "formal" distributed system properties, so they also can't be used when 
> configuring the HTTP session state module:
> {noformat}
>className="org.apache.geode.modules.session.catalina.ClientServerCacheLifecycleListener"
> security-username="myUser"
> security-password="myPassword"/>
> {noformat}
> {noformat}
> 10-Aug-2021 12:15:57.118 WARNING [main] 
> org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The 
> input property named security-username is not a valid GemFire property. It is 
> being ignored.
> 10-Aug-2021 12:15:57.123 WARNING [main] 
> org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The 
> input property named security-password is not a valid GemFire property. It is 
> being ignored.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9494) HTTP Session State Module - Security Properties

2021-08-10 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9494:
--
Description: 
In order to configure authentication and authorization, the geode cache must be 
configured with either the {{security-client-auth-init}} or 
{{security-peer-auth-init}} properties.
 The implementation of the {{AuthInitialize}} interface is supposed to obtain 
credentials for a client or peer and, in practice, it should be able to connect 
to an external data source or use some extra configuration so that it knows where 
to retrieve the actual credentials from. The {{AuthInitialize.getCredentials()}} 
method receives all gemfire properties configured with the prefix {{security-}} 
and it's expected to use them in order to configure itself.
 The {{AbstractCache}} class, however, prevents the user from configuring any 
property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
method, and this does not include those properties starting with {{security-}}:
{noformat}
  public void setProperty(String name, String value) {
// TODO Look at fake attributes
if (name.equals("className")) {
  return;
}

// Determine the validity of the input property
boolean validProperty = false;
// TODO: AbstractDistributionConfig is internal and _getAttNames is 
designed for testing.
for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
  if (name.equals(gemfireProperty)) {
validProperty = true;
break;
  }
}
...
}
{noformat}
The above, in turn, makes it almost impossible for users to correctly implement 
{{AuthInitialize}} without leveraging system properties or hardcoded paths for 
external configuration.

—

As a side note, {{security-username}} and {{security-password}} are not 
"formal" distributed system properties, so they also can't be used when 
configuring the HTTP session state module:
{noformat}
<Listener className="org.apache.geode.modules.session.catalina.ClientServerCacheLifecycleListener"
  security-username="myUser"
  security-password="myPassword"/>
{noformat}
{noformat}
10-Aug-2021 12:15:57.118 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-username is not a valid GemFire property. It is being 
ignored.
10-Aug-2021 12:15:57.123 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-password is not a valid GemFire property. It is being 
ignored.
{noformat}

  was:
In order to configure authentication and authorization, the geode cache must be 
configured with either the {{security-client-auth-init}} or 
{{security-peer-auth-init}} properties.
The implementation of the {{AuthInitialize}} interface is supposed to obtain 
credentials for a client or peer and, in practice, it should be able to connect 
to an external data source or use some extra configuration so that it knows where 
to retrieve the actual credentials from. The {{AuthInitialize.getCredentials()}} 
method receives all gemfire properties configured with the prefix {{security-}} 
and it's expected to use them in order to configure itself.
The {{AbstractCache}} class, however, prevents the user from configuring any 
property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
method, and this does not include those properties starting with {{security-}}:
{noformat}

  public void setProperty(String name, String value) {
// TODO Look at fake attributes
if (name.equals("className")) {
  return;
}

// Determine the validity of the input property
boolean validProperty = false;
// TODO: AbstractDistributionConfig is internal and _getAttNames is 
designed for testing.
for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
  if (name.equals(gemfireProperty)) {
validProperty = true;
break;
  }
}
...
}
{noformat}

The above, in turn, makes it almost impossible for users to correctly implement 
{{AuthInitialize}} without leveraging system properties or hardcoded paths for 
external configuration.

---

As a side note, {{security-username}} and {{security-password}} are not 
"formal" distributed system properties, so they also can't be used when 
configuring the Tomcat session state module:
{noformat}
<Listener className="org.apache.geode.modules.session.catalina.ClientServerCacheLifecycleListener"
  security-username="myUser"
  security-password="myPassword"/>
{noformat}

{noformat}
10-Aug-2021 12:15:57.118 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-username is not a valid GemFire property. It is being 
ignored.
10-Aug-2021 12:15:57.123 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-password is not a valid GemFire property. It is being 
ignored.
{noformat}


> HTTP Session State Module - Security Properties
> ---
>
> Key: GEODE-9494
> URL: https://issues.apache.org/jira/browse/GEODE-9494
> Project: Geode
>  Issue Type: Bug
> 

[jira] [Updated] (GEODE-9494) HTTP Session State Module - Security Properties

2021-08-10 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9494:
--
Summary: HTTP Session State Module - Security Properties  (was: Tomcat 
Session State Module - Security Properties)

> HTTP Session State Module - Security Properties
> ---
>
> Key: GEODE-9494
> URL: https://issues.apache.org/jira/browse/GEODE-9494
> Project: Geode
>  Issue Type: Bug
>  Components: http session
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>
> In order to configure authentication and authorization, the geode cache must 
> be configured with either the {{security-client-auth-init}} or 
> {{security-peer-auth-init}} properties.
> The implementation of the {{AuthInitialize}} interface is supposed to obtain 
> credentials for a client or peer and, in practice, it should be able to 
> connect to an external data source or use some extra configuration so that it 
> knows where to retrieve the actual credentials from. The 
> {{AuthInitialize.getCredentials()}} method receives all gemfire properties 
> configured with the prefix {{security-}} and it's expected to use them in 
> order to configure itself.
> The {{AbstractCache}} class, however, prevents the user from configuring any 
> property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
> method, and this does not include those properties starting with 
> {{security-}}:
> {noformat}
>   public void setProperty(String name, String value) {
> // TODO Look at fake attributes
> if (name.equals("className")) {
>   return;
> }
> // Determine the validity of the input property
> boolean validProperty = false;
> // TODO: AbstractDistributionConfig is internal and _getAttNames is 
> designed for testing.
> for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
>   if (name.equals(gemfireProperty)) {
> validProperty = true;
> break;
>   }
> }
> ...
> }
> {noformat}
> The above, in turn, makes it almost impossible for users to correctly implement 
> {{AuthInitialize}} without leveraging system properties or hardcoded paths 
> for external configuration.
> ---
> As a side note, {{security-username}} and {{security-password}} are not 
> "formal" distributed system properties, so they also can't be used when 
> configuring the Tomcat session state module:
> {noformat}
>className="org.apache.geode.modules.session.catalina.ClientServerCacheLifecycleListener"
> security-username="myUser"
> security-password="myPassword"/>
> {noformat}
> {noformat}
> 10-Aug-2021 12:15:57.118 WARNING [main] 
> org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The 
> input property named security-username is not a valid GemFire property. It is 
> being ignored.
> 10-Aug-2021 12:15:57.123 WARNING [main] 
> org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The 
> input property named security-password is not a valid GemFire property. It is 
> being ignored.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9494) Tomcat Session State Module - Security Properties

2021-08-10 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9494:
--
Description: 
In order to configure authentication and authorization, the geode cache must be 
configured with either the {{security-client-auth-init}} or 
{{security-peer-auth-init}} properties.
The implementation of the {{AuthInitialize}} interface is supposed to obtain 
credentials for a client or peer and, in practice, it should be able to connect 
to an external data source or use some extra configuration so that it knows where 
to retrieve the actual credentials from. The {{AuthInitialize.getCredentials()}} 
method receives all gemfire properties configured with the prefix {{security-}} 
and it's expected to use them in order to configure itself.
The {{AbstractCache}} class, however, prevents the user from configuring any 
property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
method, and this does not include those properties starting with {{security-}}:
{noformat}

  public void setProperty(String name, String value) {
// TODO Look at fake attributes
if (name.equals("className")) {
  return;
}

// Determine the validity of the input property
boolean validProperty = false;
// TODO: AbstractDistributionConfig is internal and _getAttNames is 
designed for testing.
for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
  if (name.equals(gemfireProperty)) {
validProperty = true;
break;
  }
}
...
}
{noformat}

The above, in turn, makes it almost impossible for users to correctly implement 
{{AuthInitialize}} without leveraging system properties or hardcoded paths for 
external configuration.

---

As a side note, {{security-username}} and {{security-password}} are not 
"formal" distributed system properties, so they also can't be used when 
configuring the Tomcat session state module:
{noformat}
<Listener className="org.apache.geode.modules.session.catalina.ClientServerCacheLifecycleListener"
  security-username="myUser"
  security-password="myPassword"/>
{noformat}

{noformat}
10-Aug-2021 12:15:57.118 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-username is not a valid GemFire property. It is being 
ignored.
10-Aug-2021 12:15:57.123 WARNING [main] 
org.apache.geode.modules.session.bootstrap.AbstractCache.setProperty The input 
property named security-password is not a valid GemFire property. It is being 
ignored.
{noformat}

  was:
In order to configure authentication and authorization, the geode cache must be 
configured with either the {{security-client-auth-init}} or 
{{security-peer-auth-init}} properties.
The implementation of the {{AuthInitialize}} interface is supposed to obtain 
credentials for a client or peer and, in practice, it should be able to connect 
to an external data source or use some extra configuration so that it knows where 
to retrieve the actual credentials from. The {{AuthInitialize.getCredentials()}} 
method receives all gemfire properties configured with the prefix {{security-}} 
and it's expected to use them in order to configure itself.
The {{AbstractCache}} class, however, prevents the user from configuring any 
property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
method, and this does not include those properties starting with {{security-}}:
{noformat}

  public void setProperty(String name, String value) {
// TODO Look at fake attributes
if (name.equals("className")) {
  return;
}

// Determine the validity of the input property
boolean validProperty = false;
// TODO: AbstractDistributionConfig is internal and _getAttNames is 
designed for testing.
for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
  if (name.equals(gemfireProperty)) {
validProperty = true;
break;
  }
}
...
}
{noformat}

The above, in turn, makes it almost impossible for users to correctly implement 
{{AuthInitialize}} without leveraging system properties or hardcoded paths for 
external configuration.


> Tomcat Session State Module - Security Properties
> -
>
> Key: GEODE-9494
> URL: https://issues.apache.org/jira/browse/GEODE-9494
> Project: Geode
>  Issue Type: Bug
>  Components: http session
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>
> In order to configure authentication and authorization, the geode cache must 
> be configured with either the {{security-client-auth-init}} or 
> {{security-peer-auth-init}} properties.
> The implementation of the {{AuthInitialize}} interface is supposed to obtain 
> credentials for a client or peer and, in practice, it should be able to 
> connect to an external data source or use some extra configuration as to know 
> where to retrieve the actual credentials from. The 
> {{AuthInitialize.getCredentials()}} meth

[jira] [Created] (GEODE-9494) Tomcat Session State Module - Security Properties

2021-08-10 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9494:
-

 Summary: Tomcat Session State Module - Security Properties
 Key: GEODE-9494
 URL: https://issues.apache.org/jira/browse/GEODE-9494
 Project: Geode
  Issue Type: Bug
  Components: http session
Reporter: Juan Ramos


In order to configure authentication and authorization, the geode cache must be 
configured with either the {{security-client-auth-init}} or 
{{security-peer-auth-init}} properties.
The implementation of the {{AuthInitialize}} interface is supposed to obtain 
credentials for a client or peer and, in practice, it should be able to connect 
to an external data source or use some extra configuration so that it knows where 
to retrieve the actual credentials from. The {{AuthInitialize.getCredentials()}} 
method receives all gemfire properties configured with the prefix {{security-}} 
and it's expected to use them in order to configure itself.
The {{AbstractCache}} class, however, prevents the user from configuring any 
property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
method, and this does not include those properties starting with {{security-}}:
{noformat}

  public void setProperty(String name, String value) {
// TODO Look at fake attributes
if (name.equals("className")) {
  return;
}

// Determine the validity of the input property
boolean validProperty = false;
// TODO: AbstractDistributionConfig is internal and _getAttNames is 
designed for testing.
for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
  if (name.equals(gemfireProperty)) {
validProperty = true;
break;
  }
}
...
}
{noformat}

The above, in turn, makes it almost impossible for users to correctly implement 
{{AuthInitialize}} without leveraging system properties or hardcoded paths for 
external configuration.
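
To make the use case concrete, below is a minimal sketch of the kind of {{AuthInitialize}} implementation described above; the {{security-keystore-path}} property name and the lookup helpers are hypothetical examples of the extra configuration one would like to pass through, not anything defined by Geode:
{noformat}
import java.util.Properties;

import org.apache.geode.LogWriter;
import org.apache.geode.distributed.DistributedMember;
import org.apache.geode.security.AuthInitialize;
import org.apache.geode.security.AuthenticationFailedException;

public class ExternalStoreAuthInitialize implements AuthInitialize {

  @Override
  public void init(LogWriter systemLogger, LogWriter securityLogger)
      throws AuthenticationFailedException {
    // No-op for this sketch.
  }

  @Override
  public Properties getCredentials(Properties securityProps, DistributedMember server,
      boolean isPeer) throws AuthenticationFailedException {
    // securityProps contains every configured property whose name starts with
    // "security-". A real implementation would use such a property (hypothetical
    // name below) to locate an external credential store instead of hardcoding it.
    String storePath = securityProps.getProperty("security-keystore-path");
    Properties credentials = new Properties();
    credentials.setProperty("security-username", lookupUser(storePath));
    credentials.setProperty("security-password", lookupPassword(storePath));
    return credentials;
  }

  @Override
  public void close() {
    // No-op for this sketch.
  }

  private String lookupUser(String storePath) {
    // Placeholder for a lookup against an external data source.
    return "myUser";
  }

  private String lookupPassword(String storePath) {
    // Placeholder for a lookup against an external data source.
    return "myPassword";
  }
}
{noformat}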



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9494) Tomcat Session State Module - Security Properties

2021-08-10 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-9494:
-

Assignee: Juan Ramos

> Tomcat Session State Module - Security Properties
> -
>
> Key: GEODE-9494
> URL: https://issues.apache.org/jira/browse/GEODE-9494
> Project: Geode
>  Issue Type: Bug
>  Components: http session
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>
> In order to configure authentication and authorization, the geode cache must 
> be configured with either the {{security-client-auth-init}} or 
> {{security-peer-auth-init}} properties.
> The implementation of the {{AuthInitialize}} interface is supposed to obtain 
> credentials for a client or peer and, in practice, it should be able to 
> connect to an external data source or use some extra configuration so that it 
> knows where to retrieve the actual credentials from. The 
> {{AuthInitialize.getCredentials()}} method receives all gemfire properties 
> configured with the prefix {{security-}} and it's expected to use them in 
> order to configure itself.
> The {{AbstractCache}} class, however, prevents the user from configuring any 
> property not returned by the {{AbstractDistributionConfig._getAttNames()}} 
> method, and this does not include those properties starting with 
> {{security-}}:
> {noformat}
>   public void setProperty(String name, String value) {
> // TODO Look at fake attributes
> if (name.equals("className")) {
>   return;
> }
> // Determine the validity of the input property
> boolean validProperty = false;
> // TODO: AbstractDistributionConfig is internal and _getAttNames is 
> designed for testing.
> for (String gemfireProperty : AbstractDistributionConfig._getAttNames()) {
>   if (name.equals(gemfireProperty)) {
> validProperty = true;
> break;
>   }
> }
> ...
> }
> {noformat}
> The above, in turn, makes it almost impossible for users to correctly implement 
> {{AuthInitialize}} without leveraging system properties or hardcoded paths 
> for external configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9402) Automatic Reconnect Failure: Address already in use

2021-06-25 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9402:
--
Attachment: cluster_logs_pks_121.zip
cluster_logs_gke_latest_54.zip

> Automatic Reconnect Failure: Address already in use
> ---
>
> Key: GEODE-9402
> URL: https://issues.apache.org/jira/browse/GEODE-9402
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Juan Ramos
>Priority: Major
> Attachments: cluster_logs_gke_latest_54.zip, cluster_logs_pks_121.zip
>
>
> There are 2 locators and 4 servers during the test; once they're all up and 
> running, the test drops the network connectivity between all members to 
> generate a full network partition and cause all members to shut down and go 
> into reconnect mode. Upon reaching the mentioned state, the test 
> automatically restores the network connectivity and expects all members to 
> automatically go up again and re-form the distributed system.
>  This works fine most of the time, and we see every member successfully 
> reconnecting to the distributed system:
> {noformat}
> [info 2021/06/23 15:58:12.981 GMT gemfire-cluster-locator-0  
> tid=0x87] Reconnect completed.
> [info 2021/06/23 15:58:14.726 GMT gemfire-cluster-locator-1  
> tid=0x86] Reconnect completed.
> [info 2021/06/23 15:58:46.702 GMT gemfire-cluster-server-0  
> tid=0x94] Reconnect completed.
> [info 2021/06/23 15:58:46.485 GMT gemfire-cluster-server-1  
> tid=0x96] Reconnect completed.
> [info 2021/06/23 15:58:46.273 GMT gemfire-cluster-server-2  
> tid=0x97] Reconnect completed.
> [info 2021/06/23 15:58:46.902 GMT gemfire-cluster-server-3  
> tid=0x95] Reconnect completed.
> {noformat}
> On some rare occasions, though, one of the servers fails during the reconnect 
> phase with the following exception:
> {noformat}
> [error 2021/06/09 18:48:52.872 GMT gemfire-cluster-server-1  
> tid=0x91] Cache initialization for GemFireCache[id = 575310555; isClosing = 
> false; isShutDownAll = false; created = Wed Jun 09 18:46:49 GMT 2021; server 
> = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed 
> because:
> org.apache.geode.GemFireIOException: While starting cache server CacheServer 
> on port=40404 client subscription config policy=none client subscription 
> config capacity=1 client subscription config overflow directory=.
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.startCacheServers(CacheCreation.java:800)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:599)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:339)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4207)
>   at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1497)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1449)
>   at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:191)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2668)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2426)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1277)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1183)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1807)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.net.BindException: Address already in use (Bind failed)
>   at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.base/java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:436)
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:395)
>   at 
> org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:70)
>   at 
> org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:529)
>   at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.(AcceptorImpl.java:573)
>   at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorBuilder.cr

[jira] [Created] (GEODE-9402) Automatic Reconnect Failure: Address already in use

2021-06-25 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9402:
-

 Summary: Automatic Reconnect Failure: Address already in use
 Key: GEODE-9402
 URL: https://issues.apache.org/jira/browse/GEODE-9402
 Project: Geode
  Issue Type: Bug
  Components: membership
Reporter: Juan Ramos


There are 2 locators and 4 servers during the test; once they're all up and 
running, the test drops the network connectivity between all members to generate 
a full network partition and cause all members to shut down and go into 
reconnect mode. Upon reaching the mentioned state, the test automatically 
restores the network connectivity and expects all members to automatically go 
up again and re-form the distributed system.
 This works fine most of the time, and we see every member successfully 
reconnecting to the distributed system:
{noformat}
[info 2021/06/23 15:58:12.981 GMT gemfire-cluster-locator-0  
tid=0x87] Reconnect completed.

[info 2021/06/23 15:58:14.726 GMT gemfire-cluster-locator-1  
tid=0x86] Reconnect completed.

[info 2021/06/23 15:58:46.702 GMT gemfire-cluster-server-0  
tid=0x94] Reconnect completed.

[info 2021/06/23 15:58:46.485 GMT gemfire-cluster-server-1  
tid=0x96] Reconnect completed.

[info 2021/06/23 15:58:46.273 GMT gemfire-cluster-server-2  
tid=0x97] Reconnect completed.

[info 2021/06/23 15:58:46.902 GMT gemfire-cluster-server-3  
tid=0x95] Reconnect completed.
{noformat}
On some rare occasions, though, one of the servers fails during the reconnect 
phase with the following exception:
{noformat}
[error 2021/06/09 18:48:52.872 GMT gemfire-cluster-server-1  
tid=0x91] Cache initialization for GemFireCache[id = 575310555; isClosing = 
false; isShutDownAll = false; created = Wed Jun 09 18:46:49 GMT 2021; server = 
false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed because:
org.apache.geode.GemFireIOException: While starting cache server CacheServer on 
port=40404 client subscription config policy=none client subscription config 
capacity=1 client subscription config overflow directory=.
at 
org.apache.geode.internal.cache.xmlcache.CacheCreation.startCacheServers(CacheCreation.java:800)
at 
org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:599)
at 
org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:339)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4207)
at 
org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1497)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1449)
at 
org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:191)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2668)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2426)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1277)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1183)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1807)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.BindException: Address already in use (Bind failed)
at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
at 
java.base/java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:436)
at java.base/java.net.ServerSocket.bind(ServerSocket.java:395)
at 
org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:70)
at 
org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:529)
at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.(AcceptorImpl.java:573)
at 
org.apache.geode.internal.cache.tier.sockets.AcceptorBuilder.create(AcceptorBuilder.java:291)
at 
org.apache.geode.internal.cache.CacheServerImpl.createAcceptor(CacheServerImpl.java:420)
at 
org.apache.geode.internal.cache.CacheServerImpl.start(CacheServerImpl.java:377)
at 
org.apache.geode.internal.cache.xmlcache.CacheCreation.startCacheServers(CacheCreation.java:796)
... 14 more
{noformat}
It seems that the server is trying to bind the port before the old instance has 
finished shutting down a

[jira] [Updated] (GEODE-9121) Regression Introduced Through GEODE-8905

2021-04-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9121:
--
Attachment: workspace.zip

> Regression Introduced Through GEODE-8905
> 
>
> Key: GEODE-9121
> URL: https://issues.apache.org/jira/browse/GEODE-9121
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.15.0
>Reporter: Juan Ramos
>Priority: Major
> Attachments: workspace.zip
>
>
> The new implementation of the {{JarDeploymentService}} seems to be deleting 
> resources when a member is gracefully shut down, which in turn generates a 
> race condition if there are functions being executed on the member during 
> that time.
>  In previous versions, a client application would simply retry the operation 
> and no exception or loss of availability would be seen; right now the 
> following exception is thrown on the client instead:
> {noformat}
> Exception in thread "main" org.apache.geode.cache.execute.FunctionException: 
> org.apache.geode.cache.client.ServerOperationException: remote server on 
> 192.168.0.73(3985:loner):49836:c9f57ea7: The function, , has not been 
> registered
> at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:237)
> at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:184)
> at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:388)
> at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:351)
> at test.TestClient.main(TestClient.java:20)
> Caused by: org.apache.geode.cache.client.ServerOperationException: remote 
> server on 192.168.0.73(3985:loner):49836:c9f57ea7: The function, , 
> has not been registered
> at 
> org.apache.geode.cache.client.internal.ExecuteRegionFunctionSingleHopOp$ExecuteRegionFunctionSingleHopOpImpl.processResponse(ExecuteRegionFunctionSingleHopOp.java:370)
> at 
> org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
> at 
> org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
> at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
> at 
> org.apache.geode.cache.client.internal.AbstractOpWithTimeout.attempt(AbstractOpWithTimeout.java:45)
> at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
> at 
> org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
> at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
> at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:335)
> at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:304)
> at 
> org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:840)
> at 
> org.apache.geode.cache.client.internal.SingleHopOperationCallable.call(SingleHopOperationCallable.java:49)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This seems to be a regression introduced through GEODE-8905. I've tested the 
> same scenario with version {{1.13.2}} (released), branch {{support/1.14}} and 
> commit [b80094ec5e|https://github.com/apache/geode/commit/b80094ec5e] with no 
> problems at all. When testing using commit 
> [6f764a7046|https://github.com/apache/geode/commit/6f764a7046], on the other 
> hand, the problem is easily reproducible.
> —
> How to reproduce the issue:
> 1. Download and extract {{workspace.zip}}.
>  2. Execute the {{reproduce.sh}} script and follow the instructions on screen.
> The version of {{Geode}} to use on the server side can be changed through the 
> {{GEMFIRE}} variable within the {{reproduce.sh}} script.
>  The version of {{Geode}} to use on the client side can be changed through the 
> {{GEODE_VERSION}} variable within the {{launch_client.sh}} script.
> The client application simply executes the {{TestFunction}} forever. When 
> running the scenario using a version of {{Geode}} that doesn't include commit 
> [6f764a7046|https://github.com/apache/geode/commit/6f764a7046], the client 
> simply retries under the hood and no exception is thro

[jira] [Updated] (GEODE-9121) Regression Introduced Through GEODE-8905

2021-04-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9121:
--
Description: 
The new implementation of the {{JarDeploymentService}} seems to be deleting 
resources when a member is gracefully shut down, which in turn generates a race 
condition if there are functions being executed on the member during that time.
 In previous versions, a client application would simply retry the operation 
and no exception or loss of availability would be seen; right now the following 
exception is thrown on the client instead:
{noformat}
Exception in thread "main" org.apache.geode.cache.execute.FunctionException: 
org.apache.geode.cache.client.ServerOperationException: remote server on 
192.168.0.73(3985:loner):49836:c9f57ea7: The function, , has not been 
registered
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:237)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:184)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:388)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:351)
at test.TestClient.main(TestClient.java:20)
Caused by: org.apache.geode.cache.client.ServerOperationException: remote 
server on 192.168.0.73(3985:loner):49836:c9f57ea7: The function, , has 
not been registered
at 
org.apache.geode.cache.client.internal.ExecuteRegionFunctionSingleHopOp$ExecuteRegionFunctionSingleHopOpImpl.processResponse(ExecuteRegionFunctionSingleHopOp.java:370)
at 
org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
at 
org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
at 
org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
at 
org.apache.geode.cache.client.internal.AbstractOpWithTimeout.attempt(AbstractOpWithTimeout.java:45)
at 
org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
at 
org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:335)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:304)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:840)
at 
org.apache.geode.cache.client.internal.SingleHopOperationCallable.call(SingleHopOperationCallable.java:49)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
This seems to be a regression introduced through GEODE-8905. I've tested the 
same scenario with version {{1.13.2}} (released), branch {{support/1.14}} and 
commit [b80094ec5e|https://github.com/apache/geode/commit/b80094ec5e] with no 
problems at all. When testing using commit 
[6f764a7046|https://github.com/apache/geode/commit/6f764a7046], on the other 
hand, the problem is easily reproducible.

—

How to reproduce the issue:

1. Download and extract {{workspace.zip}}.
 2. Execute the {{reproduce.sh}} script and follow the instructions on screen.

The version of {{Geode}} to use on the server side can be changed through the 
{{GEMFIRE}} variable within the {{reproduce.sh}} script.
 The version of {{Geode}} to use on the client side can be changed through the 
{{GEODE_VERSION}} variable within the {{launch_client.sh}} script.

The client application simply executes the {{TestFunction}} forever. When 
running the scenario using a version of {{Geode}} that doesn't include commit 
[6f764a7046|https://github.com/apache/geode/commit/6f764a7046], the client 
simply retries under the hood and no exception is thrown. When using the 
current {{develop}} branch, however, an exception is thrown and the client 
application terminates as soon as a server is restarted.

[~ukohlmeyer], [~pjohnson]: I'm tagging you both as you were working on this 
feature; feel free to assign the ticket to whomever you consider appropriate.

  was:
The new implementation of the {{JarDeploymentService}} seems to be deleting 
resources when a member is gracefully shut down, which in turn generates a race 
condition if there are functions being executed on the member during that time.
 In previous versi

[jira] [Created] (GEODE-9121) Regression Introduced Through GEODE-8905

2021-04-06 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9121:
-

 Summary: Regression Introduced Through GEODE-8905
 Key: GEODE-9121
 URL: https://issues.apache.org/jira/browse/GEODE-9121
 Project: Geode
  Issue Type: Bug
  Components: client/server
Affects Versions: 1.15.0
Reporter: Juan Ramos


The new implementation of the {{JarDeploymentService}} seems to be deleting 
resources when a member is gracefully shut down, which in turn generates a race 
condition if there are functions being executed on the member during that time.
 In previous versions, a client application would simply retry the operation 
and no exception or loss of availability would be seen; right now the following 
exception is thrown on the client instead:
{noformat}
Exception in thread "main" org.apache.geode.cache.execute.FunctionException: 
org.apache.geode.cache.client.ServerOperationException: remote server on 
192.168.0.73(3985:loner):49836:c9f57ea7: The function, , has not been 
registered
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:237)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:184)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:388)
at 
org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:351)
at test.TestClient.main(TestClient.java:20)
Caused by: org.apache.geode.cache.client.ServerOperationException: remote 
server on 192.168.0.73(3985:loner):49836:c9f57ea7: The function, , has 
not been registered
at 
org.apache.geode.cache.client.internal.ExecuteRegionFunctionSingleHopOp$ExecuteRegionFunctionSingleHopOpImpl.processResponse(ExecuteRegionFunctionSingleHopOp.java:370)
at 
org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
at 
org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
at 
org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
at 
org.apache.geode.cache.client.internal.AbstractOpWithTimeout.attempt(AbstractOpWithTimeout.java:45)
at 
org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
at 
org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:335)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:304)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:840)
at 
org.apache.geode.cache.client.internal.SingleHopOperationCallable.call(SingleHopOperationCallable.java:49)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
This seems to be a regression introduced through GEODE-8905. I've tested the 
same scenario with version {{1.13.2}} (released), branch {{support/1.14}} and 
commit [b80094ec5e|https://github.com/apache/geode/commit/b80094ec5e] with no 
problems at all. When testing using commit 
[6f764a7046|https://github.com/apache/geode/commit/6f764a7046], on the other 
hand, the problem is easily reproducible.

—

How to reproduce the issue:

1. Download and extract {{workspace.zip}}.
 2. Execute the {{reproduce.sh}} script and follow the instructions on screen.

The version of {{Geode}} to use on the server side can be changed through the 
{{GEMFIRE}} variable within the {{reproduce.sh}} script.
 The version of {{Geode}} to use on the client side can be changed through the 
{{GEODE_VERSION}} variable within the {{launch_client.sh}} script.

The client application simply executes the {{TestFunction}} forever. When 
running the scenario using a version of {{Geode}} that doesn't include commit 
[b80094ec5e|https://github.com/apache/geode/commit/b80094ec5e], the client 
simply retries under the hood and no exception is thrown. When using the 
current {{develop}} branch, however, an exception is thrown and the client 
application terminates as soon as a server is restarted.

[~ukohlmeyer], [~pjohnson]: I'm tagging you both as you were working on this 
feature; feel free to assign the ticket to whomever you consider appropriate.
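
For reference, a minimal sketch of the kind of client loop described above; it is not the code in {{workspace.zip}}, and the locator address, region name and retry interval are assumptions. The point is that retrying on {{FunctionException}} restores the availability older versions provided implicitly.

{code:java}
import java.util.List;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.execute.FunctionException;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.ResultCollector;

public class RetryingTestClient {
  public static void main(String[] args) throws Exception {
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334) // assumed locator address
        .create();
    Region<String, String> region = cache
        .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("TestRegion"); // assumed region name

    // Execute TestFunction forever; retry instead of failing when a server restarts.
    while (true) {
      try {
        ResultCollector<?, ?> collector = FunctionService.onRegion(region).execute("TestFunction");
        List<?> results = (List<?>) collector.getResult();
        System.out.println("Function returned " + results.size() + " result(s)");
      } catch (FunctionException e) {
        System.out.println("Server-side failure, retrying: " + e.getMessage());
      }
      Thread.sleep(1000);
    }
  }
}
{code}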





[jira] [Updated] (GEODE-9000) NPE During Reconnect After Network Split

2021-03-04 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9000:
--
Labels:   (was: blocks-1.14.0​)

> NPE During Reconnect After Network Split
> 
>
> Key: GEODE-9000
> URL: https://issues.apache.org/jira/browse/GEODE-9000
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.14.0
>Reporter: Juan Ramos
>Priority: Major
>
> During a full network split when all members get shut down by a partition, 
> one of the servers continually fails to reconnect due to a 
> {{NullPointerException}}. When using persistent regions, this also prevents 
> the remaining members from starting up correctly, as they might be waiting 
> for the stuck member to recover the latest data.
> The issue itself was introduced by the fix for GEODE-8901: the new 
> implementation of {{GMSJoinLeave.processNetworkPartitionMessage}} doesn't 
> have a {{currentView}} installed during the reconnect phase ({{getView() == 
> null}}), and the following is shown in the logs:
> {noformat}
> [fatal 2021/03/04 03:32:02.744 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected exception while booting membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1239)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1951)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> [error 2021/03/04 03:32:02.747 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected problem starting up membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
>

[jira] [Updated] (GEODE-9000) NPE During Reconnect After Network Split

2021-03-04 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9000:
--
Priority: Blocker  (was: Major)

> NPE During Reconnect After Network Split
> 
>
> Key: GEODE-9000
> URL: https://issues.apache.org/jira/browse/GEODE-9000
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.14.0
>Reporter: Juan Ramos
>Priority: Blocker
>
> During a full network split when all members get shut down by a partition, 
> one of the servers continually fails to reconnect due to a 
> {{NullPointerException}}. When using persistent regions, this also prevents 
> the remaining members from starting up correctly, as they might be waiting 
> for the stuck member to recover the latest data.
> The issue itself was introduced by the fix for GEODE-8901: the new 
> implementation of {{GMSJoinLeave.processNetworkPartitionMessage}} doesn't 
> have a {{currentView}} installed during the reconnect phase ({{getView() == 
> null}}), and the following is shown in the logs:
> {noformat}
> [fatal 2021/03/04 03:32:02.744 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected exception while booting membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1239)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1951)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> [error 2021/03/04 03:32:02.747 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected problem starting up membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 

[jira] [Updated] (GEODE-9000) NPE During Reconnect After Network Split

2021-03-04 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9000:
--
Labels: blocks-1.14.0​  (was: )

> NPE During Reconnect After Network Split
> 
>
> Key: GEODE-9000
> URL: https://issues.apache.org/jira/browse/GEODE-9000
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.14.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: blocks-1.14.0​
>
> During a full network split when all members get shut down by a partition, 
> one of the servers continually fails to reconnect due to a 
> {{NullPointerException}}. When using persistent regions, this also prevents 
> the remaining members from starting up correctly, as they might be waiting 
> for the stuck member to recover the latest data.
> The issue itself was introduced by the fix for GEODE-8901: the new 
> implementation of {{GMSJoinLeave.processNetworkPartitionMessage}} doesn't 
> have a {{currentView}} installed during the reconnect phase ({{getView() == 
> null}}), and the following is shown in the logs:
> {noformat}
> [fatal 2021/03/04 03:32:02.744 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected exception while booting membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1239)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1951)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> [error 2021/03/04 03:32:02.747 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected problem starting up membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(Di

[jira] [Updated] (GEODE-9000) NPE During Reconnect After Network Split

2021-03-04 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-9000:
--
Priority: Major  (was: Blocker)

> NPE During Reconnect After Network Split
> 
>
> Key: GEODE-9000
> URL: https://issues.apache.org/jira/browse/GEODE-9000
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.14.0
>Reporter: Juan Ramos
>Priority: Major
>
> During a full network split when all members get shut down by a partition, 
> one of the servers continually fails to reconnect due to a 
> {{NullPointerException}}. When using persistent regions, this also prevents 
> the remaining members from starting up correctly, as they might be waiting 
> for the stuck member to recover the latest data.
> The issue itself was introduced by the fix for GEODE-8901: the new 
> implementation of {{GMSJoinLeave.processNetworkPartitionMessage}} doesn't 
> have a {{currentView}} installed during the reconnect phase ({{getView() == 
> null}}), and the following is shown in the logs:
> {noformat}
> [fatal 2021/03/04 03:32:02.744 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected exception while booting membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1239)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1951)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> [error 2021/03/04 03:32:02.747 GMT gemfire-cluster-server-0  
> tid=0x8a] Unexpected problem starting up membership services
> java.lang.NullPointerException
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
>   at 
>

[jira] [Created] (GEODE-9000) NPE During Reconnect After Network Split

2021-03-04 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-9000:
-

 Summary: NPE During Reconnect After Network Split
 Key: GEODE-9000
 URL: https://issues.apache.org/jira/browse/GEODE-9000
 Project: Geode
  Issue Type: Bug
  Components: membership
Affects Versions: 1.14.0
Reporter: Juan Ramos


During a full network split when all members get shut down by a partition, one 
of the servers continually fails to reconnect due to a 
{{NullPointerException}}. When using persistent regions, this also prevents the 
remaining members from starting up correctly, as they might be waiting for the 
stuck member to recover the latest data.
The issue itself was introduced by the fix for GEODE-8901: the new 
implementation of {{GMSJoinLeave.processNetworkPartitionMessage}} doesn't have 
a {{currentView}} installed during the reconnect phase ({{getView() == null}}), 
and the following is shown in the logs:

{noformat}
[fatal 2021/03/04 03:32:02.744 GMT gemfire-cluster-server-0  
tid=0x8a] Unexpected exception while booting membership services
java.lang.NullPointerException
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
at 
org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
at 
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
at 
org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1239)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1951)
at java.base/java.lang.Thread.run(Thread.java:834)

[error 2021/03/04 03:32:02.747 GMT gemfire-cluster-server-0  
tid=0x8a] Unexpected problem starting up membership services
java.lang.NullPointerException
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processNetworkPartitionMessage(GMSJoinLeave.java:1459)
at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1343)
at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.started(JGroupsMessenger.java:428)
at 
org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:210)
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1782)
at 
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
at 
org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.create(Cluste
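
The following is a hypothetical, self-contained illustration of the guard the description implies; it is not the actual GEODE-9000 fix or Geode source, and the class and field names are invented. It only shows why checking for a missing view before processing a network-partition message avoids the {{NullPointerException}} above.

{code:java}
public class NullViewGuardSketch {
  static final class MembershipView {
    final int viewId;
    MembershipView(int viewId) { this.viewId = viewId; }
  }

  private volatile MembershipView currentView; // stays null while reconnecting

  void processNetworkPartitionMessage(String sender) {
    MembershipView view = currentView;
    if (view == null) {
      // During the reconnect phase no view is installed yet (getView() == null in
      // GMSJoinLeave), so dereferencing it is what produces the NPE in the logs above.
      System.out.println("Ignoring network partition message from " + sender
          + " received before a view was installed");
      return;
    }
    System.out.println("Partition detected in view " + view.viewId + ", forcing disconnect");
  }

  public static void main(String[] args) {
    new NullViewGuardSketch().processNetworkPartitionMessage("locator-0"); // ignored, no NPE
  }
}
{code}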

[jira] [Assigned] (GEODE-7685) Add backward compatibility tests for PartitionedRegion clear

2021-02-03 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-7685:
-

Assignee: (was: Juan Ramos)

> Add backward compatibility tests for PartitionedRegion clear
> 
>
> Key: GEODE-7685
> URL: https://issues.apache.org/jira/browse/GEODE-7685
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Reporter: Nabarun Nag
>Priority: Major
>  Labels: GeodeCommons, GeodeOperationAPI, pull-request-available
>
> Partitioned region clear must gracefully reject the operation request if 
> older versions of Apache Geode are present in the cluster.





[jira] [Resolved] (GEODE-8191) MemberMXBeanDistributedTest.testBucketCount fails intermittently

2020-08-07 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8191.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> MemberMXBeanDistributedTest.testBucketCount fails intermittently
> 
>
> Key: GEODE-8191
> URL: https://issues.apache.org/jira/browse/GEODE-8191
> Project: Geode
>  Issue Type: Bug
>  Components: jmx, tests
>Reporter: Kirk Lund
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: flaky, pull-request-available
> Fix For: 1.14.0
>
>
> This appears to be a flaky test related to GEODE-7963, which was resolved by 
> Mario Ivanac, so I've assigned the ticket to him.
> {noformat}
> org.apache.geode.management.MemberMXBeanDistributedTest > testBucketCount 
> FAILED
> org.awaitility.core.ConditionTimeoutException: Assertion condition 
> defined as a lambda expression in 
> org.apache.geode.management.MemberMXBeanDistributedTest Expected bucket count 
> is 4000, and actual count is 3750 expected:<3750> but was:<4000> within 5 
> minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
> at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
> at 
> org.apache.geode.management.MemberMXBeanDistributedTest.testBucketCount(MemberMXBeanDistributedTest.java:102)
> Caused by:
> java.lang.AssertionError: Expected bucket count is 4000, and actual 
> count is 3750 expected:<3750> but was:<4000>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at 
> org.apache.geode.management.MemberMXBeanDistributedTest.lambda$testBucketCount$1(MemberMXBeanDistributedTest.java:107)
> {noformat}





[jira] [Commented] (GEODE-8374) ViewAckTimeout Configuration

2020-08-05 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171332#comment-17171332
 ] 

Juan Ramos commented on GEODE-8374:
---

Hello [~amitchrs8],

I'm not in control of the Geode project schedule, nor of the priority of the 
tickets and how they are assigned. I'd suggest you send an email to the geode 
users list stating your use case and the actual business impact so the 
developers can weigh the issue and prioritise it accordingly. 

> ViewAckTimeout Configuration
> 
>
> Key: GEODE-8374
> URL: https://issues.apache.org/jira/browse/GEODE-8374
> Project: Geode
>  Issue Type: Bug
>  Components: docs, membership
>Reporter: Juan Ramos
>Priority: Minor
>
> We have the following within our docs (point 4 
> [here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
> {noformat}
> In the first phase, the membership coordinator sends out a view preparation 
> message to all members and waits 12 seconds for a view preparation ack return 
> message from each member. If the coordinator does not receive an ack message 
> from a member within 12 seconds, the coordinator attempts to connect to the 
> member’s failure-detection socket. If the coordinator cannot connect to the 
> member’s failure-detection socket, the coordinator declares the member dead 
> and starts the membership view protocol again from the beginning.
> {noformat}
> These 12 seconds refer to {{viewAckTimeout}} property within the 
> {{GMSJoinLeave}} class, and it’s calculated as follows:
> {code:java|title=GMSJoinLeave.java|borderStyle=solid}
> long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
> if (ackCollectionTimeout < 1500) {
>   ackCollectionTimeout = 1500;
> } else if (ackCollectionTimeout > 12437) {
>   ackCollectionTimeout = 12437;
> }
> ackCollectionTimeout = Long
> .getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
> ackCollectionTimeout)
> .longValue();
> this.viewAckTimeout = ackCollectionTimeout;
> {code}
> So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
> seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, 
> unless the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system 
> property (for which I haven't found any tests or anything related, meaning 
> that _*it shouldn't be used at all as we don't know what the negative 
> implications - if any - might be*_).
>  We should either remove the internal check and allow the user to fully 
> configure this property ({{member-timeout * 2}} by default) or add better 
> documentation about this internal timeout and why it shouldn't be changed 
> outside of the fixed interval.





[jira] [Created] (GEODE-8376) PartitionRegion.clear should fail when older members host the region to be cleared

2020-07-22 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8376:
-

 Summary: PartitionRegion.clear should fail when older members host 
the region to be cleared
 Key: GEODE-8376
 URL: https://issues.apache.org/jira/browse/GEODE-8376
 Project: Geode
  Issue Type: Sub-task
  Components: regions
Reporter: Juan Ramos


The {{PartitionClear.clear()}} operation should be smart enough to fail fast 
whenever there are old members (that is, members running versions older than 
the version in which the {{clear}} operation is released) within the 
distributed system hosting the actual region that has to be cleared.
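
A hypothetical sketch of the fail-fast check described above; it is not the Geode implementation, and the member representation, version ordinal and exception type are invented for illustration.

{code:java}
import java.util.List;

public class ClearVersionCheckSketch {
  // Minimal stand-in for a cluster member hosting the region.
  static final class Member {
    final String name;
    final int versionOrdinal;
    Member(String name, int versionOrdinal) {
      this.name = name;
      this.versionOrdinal = versionOrdinal;
    }
  }

  // Reject the clear when any hosting member predates the release that ships clear().
  static void checkMembersSupportClear(List<Member> hostingMembers, int clearSupportOrdinal) {
    for (Member member : hostingMembers) {
      if (member.versionOrdinal < clearSupportOrdinal) {
        throw new UnsupportedOperationException("Cannot clear the partitioned region: member "
            + member.name + " runs a version older than the one introducing clear()");
      }
    }
  }

  public static void main(String[] args) {
    int clearSupportOrdinal = 120; // hypothetical ordinal of the release that ships clear()
    List<Member> members = List.of(new Member("server-1", 125), new Member("server-2", 110));
    checkMembersSupportClear(members, clearSupportOrdinal); // throws for server-2
  }
}
{code}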





[jira] [Updated] (GEODE-8374) ViewAckTimeout Configuration

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8374:
--
Description: 
We have the following within our docs (point 4 
[here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
{noformat}
In the first phase, the membership coordinator sends out a view preparation 
message to all members and waits 12 seconds for a view preparation ack return 
message from each member. If the coordinator does not receive an ack message 
from a member within 12 seconds, the coordinator attempts to connect to the 
member’s failure-detection socket. If the coordinator cannot connect to the 
member’s failure-detection socket, the coordinator declares the member dead and 
starts the membership view protocol again from the beginning.
{noformat}
These 12 seconds refer to {{viewAckTimeout}} property within the 
{{GMSJoinLeave}} class, and it’s calculated as follows:
{code:java|title=GMSJoinLeave.java|borderStyle=solid}
long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
if (ackCollectionTimeout < 1500) {
  ackCollectionTimeout = 1500;
} else if (ackCollectionTimeout > 12437) {
  ackCollectionTimeout = 12437;
}
ackCollectionTimeout = Long
.getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
ackCollectionTimeout)
.longValue();
this.viewAckTimeout = ackCollectionTimeout;
{code}
So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, unless 
the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system property (for 
which I haven't found any tests or anything related, meaning that _*it 
shouldn't be used at all as we don't know what the negative implications - if 
any - might be*_).
 We should either remove the internal check and allow the user to fully 
configure this property ({{member-timeout * 2}} by default) or add better 
documentation about this internal timeout and why it shouldn't be changed 
outside of the fixed interval.

  was:
We have the following within our docs (point 4 
[here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):

{noformat}
In the first phase, the membership coordinator sends out a view preparation 
message to all members and waits 12 seconds for a view preparation ack return 
message from each member. If the coordinator does not receive an ack message 
from a member within 12 seconds, the coordinator attempts to connect to the 
member’s failure-detection socket. If the coordinator cannot connect to the 
member’s failure-detection socket, the coordinator declares the member dead and 
starts the membership view protocol again from the beginning.
{noformat}

These 12 seconds refer to {{viewAckTimeout}} property within the 
{{GMSJoinLeave}} class, and it’s calculated as follows:
{code:title=GMSJoinLeave.java|borderStyle=solid}
long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
if (ackCollectionTimeout < 1500) {
  ackCollectionTimeout = 1500;
} else if (ackCollectionTimeout > 12437) {
  ackCollectionTimeout = 12437;
}
ackCollectionTimeout = Long
.getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
ackCollectionTimeout)
.longValue();
this.viewAckTimeout = ackCollectionTimeout;
{code}

So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, unless 
the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system property (for 
which I haven't found any tests or anything related, meaning that _*it 
shouldn't be used at all as we don't know what the negative implications - if 
any - are*_).
We should either remove the internal check and allow the user to fully 
configure this property ({{member-timeout * 2}} by default) or add better 
documentation about this internal timeout and why it shouldn't be changed 
outside of the fixed interval.


> ViewAckTimeout Configuration
> 
>
> Key: GEODE-8374
> URL: https://issues.apache.org/jira/browse/GEODE-8374
> Project: Geode
>  Issue Type: Bug
>  Components: docs, membership
>Reporter: Juan Ramos
>Priority: Minor
>
> We have the following within our docs (point 4 
> [here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
> {noformat}
> In the first phase, the membership coordinator sends out a view preparation 
> message to all members and waits 12 seconds for a view preparation ack return 
> message from each member. If the coordinator does not receive an ack message 
> from a membe

[jira] [Updated] (GEODE-8374) ViewAckTimeout Configuration

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8374:
--
Priority: Minor  (was: Major)

> ViewAckTimeout Configuration
> 
>
> Key: GEODE-8374
> URL: https://issues.apache.org/jira/browse/GEODE-8374
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Juan Ramos
>Priority: Minor
>
> We have the following within our docs (point 4 
> [here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
> {noformat}
> In the first phase, the membership coordinator sends out a view preparation 
> message to all members and waits 12 seconds for a view preparation ack return 
> message from each member. If the coordinator does not receive an ack message 
> from a member within 12 seconds, the coordinator attempts to connect to the 
> member’s failure-detection socket. If the coordinator cannot connect to the 
> member’s failure-detection socket, the coordinator declares the member dead 
> and starts the membership view protocol again from the beginning.
> {noformat}
> These 12 seconds refer to {{viewAckTimeout}} property within the 
> {{GMSJoinLeave}} class, and it’s calculated as follows:
> {code:title=GMSJoinLeave.java|borderStyle=solid}
> long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
> if (ackCollectionTimeout < 1500) {
>   ackCollectionTimeout = 1500;
> } else if (ackCollectionTimeout > 12437) {
>   ackCollectionTimeout = 12437;
> }
> ackCollectionTimeout = Long
> .getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
> ackCollectionTimeout)
> .longValue();
> this.viewAckTimeout = ackCollectionTimeout;
> {code}
> So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
> seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, 
> unless the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system 
> property (for which I haven't found any tests or anything related, meaning 
> that _*it shouldn't be used at all as we don't know what the negative 
> implications - if any - are*_).
> We should either remove the internal check and allow the user to fully 
> configure this property ({{member-timeout * 2}} by default) or add better 
> documentation about this internal timeout and why it shouldn't be changed 
> outside of the fixed interval.





[jira] [Updated] (GEODE-8374) ViewAckTimeout Configuration

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8374:
--
Component/s: docs

> ViewAckTimeout Configuration
> 
>
> Key: GEODE-8374
> URL: https://issues.apache.org/jira/browse/GEODE-8374
> Project: Geode
>  Issue Type: Bug
>  Components: docs, membership
>Reporter: Juan Ramos
>Priority: Minor
>
> We have the following within our docs (point 4 
> [here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
> {noformat}
> In the first phase, the membership coordinator sends out a view preparation 
> message to all members and waits 12 seconds for a view preparation ack return 
> message from each member. If the coordinator does not receive an ack message 
> from a member within 12 seconds, the coordinator attempts to connect to the 
> member’s failure-detection socket. If the coordinator cannot connect to the 
> member’s failure-detection socket, the coordinator declares the member dead 
> and starts the membership view protocol again from the beginning.
> {noformat}
> These 12 seconds refer to {{viewAckTimeout}} property within the 
> {{GMSJoinLeave}} class, and it’s calculated as follows:
> {code:title=GMSJoinLeave.java|borderStyle=solid}
> long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
> if (ackCollectionTimeout < 1500) {
>   ackCollectionTimeout = 1500;
> } else if (ackCollectionTimeout > 12437) {
>   ackCollectionTimeout = 12437;
> }
> ackCollectionTimeout = Long
> .getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
> ackCollectionTimeout)
> .longValue();
> this.viewAckTimeout = ackCollectionTimeout;
> {code}
> So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
> seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, 
> unless the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system 
> property (for which I haven't found any tests or anything related, meaning 
> that _*it shouldn't be used at all as we don't know what the negative 
> implications - if any - are*_).
> We should either remove the internal check and allow the user to fully 
> configure this property ({{member-timeout * 2}} by default) or add better 
> documentation about this internal timeout and why it shouldn't be changed 
> outside of the fixed interval.





[jira] [Created] (GEODE-8374) ViewAckTimeout Configuration

2020-07-22 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8374:
-

 Summary: ViewAckTimeout Configuration
 Key: GEODE-8374
 URL: https://issues.apache.org/jira/browse/GEODE-8374
 Project: Geode
  Issue Type: Bug
  Components: membership
Reporter: Juan Ramos


We have the following within our docs (point 4 
[here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):

{noformat}
In the first phase, the membership coordinator sends out a view preparation 
message to all members and waits 12 seconds for a view preparation ack return 
message from each member. If the coordinator does not receive an ack message 
from a member within 12 seconds, the coordinator attempts to connect to the 
member’s failure-detection socket. If the coordinator cannot connect to the 
member’s failure-detection socket, the coordinator declares the member dead and 
starts the membership view protocol again from the beginning.
{noformat}

These 12 seconds refer to {{viewAckTimeout}} property within the 
{{GMSJoinLeave}} class, and it’s calculated as follows:
{code:title=GMSJoinLeave.java|borderStyle=solid}
long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 1;
if (ackCollectionTimeout < 1500) {
  ackCollectionTimeout = 1500;
} else if (ackCollectionTimeout > 12437) {
  ackCollectionTimeout = 12437;
}
ackCollectionTimeout = Long
.getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
ackCollectionTimeout)
.longValue();
this.viewAckTimeout = ackCollectionTimeout;
{code}

So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
seconds, but it can't be lower than {{1.5}} nor higher than {{12}}, unless 
the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system property (for 
which I haven't found any tests or anything related, meaning that _*it 
shouldn't be used at all as we don't know what the negative implications - if 
any - are*_).
We should either remove the internal check and allow the user to fully 
configure this property ({{member-timeout * 2}} by default) or add better 
documentation about this internal timeout and why it shouldn't be changed 
outside of the fixed interval.
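
Taking the description at face value (roughly twice the {{member-timeout}}, clamped to the 1.5-12 second window): with Geode's default {{member-timeout}} of 5000 ms the effective {{viewAckTimeout}} is about 10 seconds; a {{member-timeout}} of 500 ms would compute 1 second and be raised to 1.5 seconds; a {{member-timeout}} of 10000 ms would compute 20 seconds and be capped at 12 seconds. Assuming {{GEMFIRE_PREFIX}} resolves to {{gemfire.}}, the undocumented override would be supplied as a JVM system property in milliseconds, for example:
{noformat}
-Dgemfire.VIEW_ACK_TIMEOUT=15000
{noformat}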





[jira] [Resolved] (GEODE-7670) Partitioned Region clear operations can occur during concurrent data operations

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-7670.
---
Resolution: Fixed

> Partitioned Region clear operations can occur during concurrent data 
> operations
> ---
>
> Key: GEODE-7670
> URL: https://issues.apache.org/jira/browse/GEODE-7670
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Reporter: Nabarun Nag
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Clear operations are successful when concurrent read/write operations occur. 
> Ensure there is test coverage for this use case and modify the code as needed 
> to enable this.
> Acceptance:
>  * Passing DUnit tests where clear operations are successful on a partitioned 
> region with:
>  * concurrent puts (writes) and clear op
>  * concurrent gets (reads) and clear op
>  * Test coverage for when a member departs in this scenario
>  * Test coverage for when a member restarts in this scenario
>  * Unit tests with complete code coverage for the newly written code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-7912) cacheWriter should be triggered when PR.clear

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-7912:
--
Fix Version/s: (was: 1.13.0)

> cacheWriter should be triggered when PR.clear
> -
>
> Key: GEODE-7912
> URL: https://issues.apache.org/jira/browse/GEODE-7912
> Project: Geode
>  Issue Type: Improvement
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeCommons
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If the server has a cacheWriter configured, PR.clear should trigger it the 
> same way PR.destroyRegion does.
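
For reference, the callback involved here is {{CacheWriter.beforeRegionClear}}. A 
minimal sketch of a writer that, per this ticket, would be expected to fire on 
{{PR.clear}} just as it does on {{PR.destroyRegion}} (illustrative only, not code 
from the fix):
{code:java}
import org.apache.geode.cache.RegionEvent;
import org.apache.geode.cache.util.CacheWriterAdapter;

// Illustrative sketch: with this writer attached to a partitioned region,
// the expectation from this ticket is that clear() invokes beforeRegionClear
// the same way destroyRegion() invokes beforeRegionDestroy.
public class AuditingWriter<K, V> extends CacheWriterAdapter<K, V> {
  @Override
  public void beforeRegionClear(RegionEvent<K, V> event) {
    System.out.println("About to clear region " + event.getRegion().getFullPath());
  }

  @Override
  public void beforeRegionDestroy(RegionEvent<K, V> event) {
    System.out.println("About to destroy region " + event.getRegion().getFullPath());
  }
}
{code}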



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-7983) Clear region writer callbacks should not be invoked for bucket regions

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-7983:
--
Fix Version/s: (was: 1.13.0)

> Clear region writer callbacks should not be invoked for bucket regions
> --
>
> Key: GEODE-7983
> URL: https://issues.apache.org/jira/browse/GEODE-7983
> Project: Geode
>  Issue Type: Improvement
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeCommons
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Region destroy does not trigger the cacheWriter for bucket regions; we should 
> keep the same behavior for clear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-22 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8334:
--
Fix Version/s: (was: 1.14.0)

> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8368) Upgrade ClassGraph dependency from 4.8.52 to 4.8.87

2020-07-17 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160033#comment-17160033
 ] 

Juan Ramos commented on GEODE-8368:
---

Hello [~klund], just FYI: we had some performance issues with versions of 
{{classgraph}} greater than {{4.8.52}}, which is why the version was 
downgraded. More details can be found in GEODE-8150.

> Upgrade ClassGraph dependency from 4.8.52 to 4.8.87
> ---
>
> Key: GEODE-8368
> URL: https://issues.apache.org/jira/browse/GEODE-8368
> Project: Geode
>  Issue Type: Wish
>  Components: build, management
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade ClassGraph dependency from 4.8.52 to 4.8.87.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8361) Incorrect Bucket Count Warning Message Shown

2020-07-15 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8361:
--
Priority: Trivial  (was: Major)

> Incorrect Bucket Count Warning Message Shown
> 
>
> Key: GEODE-8361
> URL: https://issues.apache.org/jira/browse/GEODE-8361
> Project: Geode
>  Issue Type: Sub-task
>  Components: logging
>Reporter: Juan Ramos
>Priority: Trivial
>
> While analysing some failures related to GEODE-7670, I've noticed that 
> sometimes we report an incorrect bucket count within the warning message 
> logged when the clear didn't complete successfully, which could confuse our 
> users.
> For this test the partitioned region always has 13 buckets, so, as a user, I 
> would never expect to see a bucket count higher than 13 in my logs (no matter 
> how many redundant copies I have).
> ---
> Below are some examples:
> {noformat}
> [vm1] [warn 2020/07/15 11:56:17.739 GMT  Connection(5)-172.17.0.5> tid=0x5f] Unable to clear all the buckets from 
> the partitioned region PartitionedRegion, either data (buckets) moved or 
> member departed. expected to clear number of buckets: 13 actual cleared: 26
> [vm1] [warn 2020/07/15 11:57:48.403 GMT  Connection(6)-172.17.0.9> tid=0x10f] Unable to clear all the buckets from 
> the partitioned region PartitionedRegion, either data (buckets) moved or 
> member departed. expected to clear number of buckets: 13 actual cleared: 14
> [vm0] [warn 2020/07/15 12:07:36.227 GMT  Connection(32)-172.17.0.25> tid=0x1fe] Unable to clear all the buckets 
> from the partitioned region PartitionedRegion, either data (buckets) moved or 
> member departed. expected to clear number of buckets: 13 actual cleared: 19
> [vm0] [warn 2020/07/15 12:08:56.277 GMT  Connection(37)-172.17.0.24> tid=0x2a2] Unable to clear all the buckets 
> from the partitioned region PartitionedRegion, either data (buckets) moved or 
> member departed. expected to clear number of buckets: 13 actual cleared: 16
> {noformat}
> The full set of artefacts and results:
> {noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4848/test-results/repeatTest/1594816968/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4848/test-artifacts/1594816968/stressnewtestfiles-geode-pr-4848.tgz
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8361) Incorrect Bucket Count Warning Message Shown

2020-07-15 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8361:
-

 Summary: Incorrect Bucket Count Warning Message Shown
 Key: GEODE-8361
 URL: https://issues.apache.org/jira/browse/GEODE-8361
 Project: Geode
  Issue Type: Sub-task
  Components: logging
Reporter: Juan Ramos


While analysing some failures related to GEODE-7670, I've noticed that 
sometimes we report an incorrect bucket count within the warning message logged 
when the clear didn't complete successfully, which could confuse our users.
For this test the partitioned region always has 13 buckets, so, as a user, I 
would never expect to see a bucket count higher than 13 in my logs (no matter 
how many redundant copies I have).

---

Below are some examples:
{noformat}
[vm1] [warn 2020/07/15 11:56:17.739 GMT  tid=0x5f] Unable to clear all the buckets from the 
partitioned region PartitionedRegion, either data (buckets) moved or member 
departed. expected to clear number of buckets: 13 actual cleared: 26

[vm1] [warn 2020/07/15 11:57:48.403 GMT  tid=0x10f] Unable to clear all the buckets from 
the partitioned region PartitionedRegion, either data (buckets) moved or member 
departed. expected to clear number of buckets: 13 actual cleared: 14

[vm0] [warn 2020/07/15 12:07:36.227 GMT  tid=0x1fe] Unable to clear all the buckets from 
the partitioned region PartitionedRegion, either data (buckets) moved or member 
departed. expected to clear number of buckets: 13 actual cleared: 19

[vm0] [warn 2020/07/15 12:08:56.277 GMT  tid=0x2a2] Unable to clear all the buckets from 
the partitioned region PartitionedRegion, either data (buckets) moved or member 
departed. expected to clear number of buckets: 13 actual cleared: 16
{noformat}

The full set of artefacts and results:
{noformat}
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4848/test-results/repeatTest/1594816968/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4848/test-artifacts/1594816968/stressnewtestfiles-geode-pr-4848.tgz
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-07-03 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8029:
--
Fix Version/s: 1.13.0
   1.12.1

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.12.1, 1.13.0, 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-07-02 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8029.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-07-02 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150180#comment-17150180
 ] 

Juan Ramos commented on GEODE-8029:
---

{quote}
Juan, I deleted the content inside the diskstore and restarted the servers.
{quote}

Thanks for the update [~jagan23527001]; we have a potential fix for the issue 
that will be merged into {{develop}} shortly.
Once that's done, I'll back port the changes to the {{support/1.13}} branch, 
which is what we'll use to ship the {{1.13}} release in the coming weeks.


> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)

[jira] [Created] (GEODE-8325) Remove Unused DRF Files More Frequently

2020-07-02 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8325:
-

 Summary: Remove Unused DRF Files More Frequently
 Key: GEODE-8325
 URL: https://issues.apache.org/jira/browse/GEODE-8325
 Project: Geode
  Issue Type: Improvement
  Components: persistence
Reporter: Juan Ramos


Users have reported that, even with {{auto-compaction}} enabled, there are 
certain scenarios in which too many orphaned {{drf}} files are left undeleted 
on the {{disk-stores}}.
This causes slowness and memory spikes during startup (we iterate over all 
{{drf}} files and load records into memory as the first step when recovering 
disk regions) and, prior to the fix for GEODE-8029, could also cause a member 
to fail and shut down.
We should implement a longer-term fix and make sure unused {{drf}} files are 
removed more often (maybe have a dedicated {{OplogCompactor}} that frequently 
removes the {{drf}} files even when the {{threshold}} is not hit?).

It would also be good to apply Darrel's suggestions:
{quote}
We should consider a more long term fix. We used to have code that would remove 
drfs that were no longer needed. If we could do that more periodically it seems 
like it would prevent a large number of drfs from building up.
Also it seems like the OplogEntryIdSet itself could be reimplemented to use 
less memory. Since the ids for a diskstore always start at 1 and then simply 
increase, and many use cases will end up with large sequences of ids that have 
been deleted, we could consider using algorithms like we have on our 
concurrency check code for tracking what version ids we have seen. Basically 
you can just have a sequence (start of sequence and length) which can represent 
a long sequence with just 2 32-bit values (or 2 64-bit values if longs). 
Anything in the sequence range that is still alive you would note in another 
data structure as an exception. If exceptions are rare then this can save lots 
of memory.
{quote}
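
As a rough sketch of the data structure Darrel describes (purely illustrative; 
the class and method names below are made up and this is not Geode code), 
deleted ids can be stored as contiguous runs plus a small exception set for ids 
inside a run that are still alive:
{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only: since oplog entry ids start at 1 and increase,
// long stretches of deleted ids can be kept as start -> end runs, with the
// rare still-alive ids inside a run tracked separately as exceptions.
class RunBasedIdSet {
  private final TreeMap<Long, Long> runs = new TreeMap<>(); // run start -> run end (inclusive)
  private final Set<Long> liveExceptions = new HashSet<>(); // live ids covered by a run

  void addDeleted(long id) {
    liveExceptions.remove(id);
    Map.Entry<Long, Long> run = runs.floorEntry(id);
    if (run != null && id <= run.getValue() + 1) {
      // id is already covered by, or directly extends, an existing run
      runs.put(run.getKey(), Math.max(run.getValue(), id));
    } else {
      runs.put(id, id);
    }
  }

  void markLive(long id) {
    Map.Entry<Long, Long> run = runs.floorEntry(id);
    if (run != null && id <= run.getValue()) {
      liveExceptions.add(id);
    }
  }

  boolean isDeleted(long id) {
    Map.Entry<Long, Long> run = runs.floorEntry(id);
    return run != null && id <= run.getValue() && !liveExceptions.contains(id);
  }
}
{code}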




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-30 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8029:
--
Fix Version/s: (was: 1.14.0)
   (was: 1.12.1)
   (was: 1.13.0)

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-30 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reopened GEODE-8029:
---

Re-opening the ticket as the fix has proven to be insufficient to solve the 
problem.

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.12.1, 1.13.0, 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-8176) CI Failure: ClientServerMiscBCDUnitTest > testPingWrongServer[1]

2020-06-29 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8176.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> CI Failure: ClientServerMiscBCDUnitTest > testPingWrongServer[1] 
> -
>
> Key: GEODE-8176
> URL: https://issues.apache.org/jira/browse/GEODE-8176
> Project: Geode
>  Issue Type: Bug
>Reporter: Donal Evans
>Assignee: Alberto Bustamante Reyes
>Priority: Major
>  Labels: flaky
> Fix For: 1.14.0
>
>
> Failed in 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/192#A
> {noformat}
> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscBCDUnitTest > 
> testPingWrongServer[1] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscDUnitTestBase$$Lambda$310/201549247.run
>  in VM 3 running on Host c0a964e32781 with 5 VMs
> Caused by:
> org.junit.ComparisonFailure: expected:<[tru]e> but was:<[fals]e>
> {noformat}
> I ran the test 200 times locally with no failure, so this is possibly just a 
> blip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-26 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146413#comment-17146413
 ] 

Juan Ramos commented on GEODE-8029:
---

Hello [~jagan23527001],
This issue is proving to be quite elusive to fully solve; the fix I've checked 
in is certainly not complete and the issue might still happen under certain 
conditions, so I'm trying to understand why/how the old {{drf}} files were not 
compacted/deleted for your use case.
I will certainly understand if you answer "no" to my question but, if you still 
have a copy of the {{disk-store}} from when the issue happened, would you be 
able to share it with us so we can keep troubleshooting the root cause?
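
As a side note for anyone hitting this stack trace, the arithmetic behind the 
message is simple (a rough illustration only, not Geode or fastutil code): the 
id set is backed by a power-of-two array capped at 2^30 slots, so asking it to 
hold roughly 805 million ids at a 0.75 load factor is over that limit.
{code:java}
// Illustrative only: why IntOpenHashSet gives up at 805_306_401 expected elements.
public class TooLargeExample {
  public static void main(String[] args) {
    long expected = 805_306_401L;                           // ids recovered from the drf files
    long requiredSlots = (long) Math.ceil(expected / 0.75); // 1_073_741_868 slots needed
    long maxBacking = 1L << 30;                             // largest power-of-two backing array
    System.out.println(requiredSlots > maxBacking);         // true -> "Too large (...)" is thrown
  }
}
{code}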

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.st

[jira] [Created] (GEODE-8248) Member hangs waiting for missing disk-stores after gfsh shutdown

2020-06-15 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8248:
-

 Summary: Member hangs waiting for missing disk-stores after gfsh 
shutdown
 Key: GEODE-8248
 URL: https://issues.apache.org/jira/browse/GEODE-8248
 Project: Geode
  Issue Type: Bug
  Components: gfsh, persistence
Reporter: Juan Ramos
 Attachments: temporal.zip

Let’s say I have 2 servers with a simple {{REPLICATE_PERSISTENT}} region and I 
stop both using the {{gfsh shutdown}} command.
According to the 
[documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html],
 I should be able to start either of the servers without any problems, as both 
host the most up-to-date data. In reality, however, the startup hangs with the 
following:
{noformat}
(1) Executing - start server --name=server1 --locators=localhost[10334] 
--server-port=40401 --cache-xml-file=/temporal/cache.xml

.
Region /TestRegion has potentially stale data. It is waiting for another member 
to recover the latest data.
My persistent id:

  DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
  Name: server1
  Location: /temporal/server1/dataStore

Members with potentially new data:
[
  DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
  Name: server2
  Location: /temporal/server2/dataStore
]


"main" #1 prio=5 os_prio=31 tid=0x7f9b28809000 nid=0x1003 in Object.wait() 
[0x7ab04000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
- locked <0x000719df55e0> (a 
org.apache.geode.internal.cache.persistence.MembershipChangeListener)
at 
org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
at 
org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
at 
org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
at 
org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
at 
org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
at 
org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
- locked <0x0005c0593168> (a 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
at 
org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
at 
org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
- locked <0x0005c016a108> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
- locked <0x0005c0043de0> (a java.lang.Class for 
org.apache.geode.internal.cache.InternalCacheBuilder)
at 
org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
- locked <0x0005c0043de0> (a java.lang.Class for 
org.apache.geode.internal.cache.InternalCacheBuilder)
at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
at 
org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
at 
org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
at 
org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
at 
org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
at 
org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
{noformat}

We should either fix the problem and make sure the members fully synchronise 
their data during the {{shutdown}} process so they don't have to wait on each 
other or, if th
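
For completeness, the programmatic equivalent of the declarative setup used in 
this repro (a sketch under the assumption of a plain {{REPLICATE_PERSISTENT}} 
region named {{TestRegion}}; the actual {{cache.xml}} is in the attached 
{{temporal.zip}}):
{code:java}
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Illustrative sketch: each server creates the same persistent replicate region,
// which is what makes both members wait for the most up-to-date copy on restart.
public class PersistentRegionSetup {
  public static void main(String[] args) {
    Cache cache = new CacheFactory()
        .set("locators", "localhost[10334]")
        .create();
    Region<String, String> region = cache
        .<String, String>createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT)
        .create("TestRegion");
    System.out.println("Created region " + region.getFullPath());
  }
}
{code}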

[jira] [Updated] (GEODE-8248) Member hangs waiting for missing disk-stores after gfsh shutdown

2020-06-15 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8248:
--
Attachment: temporal.zip

> Member hangs waiting for missing disk-stores after gfsh shutdown
> 
>
> Key: GEODE-8248
> URL: https://issues.apache.org/jira/browse/GEODE-8248
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, persistence
>Reporter: Juan Ramos
>Priority: Major
> Attachments: temporal.zip
>
>
> Let’s say I have 2 servers with a simple {{REPLICATE_PERSISTENT}} region and 
> I stop both using the {{gfsh shutdown}} command.
> According to the 
> [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html],
>  I should be able to start either of the servers without any problems, as both 
> host the most up-to-date data. In reality, however, the startup hangs with the 
> following:
> {noformat}
> (1) Executing - start server --name=server1 --locators=localhost[10334] 
> --server-port=40401 --cache-xml-file=/temporal/cache.xml
> .
> Region /TestRegion has potentially stale data. It is waiting for another 
> member to recover the latest data.
> My persistent id:
>   DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
>   Name: server1
>   Location: /temporal/server1/dataStore
> Members with potentially new data:
> [
>   DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
>   Name: server2
>   Location: /temporal/server2/dataStore
> ]
> "main" #1 prio=5 os_prio=31 tid=0x7f9b28809000 nid=0x1003 in 
> Object.wait() [0x7ab04000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at 
> org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
>   - locked <0x000719df55e0> (a 
> org.apache.geode.internal.cache.persistence.MembershipChangeListener)
>   at 
> org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
>   at 
> org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
>   at 
> org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
>   at 
> org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
>   at 
> org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
>   at 
> org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
>   - locked <0x0005c0593168> (a 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
>   at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
>   at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
>   - locked <0x0005c016a108> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   - locked <0x0005c0043de0> (a java.lang.Class for 
> org.apache.geode.internal.cache.InternalCacheBuilder)
>   at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
>   - locked <0x0005c0043de0> (a java.lang.Class for 
> org.apache.geode.internal.cache.InternalCacheBuilder)
>   at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
>   at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
>   at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
>   at 
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
>   at 
> 

[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-08 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128521#comment-17128521
 ] 

Juan Ramos commented on GEODE-8029:
---

Hello [~jagan23527001],

Geode ships minor releases on a quarterly basis (see 
[here|https://cwiki.apache.org/confluence/display/GEODE/Shipping+patch+releases]
 for further details) and the release branch for version {{1.13}} has been cut 
recently, so I don't think {{1.14}} will be released until at least September.
That said, I'll propose on the {{dev}} list to backport this fix; if there are 
enough votes, it should be included in {{1.13}} (which will be released soon) as well.
Cheers.

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at o

[jira] [Resolved] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-08 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8029.
---
Resolution: Fixed

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS servers. Today (April 27), after patching our CentOS servers, all 
> locators and two servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  
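For context, the "Too large" error originates in fastutil's hash-table sizing:
the required capacity is derived from the expected element count divided by the
load factor, and the result may not exceed 2^30 slots. Below is a minimal
sketch of that arithmetic, assuming fastutil's documented behaviour (the real
{{HashCommon.arraySize}} additionally rounds the capacity up to the next power
of two before the check); the class and method names are illustrative only and
are not Geode or fastutil code.
{code:java}
public class TooLargeCheck {
  // Largest table fastutil will allocate: 2^30 slots.
  private static final long MAX_TABLE_SIZE = 1L << 30;

  /** True when 'expected' elements at 'loadFactor' need more than 2^30 slots. */
  static boolean wouldOverflow(long expected, float loadFactor) {
    long needed = (long) Math.ceil(expected / (double) loadFactor);
    return needed > MAX_TABLE_SIZE;
  }

  public static void main(String[] args) {
    // 805306401 / 0.75 = 1073741868, which exceeds 2^30 = 1073741824, matching
    // the IllegalArgumentException reported during drf recovery above.
    System.out.println(wouldOverflow(805_306_401L, 0.75f)); // prints: true
  }
}
{code}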



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-06-08 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8029:
--
Fix Version/s: 1.14.0

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two of the cache servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below, 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked

2020-05-27 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117767#comment-17117767
 ] 

Juan Ramos commented on GEODE-8119:
---

Hello [~mkevo],

I've finally been able to reproduce the issue using a {{DistributedTest}}. The 
test makes sure that a member running a newer (current) version of Geode can 
start up using a {{disk-store}} created with an older version ({{1.12.0}} in 
this case); it's essentially a regular upgrade test scenario.
 The test succeeds without your changes for {{GEODE-8119}}, and fails with the 
following exception after merging them:
{noformat}
Command result for :
 
_ __
   / _/ __/ __/ // /
  / /  __/ /___  /_  / _  / 
 / /__/ / /  _/ / // /  
/__/_/  /__/_//_/1.14.0-build.0

Monitor and Manage Apache Geode
Could not find: 
"/var/folders/d7/r6jcs9m15q9_s_tfzy4tvbjmgn/T/junit853175387123885355/diskDir2/BACKUPtestDisk.if,
 
/var/folders/d7/r6jcs9m15q9_s_tfzy4tvbjmgn/T/junit853175387123885355/diskDir3/BACKUPtestDisk.if"
{noformat}
 
I'm pretty sure the problem is related to the changes in 
{{DiskStoreCommandsUtils.validatedDirectoriesAndFile}}. I've created a new 
[{{draft PR}}|https://github.com/apache/geode/pull/5167] with your original 
changes plus the test that reproduces the regression, so you can check it out 
and troubleshoot the issue locally.
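
Below is a minimal sketch of the validation semantics the offline disk-store
commands appear to need, based on the failure above: every supplied directory
must exist, and the disk-store name must resolve to a metadata file
({{BACKUP<name>.if}}) in at least one of the directories rather than in all of
them. This is illustrative only and is not the actual
{{DiskStoreCommandsUtils}} code.
{code:java}
import java.io.File;
import java.util.Arrays;

final class OfflineDiskStoreValidation {

  static void validate(String diskStoreName, String... diskDirs) {
    // Every directory passed to an offline command must exist.
    for (String dir : diskDirs) {
      if (!new File(dir).isDirectory()) {
        throw new IllegalArgumentException("Could not find directory: " + dir);
      }
    }
    // The name is valid if its .if file is present in at least one directory.
    boolean found = Arrays.stream(diskDirs)
        .map(dir -> new File(dir, "BACKUP" + diskStoreName + ".if"))
        .anyMatch(File::exists);
    if (!found) {
      throw new IllegalArgumentException("Could not find disk store '"
          + diskStoreName + "' in " + Arrays.toString(diskDirs));
    }
  }
}
{code}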

> Threads are not properly closed when offline disk-store commands are invoked
> 
>
> Key: GEODE-8119
> URL: https://issues.apache.org/jira/browse/GEODE-8119
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
>
> Threads can be opened both online and offline, but they are closed only when 
> online. Once an offline command has started a thread, it cannot be closed, and 
> over time a large number of these threads can lead to an OOM exception.
> Another problem is that only the disk-dirs are validated, not the disk-store 
> name, so a thread can be created even when no disk store with that name 
> exists, and it will also hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked

2020-05-25 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reopened GEODE-8119:
---

> Threads are not properly closed when offline disk-store commands are invoked
> 
>
> Key: GEODE-8119
> URL: https://issues.apache.org/jira/browse/GEODE-8119
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
> Fix For: 1.14.0
>
>
> Threads can be opened both online and offline, but they are closed only when 
> online. Once an offline command has started a thread, it cannot be closed, and 
> over time a large number of these threads can lead to an OOM exception.
> Another problem is that only the disk-dirs are validated, not the disk-store 
> name, so a thread can be created even when no disk store with that name 
> exists, and it will also hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked

2020-05-25 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8119:
--
Fix Version/s: (was: 1.14.0)

> Threads are not properly closed when offline disk-store commands are invoked
> 
>
> Key: GEODE-8119
> URL: https://issues.apache.org/jira/browse/GEODE-8119
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
>
> Threads can be opened both online and offline, but they are closed only when 
> online. Once an offline command has started a thread, it cannot be closed, and 
> over time a large number of these threads can lead to an OOM exception.
> Another problem is that only the disk-dirs are validated, not the disk-store 
> name, so a thread can be created even when no disk store with that name 
> exists, and it will also hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-3916) CreateDefinedIndexesCommand ignores failed members if at least one member succeeds creating the index

2020-05-25 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115837#comment-17115837
 ] 

Juan Ramos commented on GEODE-3916:
---

Hello [~alberto.bustamante.reyes],
Sure, it apparently is fixed already, so feel free to close the ticket. Thanks 
for the heads up!

> CreateDefinedIndexesCommand ignores failed members if at least one member 
> succeeds creating the index
> -
>
> Key: GEODE-3916
> URL: https://issues.apache.org/jira/browse/GEODE-3916
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Juan Ramos
>Assignee: Alberto Bustamante Reyes
>Priority: Major
>
> I found this issue while working on GEODE-3898.
> With the current logic, some indexes might be created correctly on some 
> members and fail on others, but the user is never notified about these 
> failures if there is at least one successful creation.
> The problem resides within the following logic:
> {code:title=CreateDefinedIndexesCommand.java|borderStyle=solid}
> if (!successfulMembers.isEmpty()) {
>   final InfoResultData infoResult = ResultBuilder.createInfoResultData();
>   (...)
>   result = ResultBuilder.buildResult(infoResult);
> } else {
>   // Group members by the exception thrown.
>   final ErrorResultData erd = ResultBuilder.createErrorResultData();
>   (...)
>   result = ResultBuilder.buildResult(erd);
> }
> {code}
> *How to Reproduce*
> # Start a locator with {{enable-cluster-configuration-enabled=true}}.
> # Start two servers with {{enable-cluster-configuration-enabled=true}}.
> # Create a sample region: {{gfsh create region --name=TestRegion 
> --type=REPLICATE}}.
> # Create one index: {{create index --name=index1 --expression=value 
> --region=TestRegion1 --member=host1-server1}}
> # Define two indexes: {{gfsh -e "define index --name=index1 
> --expression=value1 --region=TestRegion1" -e "define index --name=index2 
> --expression=value2 --region=TestRegion1"}}.
> # Create the defined indexes: {{gfsh create defined indexes}}.
> The last command will show that the index was successfully created only on 
> the second server and won't say anything about what happened on the first 
> one, which can be troublesome for users who want to automate this kind of 
> process:
> {code}
> Indexes successfully created. Use list indexes to get details.
> 1. 192.168.1.4(host1-server2:11001):1025
> {code}
> Moreover, the {{list indexes}} command will show that the same index has a 
> different definition on both servers:
> {code}
> (3) Executing - list indexes
>  Member Name  | Member ID | Region Path  |  
> Name  | Type  | Indexed Expression | From Clause  | Valid Index
> - | - |  | 
> -- | - | -- |  | ---
> host1-server1 | 192.168.1.4(host1-server1:11002):1026 | /TestRegion1 | 
> index1 | RANGE | value  | /TestRegion1 | true
> host1-server1 | 192.168.1.4(host1-server1:11002):1026 | /TestRegion1 | 
> index2 | RANGE | value2 | /TestRegion1 | true
> host1-server2 | 192.168.1.4(host1-server2:11001):1025 | /TestRegion1 | 
> index1 | RANGE | value1 | /TestRegion1 | true
> host1-server2 | 192.168.1.4(host1-server2:11001):1025 | /TestRegion1 | 
> index2 | RANGE | value2 | /TestRegion1 | true
> {code}
> The command should be able to split the results and report back to the user 
> which indexes succeeded and which failed, specifying on which members as well.
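
As a rough illustration of the suggested behaviour, the sketch below builds a
single report that always lists both successful and failed members. It uses
plain JDK types rather than Geode's {{ResultBuilder}}, {{InfoResultData}} and
{{ErrorResultData}}, and is not the actual command code.
{code:java}
import java.util.List;
import java.util.Map;

final class DefinedIndexesReport {

  static String buildReport(List<String> successfulMembers,
      Map<String, ? extends Exception> failedMembers) {
    StringBuilder report = new StringBuilder();
    if (!successfulMembers.isEmpty()) {
      report.append("Indexes successfully created on:\n");
      successfulMembers.forEach(m -> report.append("  ").append(m).append('\n'));
    }
    if (!failedMembers.isEmpty()) {
      // Failures are always reported, even when some members succeeded.
      report.append("Index creation failed on:\n");
      failedMembers.forEach((member, cause) -> report.append("  ")
          .append(member).append(": ").append(cause.getMessage()).append('\n'));
    }
    return report.toString();
  }
}
{code}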



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8150:
--
Fix Version/s: 1.13.0

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Affects Versions: 1.13.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.13.0, 1.14.0
>
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Stati

[jira] [Resolved] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8150.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Affects Versions: 1.13.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.14.0
>
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio thres

[jira] [Updated] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8150:
--
Affects Version/s: 1.13.0

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Affects Versions: 1.13.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zer

[jira] [Comment Edited] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113099#comment-17113099
 ] 

Juan Ramos edited comment on GEODE-8150 at 5/21/20, 2:40 PM:
-

After downgrading the library to {{4.8.61}}, the {{AcceptanceTestOpenJDK11}} is 
failing with {{java.lang.OutOfMemoryError}}. Looking at the heap dump taken 
during the failure, it looks like the biggest objects within the heap all 
belong to the {{classgraph}} library again, so I'll go ahead and revert the 
version straight to {{4.8.52}}, which is the version we were using without any 
issues prior to applying commit {{e9993c15d88a5edd2a486fd64339deba37c24945}}.


was (Author: jujoramos):
After downgrading the library to {{4.8.61}}, the {{AcceptanceTestOpenJDK11}} is 
failing with {{java.lang.OutOfMemoryError}}. Looking at the heap dump taken 
during the failure, it looks like the biggest objects within the heap all 
belong to the {{classgraph}} library again, so I'll go ahead and revert the 
version straight to {{4.8.52}}, exactly the version we were using without issue 
prior to applying commit {{e9993c15d88a5edd2a486fd64339deba37c24945}}.

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> ==

[jira] [Commented] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113099#comment-17113099
 ] 

Juan Ramos commented on GEODE-8150:
---

After downgrading the library to {{4.8.61}}, the {{AcceptanceTestOpenJDK11}} is 
failing with {{java.lang.OutOfMemoryError}}. Looking at the heap dump taken 
during the failure, it looks like the biggest objects within the heap all 
belong to the {{classgraph}} library again, so I'll go ahead and revert the 
version straight to {{4.8.52}}, exactly the version we were using without issue 
prior to applying commit {{e9993c15d88a5edd2a486fd64339deba37c24945}}.

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  

[jira] [Updated] (GEODE-8150) Downgrade ClassGraph to 4.8.52

2020-05-21 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8150:
--
Summary: Downgrade ClassGraph to 4.8.52  (was: Downgrade ClassGraph to 
4.8.61)

> Downgrade ClassGraph to 4.8.52
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value w

[jira] [Created] (GEODE-8150) Downgrade ClassGraph to 4.8.61

2020-05-20 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8150:
-

 Summary: Downgrade ClassGraph to 4.8.61
 Key: GEODE-8150
 URL: https://issues.apache.org/jira/browse/GEODE-8150
 Project: Geode
  Issue Type: Bug
  Components: management
Reporter: Juan Ramos


While running an internal performance testing scenario, we noticed a 
degradation of around 15% average between the time an entry is added to the 
server region and the time a client with registered CQs receives the 
{{onEvent}} listener callback.
 The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 4 
data members ({{DataPolicy.REPLICATE}}); there are also 8 regular clients with 
{{CQs}} registered on the servers. The feeders continuously insert/update 
custom objects into the region (the entries carry a {{timestamp}}) and the 
clients measure the latency between that original {{timestamp}} and the time at 
which they receive the event through the {{CqListener.onEvent}} callback (a 
rough sketch of this measurement follows the statistics below).
 After troubleshooting the issue we were able to pinpoint a specific commit on 
which we start seeing the increase in latency:
{noformat}
commit e9993c15d88a5edd2a486fd64339deba37c24945
Author: Anthony Baker 
Date:   Sat Mar 28 15:35:15 2020 -0700
GEODE-7765: Update dependencies for v1.13
Update many but not all dependencies.
{noformat}
The above commit is just an upgrade of several external dependencies, so we 
went ahead and executed the internal scenario using various combinations and 
reverting several dependencies to the "working" version until we found the one 
that's causing the issue: the upgrade of {{classgraph}} from version {{4.8.52}} 
to {{4.8.68}}.
 We've tried upgrading the dependency to the latest released version {{4.8.78}} 
and also increasing the memory to alleviate the extra garbage generated (this 
worked in the past for another degradation introduced by upgrading the same 
library) without luck, the degradation is still there.
Further troubleshooting demonstrated that the actual latency in our test is 
introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, so 
the purpose of this ticket is to downgrade the library to version {{4.8.61}}.
{noformat}

CLASSGRAPH 4.8.62

TEST STATSPEC  OP#0   #1#2#3#4#5#6
#7#8 
63c681d217  e9993c15d8   e9993c15d8 
+ classgraph-4.8.62
**   
#
scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
1.01   --- 
 putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
--- -1.01 
 updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
---   --- 
 updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
-1.13 -1.18 

 --- = Statistic value is less than the ratio threshold
+inf = Statistic value went from zero to non-zero or vice versa and this is good
-inf = Statistic value went from zero to non-zero or vice versa and this is bad




CLASSGRAPH 4.8.61

TEST STATSPEC  OP#0   #1#2#3#4#5#6
#7#8 
63c681d217  e9993c15d8   e9993c15d8 
+ classgraph-4.8.61
**   
#
scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
---  --- 
 putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
-1.01  --- 
 updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
---  --- 
 updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
---  --- 

 --- = Statistic value is less than the ratio threshold
+inf = Statistic value went from zero to non-zero or vice versa and this is good
-inf = Statistic value went from zero to non-zero or vice versa and this is bad

{noformat}
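
For reference, the client-side latency measurement described above can be
sketched roughly as follows. {{TimestampedValue}} and {{getTimestamp()}} are
assumed domain types used only for illustration; {{CqListener}} and {{CqEvent}}
are the public Geode callback interfaces.
{code:java}
import java.util.concurrent.atomic.LongAdder;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqListener;

public class LatencyMeasuringCqListener implements CqListener {
  private final LongAdder totalLatencyMillis = new LongAdder();
  private final LongAdder eventCount = new LongAdder();

  @Override
  public void onEvent(CqEvent event) {
    Object value = event.getNewValue();
    if (value instanceof TimestampedValue) {
      // Latency = local receive time minus the timestamp set by the feeder.
      long latency = System.currentTimeMillis()
          - ((TimestampedValue) value).getTimestamp();
      totalLatencyMillis.add(latency);
      eventCount.increment();
    }
  }

  @Override
  public void onError(CqEvent event) {
    // Errors are ignored in this sketch; a real test would record them.
  }

  @Override
  public void close() {
    long events = eventCount.sum();
    if (events > 0) {
      System.out.println("Average CQ latency (ms): "
          + totalLatencyMillis.sum() / events);
    }
  }

  /** Assumed shape of the objects the feeders put into the region. */
  public interface TimestampedValue {
    long getTimestamp();
  }
}
{code}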



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8150) Downgrade ClassGraph to 4.8.61

2020-05-20 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8150:
--
Labels: caching-applications  (was: )

> Downgrade ClassGraph to 4.8.61
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statist

[jira] [Assigned] (GEODE-8150) Downgrade ClassGraph to 4.8.61

2020-05-20 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-8150:
-

Assignee: Juan Ramos

> Downgrade ClassGraph to 4.8.61
> --
>
> Key: GEODE-8150
> URL: https://issues.apache.org/jira/browse/GEODE-8150
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> While running an internal performance testing scenario, we noticed a 
> degradation of around 15% average between the time an entry is added to the 
> server region and the time a client with registered CQs receives the 
> {{onEvent}} listener callback.
>  The scenario itself uses two empty feeder members ({{DataPolicy.EMPTY}}) and 
> 4 data members ({{DataPolicy.REPLICATE}}), there are also 8 regular clients 
> with {{CQs}} registered on the servers. The feeders continuously 
> insert/update custom objects into the region (the entries have a 
> {{timestamp}}) and the clients measure the latency between the original 
> {{timestamp}} and the one at which they receive the event through the 
> {{CqListener.onEvent}} callback.
>  After troubleshooting the issue we were able to pinpoint a specific commit 
> on which we start seeing the increase in latency:
> {noformat}
> commit e9993c15d88a5edd2a486fd64339deba37c24945
> Author: Anthony Baker 
> Date:   Sat Mar 28 15:35:15 2020 -0700
> GEODE-7765: Update dependencies for v1.13
> Update many but not all dependencies.
> {noformat}
> The above commit is just an upgrade of several external dependencies, so we 
> went ahead and executed the internal scenario using various combinations and 
> reverting several dependencies to the "working" version until we found the 
> one that's causing the issue: the upgrade of {{classgraph}} from version 
> {{4.8.52}} to {{4.8.68}}.
>  We've tried upgrading the dependency to the latest released version 
> {{4.8.78}} and also increasing the memory to alleviate the extra garbage 
> generated (this worked in the past for another degradation introduced by 
> upgrading the same library) without luck, the degradation is still there.
> Further troubleshooting demonstrated that the actual latency in our test is 
> introduced when moving from {{classgraph-4.8.61}} to {{classgraph-4.8.62}}, 
> so the purpose of this ticket is to downgrade the library to version 
> {{4.8.61}}.
> {noformat}
> 
> CLASSGRAPH 4.8.62
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.62
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   ---  1.01  
> 1.01   --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   ---  1.01   
> --- -1.01 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   ---   ---   
> ---   --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.15 
> -1.13 -1.18 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 
> good
> -inf = Statistic value went from zero to non-zero or vice versa and this is 
> bad
> 
> 
> CLASSGRAPH 4.8.61
> 
> TEST STATSPEC  OP#0   #1#2#3#4#5#6
> #7#8 
> 63c681d217  e9993c15d8   
> e9993c15d8 + classgraph-4.8.61
> **   
> #
> scale081 putResponseTime   del  ---  --- -1.02   ---   ---   --- -1.03   
> ---  --- 
>  putsPerSecond avg  ---  --- -1.02   --- -1.01   --- -1.03 
> -1.01  --- 
>  updateEventsPerSecond avg  ---  --- -1.02   ---   ---   --- -1.04   
> ---  --- 
>  updateLatency del  ---  --- -1.01 -1.15 -1.19 -1.18 -1.01   
> ---  --- 
> 
>  --- = Statistic value is less than the ratio threshold
> +inf = Statistic value went from zero to non-zero or vice versa and this is 

[jira] [Reopened] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-05-18 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reopened GEODE-8029:
---

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two of the cache servers came up, but one cache server was not starting. Here 
> are the exception details. Please let me know how to resolve the issue below, 
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> security-peer-auth-init=
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8079) AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type

2020-05-08 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8079:
--
Fix Version/s: 1.14.0

> AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type
> 
>
> Key: GEODE-8079
> URL: https://issues.apache.org/jira/browse/GEODE-8079
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, gfsh, wan
>Affects Versions: 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.14.0
>
>
> By design, a parallel {{gateway-sender}} can't be attached to a {{REPLICATE}} 
> region.
> While working on GEODE-8029 I found that the above fact is correctly 
> validated when creating or initialising the region, but completely ignored 
> when updating the region through the {{AttributesMutator}} class (a sketch of 
> the missing check appears at the end of this message).
>  Altering a {{REPLICATE}} region to dispatch events through a parallel 
> {{gateway-sender}} results in cryptic errors when putting entries into the 
> region afterwards:
> {noformat}
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10062|2;sequenceID=91;bucketId=98];action=0;operation=CREATE;region=/TestRegion;key=Key90;value=Value90;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=98;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10063|2;sequenceID=92;bucketId=99];action=0;operation=CREATE;region=/TestRegion;key=Key91;value=Value91;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=99;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10064|2;sequenceID=93;bucketId=100];action=0;operation=CREATE;region=/TestRegion;key=Key92;value=Value92;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=100;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10065|2;sequenceID=94;bucketId=101];action=0;operation=CREATE;region=/TestRegion;key=Key93;value=Value93;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649639;shadowKey=-1;timeStamp=1588757649639;acked=false;dispatched=false;bucketId=101;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> {noformat}
> When done from {{GFSH}}, on the other hand, the server doesn't even start up 
> after altering the region as the {{cluster-configuration}} is invalid:
> {noformat}
> gfsh -e "connect" -e "create region --name=TestRegion --type=REPLICATE"
> Member  | Status | Message
> --- | -- | -
> cluster1-server | OK | Region "/TestRegion" created on "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "create gateway-sender --id=MyGateway 
> --remote-distributed-system-id=2 --parallel=true"
> Member  | Status | Message
> --- | -- | 
> --
> cluster1-server | OK | GatewaySender "MyGateway" created on 
> "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "alter region --name=/TestRegion 
> --gateway-sender-id=MyGateway"
> Membe

[jira] [Resolved] (GEODE-8004) Regression Introduced Through GEODE-7565

2020-05-08 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8004.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> Regression Introduced Through GEODE-7565
> 
>
> Key: GEODE-8004
> URL: https://issues.apache.org/jira/browse/GEODE-8004
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons
> Fix For: 1.14.0
>
>
> Intermittent errors were observed while executing some internal tests and 
> commit 
> [dd23ee8|https://github.com/apache/geode/commit/dd23ee8200cba67cea82e57e2e4ccedcdf9e8266]
>  was determined to be responsible. As of yet, no local reproduction of the 
> issue is available, but work is ongoing to provide a test that can be used to 
> debug the issue (a [PR|https://github.com/apache/geode/pull/4974] to revert 
> the original commit has been opened and will be merged shortly; this ticket 
> remains open to investigate the root cause so the original commit can be 
> merged again into {{develop}}).
> ---
> It seems that a server is trying to read an {{ack}} response and, instead, it 
> receives a {{PING}} message:
> {noformat}
> [error 2020/04/18 23:44:22.758 PDT  tid=0x165] 
> Unexpected error in pool task 
> 
> org.apache.geode.InternalGemFireError: Unexpected message type PING
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processAck(AbstractOp.java:264)
>   at 
> org.apache.geode.cache.client.internal.PingOp$PingOpImpl.processResponse(PingOp.java:82)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:222)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:207)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:382)
>   at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:268)
>   at 
> org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:352)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:753)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:332)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839)
>   at org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:38)
>   at 
> org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Around the same time, another member of the distributed system logs the 
> following warning, which seems to be related to the original changes as well:
> {noformat}
> [warn 2020/04/18 23:44:22.757 PDT  
> tid=0x298] Unable to ping non-member 
> rs-FullRegression19040559a2i32xlarge-hydra-client-63(bridgegemfire1_host1_4749:4749):41003
>  for client 
> identity(rs-FullRegression19040559a2i32xlarge-hydra-client-63(edgegemfire3_host1_1071:1071:loner):50046:5a182991:edgegemfire3_host1_1071,connection=2
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-05-08 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102551#comment-17102551
 ] 

Juan Ramos commented on GEODE-8029:
---

The fix introduced a regression, surfaced by our internal testing framework, in 
which several tests fail with the following stack trace:

{noformat}
Caused by: org.apache.geode.cache.DiskAccessException: For DiskStore: X: 
Failed to read file during recovery from /X.drf, caused by 
java.io.FileNotFoundException: /.drf (No such file or directory)
  at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1561)
  at 
org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:462)
  at 
org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:379)
  at 
org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2074)
  at 
org.apache.geode.internal.cache.DiskStoreImpl.initializeOwner(DiskStoreImpl.java:655)
  at 
org.apache.geode.internal.cache.DiskRegion.initializeOwner(DiskRegion.java:239)
  at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1071)
  at 
org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
  at 
org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:981)
  at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:785)
  at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:460)
  at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:319)
  at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2896)
  at 
org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:450)
  at 
org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:406)
  at 
org.apache.geode.internal.cache.PRHARedundancyProvider$2.run2(PRHARedundancyProvider.java:1640)
  at 
org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:60)
  at 
org.apache.geode.internal.cache.PRHARedundancyProvider$2.run(PRHARedundancyProvider.java:1630)
  at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.FileNotFoundException: /.drf (No such file or directory)
  at java.base/java.io.FileInputStream.open0(Native Method)
  at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
  at java.base/java.io.FileInputStream.(FileInputStream.java:157)
  at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1477)
  ... 18 more
{noformat}

Re-opening the ticket to revert the original commit and keep working on a 
definitive fix for the issue.

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two of the cache servers came up, but one cache server would not start. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any disk-store configuration changes are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.jav

[jira] [Updated] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-05-08 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8029:
--
Fix Version/s: (was: 1.14.0)

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two of the cache servers came up, but one cache server would not start. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any disk-store configuration changes are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-07 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8071:
--
Fix Version/s: 1.13.0
   1.12.1

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.12.1, 1.13.0, 1.14.0
>
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos resolved GEODE-8071.
---
Resolution: Fixed

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.14.0
>
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8071:
--
Fix Version/s: 1.14.0

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
> Fix For: 1.14.0
>
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8079) AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8079:
--
Affects Version/s: 1.12.0

> AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type
> 
>
> Key: GEODE-8079
> URL: https://issues.apache.org/jira/browse/GEODE-8079
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, gfsh, wan
>Affects Versions: 1.12.0
>Reporter: Juan Ramos
>Priority: Major
>
> By design, a parallel {{gateway-sender}} can't be attached to a {{REPLICATE}} 
> region.
>  While working on GEODE-8029 I've found that the above fact is correctly 
> validated when creating or initialising the region, but totally ignored when 
> updating the region through the {{AttributesMutator}} class.
>  Altering a {{REPLICATE}} region to dispatch events through a parallel 
> {{gateway-sender}} results in cryptic errors while putting entries into the 
> region afterwards:
> {noformat}
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10062|2;sequenceID=91;bucketId=98];action=0;operation=CREATE;region=/TestRegion;key=Key90;value=Value90;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=98;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10063|2;sequenceID=92;bucketId=99];action=0;operation=CREATE;region=/TestRegion;key=Key91;value=Value91;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=99;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10064|2;sequenceID=93;bucketId=100];action=0;operation=CREATE;region=/TestRegion;key=Key92;value=Value92;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=100;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10065|2;sequenceID=94;bucketId=101];action=0;operation=CREATE;region=/TestRegion;key=Key93;value=Value93;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649639;shadowKey=-1;timeStamp=1588757649639;acked=false;dispatched=false;bucketId=101;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> {noformat}
> When done from {{GFSH}}, on the other hand, the server doesn't even start up 
> after altering the region as the {{cluster-configuration}} is invalid:
> {noformat}
> gfsh -e "connect" -e "create region --name=TestRegion --type=REPLICATE"
> Member  | Status | Message
> --- | -- | -
> cluster1-server | OK | Region "/TestRegion" created on "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "create gateway-sender --id=MyGateway 
> --remote-distributed-system-id=2 --parallel=true"
> Member  | Status | Message
> --- | -- | 
> --
> cluster1-server | OK | GatewaySender "MyGateway" created on 
> "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "alter region --name=/TestRegion 
> --gateway-sender-id=MyGateway"
> Member  | Status | Message
> --- | -- | -
> cluster1-server | OK   

[jira] [Created] (GEODE-8079) AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type

2020-05-06 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8079:
-

 Summary: AttributesMutator Should Validate 
AsyncEventQueue/GatewaySender Type
 Key: GEODE-8079
 URL: https://issues.apache.org/jira/browse/GEODE-8079
 Project: Geode
  Issue Type: Bug
  Components: configuration, gfsh, wan
Reporter: Juan Ramos


By design, a parallel {{gateway-sender}} can't be attached to a {{REPLICATE}} 
region.
 While working on GEODE-8029 I've found that the above fact is correctly 
validated when creating or initialising the region, but totally ignored when 
updating the region through the {{AttributesMutator}} class.
 Altering a {{REPLICATE}} region to dispatch events through a parallel 
{{gateway-sender}} results in cryptic errors while putting entries into the 
region afterwards:
{noformat}
[vm1] [warn 2020/05/06 10:34:09.638 IST  
tid=0x13] GatewaySender: Not queuing the event 
GatewaySenderEventImpl[id=EventID[id=18 
bytes;threadID=0x10062|2;sequenceID=91;bucketId=98];action=0;operation=CREATE;region=/TestRegion;key=Key90;value=Value90;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
 
[originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=98;isConcurrencyConflict=false],
 as the region for which this event originated is not yet configured in the 
GatewaySender

[vm1] [warn 2020/05/06 10:34:09.638 IST  
tid=0x13] GatewaySender: Not queuing the event 
GatewaySenderEventImpl[id=EventID[id=18 
bytes;threadID=0x10063|2;sequenceID=92;bucketId=99];action=0;operation=CREATE;region=/TestRegion;key=Key91;value=Value91;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
 
[originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=99;isConcurrencyConflict=false],
 as the region for which this event originated is not yet configured in the 
GatewaySender

[vm1] [warn 2020/05/06 10:34:09.639 IST  
tid=0x13] GatewaySender: Not queuing the event 
GatewaySenderEventImpl[id=EventID[id=18 
bytes;threadID=0x10064|2;sequenceID=93;bucketId=100];action=0;operation=CREATE;region=/TestRegion;key=Key92;value=Value92;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
 
[originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=100;isConcurrencyConflict=false],
 as the region for which this event originated is not yet configured in the 
GatewaySender

[vm1] [warn 2020/05/06 10:34:09.639 IST  
tid=0x13] GatewaySender: Not queuing the event 
GatewaySenderEventImpl[id=EventID[id=18 
bytes;threadID=0x10065|2;sequenceID=94;bucketId=101];action=0;operation=CREATE;region=/TestRegion;key=Key93;value=Value93;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
 
[originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649639;shadowKey=-1;timeStamp=1588757649639;acked=false;dispatched=false;bucketId=101;isConcurrencyConflict=false],
 as the region for which this event originated is not yet configured in the 
GatewaySender
{noformat}
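For reference, a rough sketch of the programmatic path that produces the 
warnings above (this assumes the public {{Region}}/{{AttributesMutator}} API; 
the cache and sender setup below is illustrative, not taken from an actual 
test):
{code:title=MutatorSketch.java|borderStyle=solid}
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewaySender;

public class MutatorSketch {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().set("mcast-port", "0").create();

    // Creating a parallel sender and a REPLICATE region both succeed.
    GatewaySender sender = cache.createGatewaySenderFactory()
        .setParallel(true)
        .create("MyGateway", 2);
    Region<String, String> region =
        cache.<String, String>createRegionFactory(RegionShortcut.REPLICATE)
            .create("TestRegion");

    // The mutator accepts the parallel sender without any validation, which
    // later produces the "Not queuing the event" warnings on every put.
    region.getAttributesMutator().addGatewaySenderId(sender.getId());
    region.put("Key90", "Value90");
  }
}
{code}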
When done from {{GFSH}}, on the other hand, the server doesn't even start up 
after altering the region as the {{cluster-configuration}} is invalid:
{noformat}
gfsh -e "connect" -e "create region --name=TestRegion --type=REPLICATE"

Member  | Status | Message
--- | -- | -
cluster1-server | OK | Region "/TestRegion" created on "cluster1-server"
Cluster configuration for group 'cluster' is updated.


gfsh -e "connect" -e "create gateway-sender --id=MyGateway 
--remote-distributed-system-id=2 --parallel=true"

Member  | Status | Message
--- | -- | 
--
cluster1-server | OK | GatewaySender "MyGateway" created on 
"cluster1-server"
Cluster configuration for group 'cluster' is updated.


gfsh -e "connect" -e "alter region --name=/TestRegion 
--gateway-sender-id=MyGateway"

Member  | Status | Message
--- | -- | -
cluster1-server | OK | Region TestRegion altered
Cluster configuration for group 'cluster' is updated.


// Restart Cluster
[warn 2020/05/06 10:09:07.385 IST  tid=0x1] Initialization failed for 
Region /TestRegion
org.apache.geode.internal.cache.wan.GatewaySenderConfigurationException: 
Parallel gateway sender MyGateway can not be used with replicated region 
/TestRegion
at 
org.apache.geode.i

[jira] [Assigned] (GEODE-8079) AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-8079:
-

Assignee: Juan Ramos

> AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type
> 
>
> Key: GEODE-8079
> URL: https://issues.apache.org/jira/browse/GEODE-8079
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, gfsh, wan
>Affects Versions: 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> By design, a parallel {{gateway-sender}} can't be attached to a {{REPLICATE}} 
> region.
>  While working on GEODE-8029 I've found that the above fact is correctly 
> validated when creating or initialising the region, but totally ignored when 
> updating the region through the {{AttributesMutator}} class.
>  Altering a {{REPLICATE}} region to dispatch events through a parallel 
> {{gateway-sender}} results in cryptic errors while putting entries into the 
> region afterwards:
> {noformat}
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10062|2;sequenceID=91;bucketId=98];action=0;operation=CREATE;region=/TestRegion;key=Key90;value=Value90;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=98;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10063|2;sequenceID=92;bucketId=99];action=0;operation=CREATE;region=/TestRegion;key=Key91;value=Value91;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=99;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10064|2;sequenceID=93;bucketId=100];action=0;operation=CREATE;region=/TestRegion;key=Key92;value=Value92;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=100;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10065|2;sequenceID=94;bucketId=101];action=0;operation=CREATE;region=/TestRegion;key=Key93;value=Value93;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649639;shadowKey=-1;timeStamp=1588757649639;acked=false;dispatched=false;bucketId=101;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> {noformat}
> When done from {{GFSH}}, on the other hand, the server doesn't even start up 
> after altering the region as the {{cluster-configuration}} is invalid:
> {noformat}
> gfsh -e "connect" -e "create region --name=TestRegion --type=REPLICATE"
> Member  | Status | Message
> --- | -- | -
> cluster1-server | OK | Region "/TestRegion" created on "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "create gateway-sender --id=MyGateway 
> --remote-distributed-system-id=2 --parallel=true"
> Member  | Status | Message
> --- | -- | 
> --
> cluster1-server | OK | GatewaySender "MyGateway" created on 
> "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "alter region --name=/TestRegion 
> --gateway-sender-id=MyGateway"
> Member  | Status | Message

[jira] [Updated] (GEODE-8079) AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8079:
--
Labels: caching-applications  (was: )

> AttributesMutator Should Validate AsyncEventQueue/GatewaySender Type
> 
>
> Key: GEODE-8079
> URL: https://issues.apache.org/jira/browse/GEODE-8079
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, gfsh, wan
>Affects Versions: 1.12.0
>Reporter: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> By design, a parallel {{gateway-sender}} can't be attached to a {{REPLICATE}} 
> region.
>  While working on GEODE-8029 I've found that the above fact is correctly 
> validated when creating or initialising the region, but totally ignored when 
> updating the region through the {{AttributesMutator}} class.
>  Altering a {{REPLICATE}} region to dispatch events through a parallel 
> {{gateway-sender}} results in cryptic errors while putting entries into the 
> region afterwards:
> {noformat}
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10062|2;sequenceID=91;bucketId=98];action=0;operation=CREATE;region=/TestRegion;key=Key90;value=Value90;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=98;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.638 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10063|2;sequenceID=92;bucketId=99];action=0;operation=CREATE;region=/TestRegion;key=Key91;value=Value91;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=99;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10064|2;sequenceID=93;bucketId=100];action=0;operation=CREATE;region=/TestRegion;key=Key92;value=Value92;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649638;shadowKey=-1;timeStamp=1588757649638;acked=false;dispatched=false;bucketId=100;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> [vm1] [warn 2020/05/06 10:34:09.639 IST  
> tid=0x13] GatewaySender: Not queuing the event 
> GatewaySenderEventImpl[id=EventID[id=18 
> bytes;threadID=0x10065|2;sequenceID=94;bucketId=101];action=0;operation=CREATE;region=/TestRegion;key=Key93;value=Value93;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
>  
> [originalCallbackArg=null;originatingSenderId=1;recipientGatewayReceivers={2}];possibleDuplicate=false;creationTime=1588757649639;shadowKey=-1;timeStamp=1588757649639;acked=false;dispatched=false;bucketId=101;isConcurrencyConflict=false],
>  as the region for which this event originated is not yet configured in the 
> GatewaySender
> {noformat}
> When done from {{GFSH}}, on the other hand, the server doesn't even start up 
> after altering the region as the {{cluster-configuration}} is invalid:
> {noformat}
> gfsh -e "connect" -e "create region --name=TestRegion --type=REPLICATE"
> Member  | Status | Message
> --- | -- | -
> cluster1-server | OK | Region "/TestRegion" created on "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "create gateway-sender --id=MyGateway 
> --remote-distributed-system-id=2 --parallel=true"
> Member  | Status | Message
> --- | -- | 
> --
> cluster1-server | OK | GatewaySender "MyGateway" created on 
> "cluster1-server"
> Cluster configuration for group 'cluster' is updated.
> gfsh -e "connect" -e "alter region --name=/TestRegion 
> --gateway-sender-id=MyGateway"
> Member  | Status | Message
> --- | 

[jira] [Updated] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-06 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8071:
--
Affects Version/s: (was: 1.13.0)
   1.8.0
   1.9.0
   1.9.1
   1.10.0
   1.9.2
   1.11.0
   1.12.0

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-05 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos reassigned GEODE-8071:
-

Assignee: Juan Ramos

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-05 Thread Juan Ramos (Jira)
Juan Ramos created GEODE-8071:
-

 Summary: RebalanceCommand Should Use Daemon Threads
 Key: GEODE-8071
 URL: https://issues.apache.org/jira/browse/GEODE-8071
 Project: Geode
  Issue Type: Bug
  Components: gfsh, management
Reporter: Juan Ramos


The {{RebalanceCommand}} uses a non-daemon thread to execute its internal logic:

{code:title=RebalanceCommand.java|borderStyle=solid}
ExecutorService commandExecutors = 
LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
{code}

The above prevents the {{locator}} from shutting down gracefully afterwards:

{noformat}
"RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
waiting on condition [0x7f9651471000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007308c36e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
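Not Geode code, but a minimal, self-contained {{java.util.concurrent}} sketch 
of the difference (presumably the {{false}} argument above is what leaves the 
worker as a non-daemon thread): a pool built on non-daemon threads keeps the 
JVM alive until it is explicitly shut down, while a daemon thread factory lets 
the process exit on its own.
{code:title=DaemonExecutorSketch.java|borderStyle=solid}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DaemonExecutorSketch {
  public static void main(String[] args) {
    // Default thread factory: the idle pool thread is non-daemon and pins the
    // JVM until shutdown() is called, mirroring the stuck locator above.
    ExecutorService nonDaemon = Executors.newSingleThreadExecutor();
    nonDaemon.submit(() -> System.out.println("rebalance work (non-daemon)"));

    // Daemon thread factory: the JVM can exit even if this pool is never shut
    // down, which is the behaviour a graceful locator shutdown needs.
    ExecutorService daemon = Executors.newSingleThreadExecutor(runnable -> {
      Thread t = new Thread(runnable, "RebalanceCommand");
      t.setDaemon(true);
      return t;
    });
    daemon.submit(() -> System.out.println("rebalance work (daemon)"));

    nonDaemon.shutdown(); // without this line the first pool keeps the JVM alive
  }
}
{code}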



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-05 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8071:
--
Labels: caching-applications  (was: )

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.13.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>  Labels: caching-applications
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8071) RebalanceCommand Should Use Daemon Threads

2020-05-05 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8071:
--
Affects Version/s: 1.13.0

> RebalanceCommand Should Use Daemon Threads
> --
>
> Key: GEODE-8071
> URL: https://issues.apache.org/jira/browse/GEODE-8071
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, management
>Affects Versions: 1.13.0
>Reporter: Juan Ramos
>Assignee: Juan Ramos
>Priority: Major
>
> The {{RebalanceCommand}} uses a non-daemon thread to execute its internal 
> logic:
> {code:title=RebalanceCommand.java|borderStyle=solid}
> ExecutorService commandExecutors = 
> LoggingExecutors.newSingleThreadExecutor("RebalanceCommand", false);
> {code}
> The above prevents the {{locator}} from shutting down gracefully afterwards:
> {noformat}
> "RebalanceCommand1" #971 prio=5 os_prio=0 tid=0x7f9664011000 nid=0x15905 
> waiting on condition [0x7f9651471000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007308c36e8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-05-05 Thread Juan Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8029:
--
Fix Version/s: 1.14.0

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Fix For: 1.14.0
>
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two of the cache servers came up, but one cache server would not start. Here 
> are the exception details. Please let me know how to resolve the issue below 
> and whether any disk-store configuration changes are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-04-29 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095497#comment-17095497
 ] 

Juan Ramos commented on GEODE-8029:
---

Hello [~jagan23527001],

Just wanted to give you a quick update here. I can confirm that this is a bug 
within the product, reproducible only when a {{gateway-sender}} has a 
{{disk-store}} attached and that {{disk-store}} is not attached to any other 
regions (this is the recommended way of configuring things, by the way, so 
there's nothing wrong with what you did).
I'll start working on a fix for the problem but, as a workaround to avoid 
hitting the {{IllegalArgumentException}}, I'd recommend executing offline 
compaction on your {{disk-stores}} regularly. How often is hard to say, as it 
depends on your workload and the amount of compactable records within the 
{{disk-store}}; you can run the {{validate offline-disk-store}} command and 
check the reported amount, and if it's higher than a certain threshold (let's 
say 10 records), go ahead and compact the disk store while the servers are 
online.
For more information about this, please have a look at [Running Compaction on 
Disk Store Log 
Files|https://geode.apache.org/docs/guide/112/managing/disk_storage/compacting_disk_stores.html].
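In case it helps, here's a rough sketch of what that periodic check could look like from a shell; the disk-store name and directory below are placeholders (not taken from your environment), and the offline commands assume the member that owns the {{disk-store}} is stopped while they run:
{noformat}
# Inspect the disk-store and check the reported record counts
gfsh -e "validate offline-disk-store --name=myDiskStore --disk-dirs=/path/to/diskstore"

# If the amount of compactable records is above your chosen threshold, compact it
gfsh -e "compact offline-disk-store --name=myDiskStore --disk-dirs=/path/to/diskstore"
{noformat}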



> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two servers came up, but one cache server was not starting. Here are the 
> exception details. Please let me know how to resolve the issue below, and 
> whether any configuration changes to the disk-store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCach

[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-04-28 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094663#comment-17094663
 ] 

Juan Ramos commented on GEODE-8029:
---

Hello [~jagan23527001],
Great, glad to hear it worked!
That said, old {{oplog}} files should be automatically deleted by Geode when 
you have {{auto-compaction}} enabled (which you do, according to the logs). I'm 
still not sure how the server got into that situation; it's certainly the first 
time I've seen this issue. I'll continue investigating and will update the JIRA 
once I have something more concrete to share.
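Just for context, auto-compaction is part of the disk-store definition itself; an illustrative {{gfsh}} sketch (the name, directory and threshold below are placeholders, not your actual configuration) would look like:
{noformat}
gfsh> create disk-store --name=senderStore --dir=/data/senderStore --auto-compact=true --compaction-threshold=50 --allow-force-compaction=true
{noformat}
With {{auto-compact=true}}, Geode compacts an {{oplog}} once the percentage of garbage in it exceeds {{compaction-threshold}}, and fully compacted files are removed automatically.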
Please feel free to reach out if you have further problems.
Cheers.

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two servers came up, but one cache server was not starting. Here are the 
> exception details. Please let me know how to resolve the issue below, and 
> whether any configuration changes to the disk-store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLaun

[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-04-28 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094628#comment-17094628
 ] 

Juan Ramos commented on GEODE-8029:
---

Hello [~jagan23527001],

The Geode shell ({{gfsh}}) is just a regular Java process, so you can execute 
{{export JAVA_ARGS="-XmxZg"}} (where Z is the maximum heap you want to use) 
before starting the tool. I'm not entirely sure the compaction will succeed in 
this case, though; chances are high you will hit the same 
{{IllegalArgumentException}}, as Geode needs to load the oplogs into the 
internal hash table before trying to compact old entries. It's worth a try 
anyway, and if it doesn't work you can go ahead and try the other workaround 
mentioned (delete the files and let the member get the data from the other 
running servers).
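As a concrete (purely illustrative) example, assuming 8 GB is an acceptable heap on that box and reusing the disk-store name and directory from the earlier steps:
{noformat}
# Give the gfsh JVM a larger heap before running the offline compaction
export JAVA_ARGS="-Xmx8g"

# Then run the compaction as a one-off command
gfsh -e "compact offline-disk-store --name=geodeStore --disk-dirs=/app/provServerHO2/data"
{noformat}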

> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two servers came up, but one cache server was not starting. Here are the 
> exception details. Please let me know how to resolve the issue below, and 
> whether any configuration changes to the disk-store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at 
> org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at 
> org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at 
> org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at 
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at 
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at 
> org.apache.geode.distributed.ServerLauncher.

[jira] [Commented] (GEODE-8029) java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

2020-04-28 Thread Juan Ramos (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094606#comment-17094606
 ] 

Juan Ramos commented on GEODE-8029:
---

Thanks [~jagan23527001],

Looks like the validation fails with the same exception, which is expected. My 
current theory is that you have a *huge* amount of deleted records within the 
{{disk-store}} that should have been compacted but are, instead, still there, 
preventing the member from starting up. If my theory is correct, you should be 
able to execute an {{offline compaction}} instead of the steps I've previously 
shared; the steps are below (with a shell sketch after the list):

# For member {{provServerHO2}}, copy all files under 
{{/app/provServerHO2/data/}} to another directory, just as a backup.
# For member {{provServerHO2}}, execute {{compact offline-disk-store 
--name=geodeStore --disk-dirs=/app/provServerHO2/data}}.
# Try to start member {{provServerHO2}} again; it should come up just fine.
# At this point the cluster should be fully operational, so you can go ahead 
and execute your internal verifications to double check everything is correct.
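
A minimal shell sketch of those steps (the backup directory name is just an example, and the member is started with whatever script or {{gfsh}} command you normally use):
{noformat}
# 1. Back up the current disk-store files
cp -r /app/provServerHO2/data /app/provServerHO2/data-backup

# 2. Run the offline compaction against the original directory
gfsh -e "compact offline-disk-store --name=geodeStore --disk-dirs=/app/provServerHO2/data"

# 3. Start provServerHO2 again with your usual start procedure
{noformat}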

If the above steps don't work (they should; I'm just adding another option as a 
workaround), you can go ahead and execute the steps I've shared previously, 
which will guarantee the member starts fresh and gets the data from the already 
running servers. The steps are below again, just for your reference (with a 
shell sketch after the list):

# Make sure {{provServerHO1}} and {{provServerHO3}} are fully up and running, 
without any exceptions in the logs. If you notice any exceptions or weirdness 
within these members' logs, don't continue with the rest of the steps.
# For member {{provServerHO2}}, copy all files under 
{{/app/provServerHO2/data/}} to another directory, just as a backup.
# For member {{provServerHO2}}, remove all files under 
{{/app/provServerHO2/data/}}.
# Try to start member {{provServerHO2}} again; during the startup procedure the 
member should be able to get the latest data from the other running members 
({{provServerHO1}} and {{provServerHO3}}).
# If the above steps finished correctly, execute the [{{gfsh 
rebalance}}|https://geode.apache.org/docs/guide/112/tools_modules/gfsh/command-pages/rebalance.html]
 command to make sure buckets are evenly distributed across the three members 
(this is an expensive operation, so you might want to go through [Rebalancing 
Partitioned Region 
Data|https://geode.apache.org/docs/guide/112/developing/partitioned_regions/rebalancing_pr_data.html]
 to fully understand the implications and requirements).
# At this point the cluster should be fully operational, so you can go ahead 
and execute your internal verifications to double check everything is correct.
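
And a shell sketch of this second option (again, the backup directory name and the locator host/port in the {{connect}} step are placeholders):
{noformat}
# 1. Back up and then clear the disk-store files for provServerHO2
cp -r /app/provServerHO2/data /app/provServerHO2/data-backup
rm -rf /app/provServerHO2/data/*

# 2. Start provServerHO2 with your usual procedure; it should recover its data
#    from provServerHO1 and provServerHO3 during startup.

# 3. Once it is up, rebalance from a gfsh session connected to the cluster
gfsh> connect --locator=localhost[10334]
gfsh> rebalance
{noformat}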

Please let me know how it goes.


> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -
>
> Key: GEODE-8029
> URL: https://issues.apache.org/jira/browse/GEODE-8029
> Project: Geode
>  Issue Type: Bug
>  Components: configuration, core, gfsh
>Affects Versions: 1.9.0
>Reporter: Jagadeesh sivasankaran
>Assignee: Juan Ramos
>Priority: Major
>  Labels: GeodeCommons, caching-applications
> Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, Screen Shot 
> 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on 
> CentOS. Today (April 27), after patching our CentOS servers, all locators and 
> two servers came up, but one cache server was not starting. Here are the 
> exception details. Please let me know how to resolve the issue below, and 
> whether any configuration changes to the disk-store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> The Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large 
> (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at 
> org.apache.geode.in
