Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-14 Thread Udo Kohlmeyer
Hi there Donal,

Good question. No, the ClassLoaderService does not need to be persisted. The 
type of service (Modular or Default) is determined at startup and passed in. 
This way we can run Geode either class-loader-isolated or “normal”, depending on 
which ClassLoaderService we decide to pass in.
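A minimal Java sketch of the idea that the service is chosen once at startup and handed in, rather than persisted. Everything in it is hypothetical: the `ClassLoaderService` interface below is a stand-in, not the API defined in the RFC, and the "isolated" mode is only approximated by an empty `URLClassLoader` whose parent is the platform class loader. It is not the modular implementation the RFC describes.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Objects;

// Hypothetical interface for illustration only; not Geode's actual API.
interface ClassLoaderService {
    Class<?> forName(String className) throws ClassNotFoundException;
}

// Resolves classes through whichever ClassLoader it was constructed with.
final class DelegatingClassLoaderService implements ClassLoaderService {
    private final ClassLoader loader;

    DelegatingClassLoaderService(ClassLoader loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    @Override
    public Class<?> forName(String className) throws ClassNotFoundException {
        return Class.forName(className, false, loader);
    }
}

public class Main {
    // Picks the service once, at startup; the choice lives only in memory.
    static ClassLoaderService chooseAtStartup(boolean isolated) {
        if (isolated) {
            // Stand-in for an isolated mode: an empty loader whose parent is the
            // platform loader, so classes on the application classpath are not visible.
            return new DelegatingClassLoaderService(
                new URLClassLoader(new URL[0], ClassLoader.getPlatformClassLoader()));
        }
        return new DelegatingClassLoaderService(Main.class.getClassLoader());
    }

    // Returns true if the given service can resolve the named class.
    static boolean canSee(ClassLoaderService service, String className) {
        try {
            service.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean isolated = args.length > 0 && args[0].equals("--isolated");
        ClassLoaderService service = chooseAtStartup(isolated);
        System.out.println("isolated=" + isolated
            + " sees Main: " + canSee(service, "Main")
            + " sees String: " + canSee(service, "java.lang.String"));
    }
}
```

In this sketch the mode is just a constructor argument decided at process start, so switching modes on restart means passing a different flag; there is no stored configuration to migrate.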

—Udo


Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-14 Thread Donal Evans
Sounds good to me. One question though: is it likely that the 
ClassLoaderService configuration will need to be persisted at all? For example, 
would it be reasonable to provide a user with the ability to specify a new 
ClassLoaderService implementation to be used upon cluster restart (via GFSH or 
REST), which would require some sort of persisted configuration? If this is 
likely, will the currently proposed implementation provide this functionality, 
or will that be left for future work? I only ask because I'm not sure how easy 
it is to convert something into a persistent service vs creating it as one in 
the first place. If it's trivial, then no worries.



Re: Proper location for debugging tools

2020-09-14 Thread Blake Bender
Right - I'm aware of that directory in Geode, just not sure this belongs 
there... yet(?).  I'll submit a PR for the native repo, and if I or someone 
else eventually adapts it for the Java client (low probability IMO), it can be 
moved easily.

Thanks,

Blake

On 9/11/20, 2:11 PM, "Dan Smith"  wrote:

The main geode repo has a dev-tools directory; that’s a good spot for 
scripts. If it’s specific to the native client, I’d put it in geode-native; if 
not, the geode repo seems fine.

-Dan

> On Sep 11, 2020, at 1:41 PM, Blake Bender  wrote:
> 
> Hi all,
> 
> I have a Python script I’ve used quite a bit for diagnosing/debugging 
> issues in geode-native that can decode a whole lot of protocol information from 
> a debug-level log file.  I think it comes in pretty handy, and would like to 
> share.  Just a quick question, though – where’s the right place to put it?  
> It’s currently in a private repo, so I could create a new OSS repo for it, or I 
> could do something like create a /tools folder in the native tree and submit it 
> there in a PR.  There’s also a sort-of shared tools area in the Geode repo 
> proper, which isn’t as odd a choice as it first seems, because the script could 
> be enhanced pretty readily to read Java client logs and decode the same info.  
> Anyone have any strongly-held opinions?
> 
> Thanks,
> 
> Blake
> 




[Discussion] - ClassLoaderService RFC proposal

2020-09-14 Thread Udo Kohlmeyer
Hi there Apache Geode Devs, (try 2)

Please find attached a proposal for a ClassLoaderService. Please review and 
ponder on it.

https://cwiki.apache.org/confluence/display/GEODE/Introduction+of+ClassLoaderService+into+Geode

Please make all comments in this mail thread.

—Udo


ClassLoaderService RFC Proposal

2020-09-14 Thread Udo Kohlmeyer
Hi there Apache Geode Devs,

Please find


Re: Colocated regions missing some buckets after restart

2020-09-14 Thread Mario Kevo
Hi,


This problem is usually seen on only one server. The other servers' metrics and 
bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if a client 
tries to reconnect after the server restart. Clients simply get no 
response from the server, so they try to close the connection, but the 
connection close is not acknowledged by the server. On the server side we see that 
the connections are in CLOSE-WAIT state with packets in the socket receive 
queue. It’s as if the server just stopped processing packets on the sockets 
while waiting for a member with the primary bucket.



So, in short, each new client connection is “unresponsive”. The client tries to 
close it and open a new one, but the socket doesn’t get closed on the server side, and 
the connection is left “hanging” on the server. Clients keep doing this 
until max-connections is reached on the servers, which is why we are unable 
to add any data to the regions. But IMHO the issue doesn’t really depend on adding 
data, since it happens occasionally (1 out of ~4 restarts) and only on 
one server.



The initial problem was observed with a persistent region A (with 1 
key-value pairs inserted) and a non-persistent region B colocated with region 
A. We did some tests with both regions being persistent. We haven’t observed 
the same issue yet (although we only did a few restarts), but we observed 
something that also looks quite worrying. Both servers start up without 
reporting issues in the logs, but, looking at the server metrics, one server 
has incorrect information about “bucketCount” and is missing primary buckets. E.g.:


First server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 113
          | primaryBucketCount  | 57

Second server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 111
          | primaryBucketCount  | 55


So we are missing a primary bucket without being aware of the issue.
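The arithmetic behind that conclusion can be checked with a small Java sketch. It assumes the region uses 113 buckets (Geode's default `total-num-buckets`, matching the first server's bucketCount) and redundancy 1 on two servers, so each server should host a copy of every bucket and the primaries should sum to 113. The class and method names are hypothetical; the counts are copied from the metrics above.

```java
import java.util.List;

public class BucketCheck {
    // Hypothetical holder for the per-member values reported by
    // `show metrics --categories=partition`.
    static final class MemberMetrics {
        final String member;
        final int bucketCount;
        final int primaryBucketCount;

        MemberMetrics(String member, int bucketCount, int primaryBucketCount) {
            this.member = member;
            this.bucketCount = bucketCount;
            this.primaryBucketCount = primaryBucketCount;
        }
    }

    // Every bucket should have exactly one primary somewhere in the cluster.
    static int missingPrimaries(int totalNumBuckets, List<MemberMetrics> members) {
        int primaries = 0;
        for (MemberMetrics m : members) {
            primaries += m.primaryBucketCount;
        }
        return totalNumBuckets - primaries;
    }

    // Assumes redundancy 1 on a two-server cluster, so each member should
    // host a copy (primary or secondary) of every bucket.
    static int missingCopies(int totalNumBuckets, MemberMetrics m) {
        return totalNumBuckets - m.bucketCount;
    }

    public static void main(String[] args) {
        int totalNumBuckets = 113; // assumption: Geode's default total-num-buckets
        List<MemberMetrics> members = List.of(
            new MemberMetrics("first server", 113, 57),
            new MemberMetrics("second server", 111, 55));

        // prints "missing primaries: 1"
        System.out.println("missing primaries: " + missingPrimaries(totalNumBuckets, members));
        for (MemberMetrics m : members) {
            // prints 0 for the first server and 2 for the second
            System.out.println(m.member + " missing copies: " + missingCopies(totalNumBuckets, m));
        }
    }
}
```

Fed the earlier server-0/server-1 output instead (113/56 and 0/0), the same check reports 57 missing primaries and 113 missing copies on server-1.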

BR,
Mario


From: Anilkumar Gingade 
Sent: September 11, 2020 20:34
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Are you seeing missing buckets for the persistent region or the non-persistent ones? Buckets 
are created dynamically, when data is added to the corresponding buckets...
When a server is restarted, in the case of in-memory regions the data is not 
there, so the bucket regions may not have been created (my suspicion).
Can you try adding data and see if the colocated bucket regions get created on 
the respective servers?

-Anil.


On 9/11/20, 9:46 AM, "Mario Kevo"  wrote:

Hi geode-dev,

We have a system with two servers and a few regions. One region is 
persistent and the others are not, but they are colocated with the persistent region.
After restarting the servers, we can see that some regions don't have any 
buckets.
gfsh>show metrics --member=server-1 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-1


Category  | Metric                      | Value
--------- | --------------------------- | -----
partition | putLocalRate                | 0.0
          | putRemoteRate               | 0.0
          | putRemoteLatency            | 0
          | putRemoteAvgLatency         | 0
          | bucketCount                 | 0
          | primaryBucketCount          | 0
          | configuredRedundancy        | 1
          | actualRedundancy            | 0
          | numBucketsWithoutRedundancy | 113
          | totalBucketSize             | 0

gfsh>show metrics --member=server-0 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-0

Category  | Metric                      | Value
--------- | --------------------------- | -----
partition | putLocalRate                | 0.0
          | putRemoteRate               | 0.0
          | putRemoteLatency            | 0
          | putRemoteAvgLatency         | 0
          | bucketCount                 | 113
          | primaryBucketCount          | 56
          | configuredRedundancy        | 1
          | actualRedundancy            | 0
          | numBucketsWithoutRedundancy | 113
          | totalBucketSize             | 0


The persistent region is OK, but some of the colocated regions have this 
issue. We also waited for some time, but nothing changed.

Does anyone have an idea what is causing this issue?
The issue can be easily reproduced with two locators, two servers, one 
persistent region and few