Re: [DISCUSS] Alignment of values disabling idleTimeout/loadConditioningInterval between Geode client APIs

2022-06-13 Thread Darrel Schneider
My concern is that you are proposing to change the behavior of an existing geode 
feature. I think 0 is currently supported for both these properties in the Java 
client, and I would expect it to cause immediate idle expiration and very hot load 
conditioning. Your proposal would make 0 mean something very different (never 
idle expire and never load condition).
Also, since both of these properties express a time duration, interpreting the 
value 0 as a very short duration is the natural meaning. Treating it as an 
infinite duration takes some explanation.
If the native client currently does not support setting these properties to -1 
then you could more safely change it to treat -1 as an infinite duration like 
the Java client does.
So at least both clients would agree on what -1 means.
But they would still disagree on what 0 means. To bring them into agreement you 
need to make a breaking change in either the native client or the java client. 
It might be best just to document this difference. I think it might be a small 
subset of our user base that thinks everything on the native client will be 
consistent with the Java client. Or vice versa. I think most users of one 
client do not even use the other client.


From: Alberto Gomez 
Sent: Monday, June 13, 2022 8:20 AM
To: dev@geode.apache.org 
Subject: [DISCUSS] Alignment of values disabling 
idleTimeout/loadConditioningInterval between Geode client APIs


Hi,

According to the documentation of the Geode Java client API, setting -1 for 
idleTimeout in a Pool indicates that connections should never expire:
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/PoolFactory.html#setIdleTimeout-long-

Nevertheless, according to the documentation of the Geode Native client API, 
setting a duration of std::chrono::milliseconds::zero() for idleTimeout 
indicates that connections should never expire.
https://geode.apache.org/releases/latest/cppdocs/a00799.html#ae5fef0b20d95f11399a1fa66f90fbc74

A similar discrepancy between the two clients can be observed for the 
loadConditioningInterval setting:

According to the documentation of the Java client API, a value of -1 disables 
load conditioning:
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/PoolFactory.html#setLoadConditioningInterval-int-

Nevertheless, according to the documentation of the Geode Native client API, 
setting a value of std::chrono::milliseconds::zero() disables load conditioning.
https://geode.apache.org/releases/latest/cppdocs/a00799.html#aaa812743d8458017bdbb8afa144c05e7

This discrepancy can create confusion and lead to misuse of the client APIs, 
which can provoke unexpected behavior in the client.

Geode client APIs should be consistent to avoid these types of problems. 
Therefore, I propose to align both client APIs to use the same values for 
disabling connection expiration due to idle-timeout and for disabling 
load conditioning.

Usually, alignment of the Java and native clients consists of copying the 
behavior of the Java client into the C++ client.
In this case, however, I think it makes more sense to take the behavior of the 
native client as the model.
The reason is that I do not think it makes sense to support an idle-timeout or a 
load-conditioning-interval of 0 ms. Consequently, I think a logical value to 
disable both could be 0 ms.
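
To make the discrepancy concrete, here is a minimal Java-client sketch of the two
settings as the Java documentation describes them today (the locator address and
pool name are placeholders); the native client instead documents
std::chrono::milliseconds::zero() for the same "disable" behavior:

import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.Pool;
import org.apache.geode.cache.client.PoolManager;

public class DisablePoolTimeoutsExample {
  public static void main(String[] args) {
    ClientCache cache = new ClientCacheFactory().create();
    // Java client: -1 disables idle expiration and load conditioning (per the javadoc).
    Pool pool = PoolManager.createFactory()
        .addLocator("localhost", 10334)      // placeholder locator
        .setIdleTimeout(-1)                  // never expire idle connections
        .setLoadConditioningInterval(-1)     // disable load conditioning
        .create("example-pool");
    System.out.println("idle-timeout=" + pool.getIdleTimeout()
        + " load-conditioning-interval=" + pool.getLoadConditioningInterval());
    cache.close();
  }
}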

Any thoughts on this proposal?

Thanks,

Alberto




[PROPOSAL] RFC for customization of ResourceManager

2022-03-24 Thread Darrel Schneider
Please give feedback about this proposal to support customizing how the 
ResourceManager interacts with the JVM: 
https://cwiki.apache.org/confluence/display/GEODE/Customization+of+ResourceManager%27s+JVM+Interaction



Re: [PROPOSAL] annul support/1.15

2022-03-16 Thread Darrel Schneider
+1

From: Nabarun Nag 
Sent: Wednesday, March 16, 2022 2:36 PM
To: geode 
Subject: Re: [PROPOSAL] annul support/1.15

+1

From: Owen Nichols 
Sent: Wednesday, March 16, 2022 2:12 PM
To: geode 
Subject: [PROPOSAL] annul support/1.15

Seven weeks after cutting support/1.15, Jira now shows 11 blockers, up from 5 a 
few weeks ago.  I wonder if perhaps we cut the release branch prematurely?  I 
propose that we abandon this branch and focus on getting develop closer to what 
we want to ship, then discuss re-cutting the branch.

If this proposal is approved, I will archive support/1.15 as support/1.15.old, 
revert develop's numbering to 1.15.0, and bulk-update all Jira tickets fixed in 
1.16.0 to fixed in 1.15.0 instead.  Build numbering would start from 
1.15.0-build.1000 to easily distinguish pre- and post- recut.

Please vote/discuss with a goal of reaching consensus by 3PM PDT Monday Mar 21.

Thanks,
-Geode 1.15.0 Release Manager




Re: Proposal: Cut 1.15 release branch from SHA 8f7193c827ee3198ae374101221c02039c70a561

2022-01-26 Thread Darrel Schneider
I think we will probably always have this struggle around the time we cut a new 
release branch and then need to decide what changes on develop to also bring to 
the release branch. I'm okay with us not including the big cleanup in 1.15 
since it is not something we plan on backporting to other support branches and 
if we had cut the 1.15 release branch right before this change, I don't think 
we would have approved it to be back ported to the 1.15 release branch.

I share Owen's concern that not including it might cause extra work in resolving 
conflicts in every change to develop after it that we also want to include in 
1.15. We will not know if this is true until we try cherry-picking those 
revisions. So I'm unsure what the best way of not including it is. I'm okay 
with us cutting the branch from the revision before the big commit and then 
working through the list of commits after it and deciding which need to be in 
the 1.15 release.

From: Alexander Murmann 
Sent: Wednesday, January 26, 2022 7:21 AM
To: dev@geode.apache.org 
Subject: Re: Proposal: Cut 1.15 release branch from SHA 
8f7193c827ee3198ae374101221c02039c70a561

Owen, I really appreciate your point about the increased cost of backports by 
the branches diverging like this. I do wonder how high the cost will be in 
practice, given that AFAIK most of these changes limit themselves to a single 
line.

From: Owen Nichols 
Sent: Tuesday, January 25, 2022 20:18
To: dev@geode.apache.org 
Subject: Re: Proposal: Cut 1.15 release branch from SHA 
8f7193c827ee3198ae374101221c02039c70a561

Even a small change can have subtle but important effects only discovered after 
a long time, so leaning on commit-size as a proxy for risk may only serve to 
create a false sense of security.

Also to consider, having a large refactor on develop but not support/1.15 will 
increase backporting pain, as many cherry-picks will have merge conflicts that 
have to be manually "un-refactored".

On 1/25/22, 5:09 PM, "Alexander Murmann"  wrote:

Hi everyone,

Last week we discussed to cut the 1.15 release branch. I would like to 
propose that we cut the branch from last week's SHA 
8f7193c827ee3198ae374101221c02039c70a561. The following commit is a very large 
refactor. Nothing obvious seems wrong with that change, but given that we 
frequently only discover very subtle, but important changes to Geode after a 
long time, I think that this would allow us to reduce some risk for 1.15 and 
its future users and give this large change some time to prove itself on the 
develop branch. I'd love to avoid that work entirely, but am concerned that we 
might only find out about problems a few weeks from now or, worse, after we have 
shipped.

Another option might be to branch from head and revert the change on the 
release branch. I am uncertain which approach will prove to be less work.



Re: Open socket handles build up over time (leaking?)

2021-11-22 Thread Darrel Schneider
Thanks for filing the ticket and for looking into the additional leak

From: Leon Finker 
Sent: Monday, November 22, 2021 10:19 AM
To: dev@geode.apache.org 
Subject: Re: Open socket handles build up over time (leaking?)

Hi,

Bug ticket: 
https://issues.apache.org/jira/browse/GEODE-9819

I applied the fix locally and now this leak is fixed. Thank you!

There is still a minor leak of about 10 handles per day that I want to
track down next week. From the looks of it it's not as simple to track
as this one :( Still happens in the durable client connection use case
and cache server port. Seems to be when an existing durable connection
is disconnected and automatically reconnected. I need to do more
testing first to know for sure.

On Wed, Nov 17, 2021 at 4:22 PM Darrel Schneider  wrote:
>
> Yes, open a geode bug ticket
> 
> From: Leon Finker 
> Sent: Wednesday, November 17, 2021 1:13 PM
> To: dev@geode.apache.org 
> Subject: Re: Open socket handles build up over time (leaking?)
>
> Thank you! I'll try to test this change. What's the procedure for
> this? Should I open geode bug ticket?
>
> On Wed, Nov 17, 2021 at 2:22 PM Darrel Schneider  wrote:
> >
> > I think you found the leak!
> > My understanding of the code in registerClientInternal (I'm looking at the 
> > current develop branch) is that when it logs the warning "Duplicate durable 
> > clients are not allowed" that it considers the current client connect 
> > attempt to have failed. It writes this response back to it: 
> > REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This will cause the client to 
> > throw ServerRefusedConnectionException. What seems wrong about this method 
> > is that even though it sets "unsuccessfulMsg" and correctly sends back a 
> > handshake saying the client is rejected, it does not throw an exception and 
> > it does not close "socket". I think right before it calls 
> > performPostAuthorization it should do the following:
> > if (unsuccessfulMsg != null) {
> >   try {
> > socket.close();
> >   } catch (IOException ignore) {
> >   }
> >  } else {
> > performPostAuthorization(...)
> > }
> > 
> > From: Leon Finker 
> > Sent: Wednesday, November 17, 2021 10:30 AM
> > To: dev@geode.apache.org 
> > Subject: Re: Open socket handles build up over time (leaking?)
> >
> > Following Darrel's excellent advice :) I think I tracked down the area
> > of the socket handle leak. As the article suggested, I ran the lsof
> > capture every 5 minutes. I then traced back the cleaned up socket
> > handles to the valid lsof entries. I verified a bunch of them and they
> > all ended up pointing to the same durable connection cases that fail
> > as follows:
> >
> > [2021-11-17 14:05:13,081Z][info  > 17> tid=32115] :Cache server: Initializing secondary server-to-client
> > communication socket: Socket[addr=/x.x.x.x,port=56090,localport=40011]
> > [2021-11-17 14:05:13,083Z][warn  > 17> tid=32115] The requested durable client has the same identifier (
> > tpdemo2@x_gem_x ) as an existing durable client (
> > CacheClientProxy[identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> > 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> > timeout=86400]); port=53976; primary=true; version=GEODE 1.14.0] ).
> > Duplicate durable clients are not allowed.
> > [2021-11-17 14:05:13,084Z][warn  > 17> tid=32115] CacheClientNotifier: Unsuccessfully registered client
> > with identifier
> > identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> > 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> > timeout=86400]) and response code 64
> > [2021-11-17 14:05:13,098Z][warn  > 34> tid=290] Server connection from
> > [identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> > 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> > timeout=86400]); port=60242]: connection disconnect detected by EOF.
> >
> > I then counted the number of those errors and found t

Re: Open socket handles build up over time (leaking?)

2021-11-17 Thread Darrel Schneider
Yes, open a geode bug ticket

From: Leon Finker 
Sent: Wednesday, November 17, 2021 1:13 PM
To: dev@geode.apache.org 
Subject: Re: Open socket handles build up over time (leaking?)

Thank you! I'll try to test this change. What's the procedure for
this? Should I open geode bug ticket?

On Wed, Nov 17, 2021 at 2:22 PM Darrel Schneider  wrote:
>
> I think you found the leak!
> My understanding of the code in registerClientInternal (I'm looking at the 
> current develop branch) is that when it logs the warning "Duplicate durable 
> clients are not allowed" that it considers the current client connect attempt 
> to have failed. It writes this response back to it: 
> REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This will cause the client to throw 
> ServerRefusedConnectionException. What seems wrong about this method is that 
> even though it sets "unsuccessfulMsg" and correctly sends back a handshake 
> saying the client is rejected, it does not throw an exception and it does not 
> close "socket". I think right before it calls performPostAuthorization it 
> should do the following:
> if (unsuccessfulMsg != null) {
>   try {
> socket.close();
>   } catch (IOException ignore) {
>   }
>  } else {
> performPostAuthorization(...)
> }
> 
> From: Leon Finker 
> Sent: Wednesday, November 17, 2021 10:30 AM
> To: dev@geode.apache.org 
> Subject: Re: Open socket handles build up over time (leaking?)
>
> Following Darrel's excellent advice :) I think I tracked down the area
> of the socket handle leak. As the article suggested, I ran the lsof
> capture every 5 minutes. I then traced back the cleaned up socket
> handles to the valid lsof entries. I verified a bunch of them and they
> all ended up pointing to the same durable connection cases that fail
> as follows:
>
> [2021-11-17 14:05:13,081Z][info  17> tid=32115] :Cache server: Initializing secondary server-to-client
> communication socket: Socket[addr=/x.x.x.x,port=56090,localport=40011]
> [2021-11-17 14:05:13,083Z][warn  17> tid=32115] The requested durable client has the same identifier (
> tpdemo2@x_gem_x ) as an existing durable client (
> CacheClientProxy[identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> timeout=86400]); port=53976; primary=true; version=GEODE 1.14.0] ).
> Duplicate durable clients are not allowed.
> [2021-11-17 14:05:13,084Z][warn  17> tid=32115] CacheClientNotifier: Unsuccessfully registered client
> with identifier
> identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> timeout=86400]) and response code 64
> [2021-11-17 14:05:13,098Z][warn  34> tid=290] Server connection from
> [identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
> 1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
> timeout=86400]); port=60242]: connection disconnect detected by EOF.
>
> I then counted the number of those errors and found that the
> difference is exactly the same as the leaked socket handles. So
> everything points to this area in the server code. But I tried to
> reproduce this in debugger and stepped through the geode code without
> finding where the socket might be leaked. Put breakpoint on
> CacheClientNotifier.registerClientInternal and the line where that
> error happens. But then again the exception seems to bubble up to
> > AcceptorImpl.run where there is a catch for IOException. And in case of
> IOException, the client socket is closed. Is this the right socket I'm
> after?
>
> Does this make it any clearer for someone who is more familiar with
> the code? NOTE: the duplicate durable client happens sometimes due to
> race conditions with reconnects (I think). It's not happening all the
> time. It's not causing any problems to the client in general. But the
> server leak eventually causes us to run out of file handles for the
> process :(
>
> Thank you!
>
>
> On Tue, Nov 16, 2021 at 4:33 PM Leon Finker  wrote:
> >
> > Hi Darrel,
> >
> > Thank you! I'll try to track it!
> >
> > On Tue, Nov 16, 2021 at 2:42 PM Darrel Schneider  wrote:
> > >
> > > This link: 
> > > https://help.mulesoft.com/s/article/How-to-identify-leaked-file-descriptors-that-are-shown-only-as-cant-identify-protocol-in-lsof

Re: Open socket handles build up over time (leaking?)

2021-11-17 Thread Darrel Schneider
I think you found the leak!
My understanding of the code in registerClientInternal (I'm looking at the 
current develop branch) is that when it logs the warning "Duplicate durable 
clients are not allowed" that it considers the current client connect attempt 
to have failed. It writes this response back to it: 
REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This will cause the client to throw 
ServerRefusedConnectionException. What seems wrong about this method is that 
even though it sets "unsuccessfulMsg" and correctly sends back a handshake 
saying the client is rejected, it does not throw an exception and it does not 
close "socket". I think right before it calls performPostAuthorization it 
should do the following:
if (unsuccessfulMsg != null) {
  try {
socket.close();
  } catch (IOException ignore) {
  }
 } else {
performPostAuthorization(...)
}

From: Leon Finker 
Sent: Wednesday, November 17, 2021 10:30 AM
To: dev@geode.apache.org 
Subject: Re: Open socket handles build up over time (leaking?)

Following Darrel's excellent advice :) I think I tracked down the area
of the socket handle leak. As the article suggested, I ran the lsof
capture every 5 minutes. I then traced back the cleaned up socket
handles to the valid lsof entries. I verified a bunch of them and they
all ended up pointing to the same durable connection cases that fail
as follows:

[2021-11-17 14:05:13,081Z][info  tid=32115] :Cache server: Initializing secondary server-to-client
communication socket: Socket[addr=/x.x.x.x,port=56090,localport=40011]
[2021-11-17 14:05:13,083Z][warn  tid=32115] The requested durable client has the same identifier (
tpdemo2@x_gem_x ) as an existing durable client (
CacheClientProxy[identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
timeout=86400]); port=53976; primary=true; version=GEODE 1.14.0] ).
Duplicate durable clients are not allowed.
[2021-11-17 14:05:13,084Z][warn  tid=32115] CacheClientNotifier: Unsuccessfully registered client
with identifier
identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
timeout=86400]) and response code 64
[2021-11-17 14:05:13,098Z][warn  tid=290] Server connection from
[identity(TS02493(tpdemo2@x:loner):60759:9243242e:tpdemo2@x(version:GEODE
1.14.0),connection=1,durableAttributes=DurableClientAttributes[id=tpdemo2@TS02493@21.5.3@8.2.6@2021.5.8_gem_Amoeba;
timeout=86400]); port=60242]: connection disconnect detected by EOF.

I then counted the number of those errors and found that the
difference is exactly the same as the leaked socket handles. So
everything points to this area in the server code. But I tried to
reproduce this in debugger and stepped through the geode code without
finding where the socket might be leaked. Put breakpoint on
CacheClientNotifier.registerClientInternal and the line where that
error happens. But then again the exception seems to bubble up to
AcceptorImpl.run where there is a catch for IOException. And in case of
IOException, the client socket is closed. Is this the right socket I'm
after?

Does this make it any clearer for someone who is more familiar with
the code? NOTE: the duplicate durable client happens sometimes due to
race conditions with reconnects (I think). It's not happening all the
time. It's not causing any problems to the client in general. But the
server leak eventually causes us to run out of file handles for the
process :(

Thank you!


On Tue, Nov 16, 2021 at 4:33 PM Leon Finker  wrote:
>
> Hi Darrel,
>
> Thank you! I'll try to track it!
>
> On Tue, Nov 16, 2021 at 2:42 PM Darrel Schneider  wrote:
> >
> > This link: 
> > https://help.mulesoft.com/s/article/How-to-identify-leaked-file-descriptors-that-are-shown-only-as-cant-identify-protocol-in-lsof
> >  points out that once the fd gets into this "can't identify protocol" state 
> > you can no longer figure out things like what port(s) and addresses are 
> > associated with the fd. They suggest running lsof periodically to catch 
> > that fd (in your example output the first fd is 133u) when it was still 
> > healthy. This would help track it down to an area of the geode code. For 
> > example you could see that it was one of the sockets using the port the 
> > cache server is listening on

Re: Open socket handles build up over time (leaking?)

2021-11-16 Thread Darrel Schneider
This link: 
https://help.mulesoft.com/s/article/How-to-identify-leaked-file-descriptors-that-are-shown-only-as-cant-identify-protocol-in-lsof
 points out that once the fd gets into this "can't identify protocol" state you 
can no longer figure out things like what port(s) and addresses are associated 
with the fd. They suggest running lsof periodically to catch that fd (in your 
example output the first fd is 133u) when it was still healthy. This would help 
track it down to an area of the geode code. For example you could see that it 
was one of the sockets using the port the cache server is listening on.


From: Leon Finker 
Sent: Tuesday, November 16, 2021 9:28 AM
To: dev@geode.apache.org 
Subject: Open socket handles build up over time (leaking?)

Hi,

We observe in our geode (1.14 - same before as well in 1.13) cache
server (that supports durable client sessions) an increase in half
opened sockets. It seems there is a socket leak. Could someone
recommend how to track the leak down? It's not obvious where it's
leaking...I can only suspect AcceptorImpl.run and where it only
handles IOException. but I wasn't able to reproduce it in debugger
yet...

lsof -p 344|grep "can't"

java 344   user  133u  sock0,6   0t0 115956017
can't identify protocol
java 344   user  142u  sock0,6   0t0 113361870
can't identify protocol
java 344   user  143u  sock0,6   0t0 111979650
can't identify protocol
java 344   user  156u  sock0,6   0t0 117202529
can't identify protocol
java 344   user  178u  sock0,6   0t0 113357568
can't identify protocol
...

lsof -p 344|grep "can't"|wc -l
934

Thank you


Re: Failed durable client connection initialization can sometimes leak client socket handle?

2021-11-16 Thread Darrel Schneider
The run method on AcceptorImpl is run in a LoggingThread instance (see 
AcceptorImpl.start()). So any exceptions thrown by AcceptorImpl.run() will be 
logged as a fatal log message containing "Uncaught exception in thread" by the 
LoggingThread. You can see the code that does this in 
LoggingUncaughtExceptionHandler.

Also the AcceptorImpl.run() method I see has a finally block in which it closes 
"serverSocket" if it is not null. It is on this close call that it catches and 
ignores IOException.
I think you may be talking about AcceptorImpl.ClientQueueInitializerTask.run(). 
This run is called from an executor which is created in 
initializeClientQueueInitializerThreadPool. It uses CoreLoggingExecutors so 
once again any unhandled exception should be logged as a fatal log msg in the 
server log.
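
For illustration only, here is a small self-contained sketch (plain JDK APIs, not
Geode's actual LoggingThread/LoggingUncaughtExceptionHandler classes) of how an
uncaught-exception handler turns an exception escaping a thread's run() into a
logged message:

public class UncaughtExceptionLoggingExample {
  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> {
      throw new IllegalStateException("boom"); // escapes run()
    }, "AcceptorLikeThread");
    // Analogous to what the LoggingThread wiring does: log instead of dying silently.
    worker.setUncaughtExceptionHandler((t, e) ->
        System.err.println("Uncaught exception in thread " + t.getName() + ": " + e));
    worker.start();
    worker.join();
  }
}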

From: Leon Finker 
Sent: Wednesday, November 10, 2021 10:01 AM
To: dev@geode.apache.org 
Subject: Failed durable client connection initialization can sometimes leak 
client socket handle?

Hi,

In AcceptorImpl.run, the accepted client socket seems to be closed only when
there is an IOException. I can't prove it, but I think a non-IO exception can
sometimes occur here as well, and then the client socket will not be closed.
Also, can we please add a catch for other kinds of exceptions and at least log
them as errors?

The symptoms we have are like this:
1. Durable client has a connection problem during initialization.
2. Durable client ends up with an orphaned durable HA region (the one
prefixed with _gfe_durable_client_with_id_)
3. Now the client automatically reconnects and the geode server fails
to properly initialize the client, most likely because the region
already has an error. Inspecting the regions at runtime, we can indeed
see a durable region for the client without a CacheClientProxy
properly created and added to the proxies collection.
4. We observe a pretty rapid (over a few days) memory leak and socket handle leak
5. This leak stops as soon as we destroy that internal durable region
(partially through reflection) for the client, and the client can then
properly reconnect and initialize its region and proxy.

Does this ring any bells for anyone?

Thank you


Re: @TestOnly or @VisibleForTesting

2021-11-05 Thread Darrel Schneider
+1 to Donal.

And it also provides a way to easily find these places in the future that can 
have some further refactoring. Just search for @VisibleForTesting and refactor 
it away.

From: Donal Evans 
Sent: Friday, November 5, 2021 9:19 AM
To: dev@geode.apache.org 
Subject: Re: @TestOnly or @VisibleForTesting

The main time I use the @VisibleForTesting annotation is when I'm unit testing 
a method that includes static calls that can't be mocked or verified. A random 
example from the codebase is LatestLastAccessTimeMessage.process() which has a 
call to sendReply(dm, lastAccessed) as the last line of the method. This method 
has been extracted and marked @VisibleForTesting to allow verification and 
mocking in unit tests as it just wraps a static call:

  @VisibleForTesting
  void sendReply(ClusterDistributionManager dm, long lastAccessTime) {
ReplyMessage.send(getSender(), this.processorId, lastAccessTime, dm);
  }
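
As a self-contained illustration of that pattern (the message class below is made
up; only the @VisibleForTesting annotation, JUnit and Mockito calls are real APIs),
a unit test can verify the extracted method on a spy:

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

import org.apache.geode.annotations.VisibleForTesting;
import org.junit.Test;

class ExampleMessage {
  void process(Object dm) {
    long lastAccessed = computeLastAccessed(); // stand-in for the real work
    sendReply(dm, lastAccessed);               // extracted so tests can verify/stub it
  }

  long computeLastAccessed() {
    return 42L;
  }

  @VisibleForTesting
  void sendReply(Object dm, long lastAccessTime) {
    // wraps a static call such as ReplyMessage.send(...), which a plain unit test cannot mock
  }
}

public class ExampleMessageTest {
  @Test
  public void processSendsReplyWithLastAccessedTime() {
    ExampleMessage message = spy(new ExampleMessage());
    message.process(new Object());
    verify(message).sendReply(any(), eq(42L));
  }
}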

I agree that ideally we wouldn't need to use these annotations, but given a 
choice between developers using them to improve unit test coverage on classes 
that are severely lacking in it and not adding test coverage at all because, in 
order to avoid using these annotations, a prohibitively huge refactor of 
multiple classes would be required, I would definitely see using them as the 
lesser of two evils.

From: Alexander Murmann 
Sent: Thursday, November 4, 2021 5:02 PM
To: dev@geode.apache.org 
Subject: Re: @TestOnly or @VisibleForTesting

Another +1 to Eric's point. What are others seeing as valid use cases for those 
annotations?

From: Patrick Johnson 
Sent: Thursday, November 4, 2021 15:55
To: dev@geode.apache.org 
Subject: Re: @TestOnly or @VisibleForTesting

I agree with Eric. Maybe rather than standardizing on one, we should stop 
adding any more @VisibleForTesting or @TestOnly to the codebase. Possibly 
deprecate @VisibleForTesting.

> On Nov 4, 2021, at 3:30 PM, Eric Zoerner  wrote:
>
> My opinion is that @VisibleForTesting is a code smell and either indicates 
> that there is refactoring needed or there are tests that are unnecessary. If 
> there is functionality in a private method that needs to be tested 
> independently, then that code should be extracted and placed in a separate 
> class that has a wider visibility that can be tested on its own.
>
> The same could probably be said for @TestOnly unless there are actually 
> static analysis tools that need it for some reason.
>
> From: Kirk Lund 
> Date: Thursday, November 4, 2021 at 15:13
> To: dev@geode.apache.org 
> Subject: Re: @TestOnly or @VisibleForTesting
> As everyone thinks about how we want to use these annotations, please keep
> this in mind that both *@VisibleForTesting* and *@TestOnly* can be used on
> Types (Class/Interface/Enum), Constructors, Methods and Fields. (not just
> Methods)
>
> On Thu, Nov 4, 2021 at 3:09 PM Kirk Lund  wrote:
>
>> Hey all,
>>
>> We're introducing a mess to the codebase. It's a small problem, but
>> several small problems become a big problem and one of my missions is to
>> clean up and improve the codebase.
>>
>> I recently started seeing lots of pull requests with usage of @TestOnly.
>> Sometimes it's used instead of @VisibleForTesting, while sometimes I see
>> both annotations added to the same method.
>>
>> Before we start using @TestOnly, I think we need some guidelines for when
>> to use @TestOnly versus @VisibleForTesting. Or if we're going to replace
>> @VisibleForTesting with @TestOnly, then we either need a PR for the
>> replacement or, at a minimum, deprecation annotation and javadocs added to
>> VisibleForTesting.java.
>>
>> The annotations appear similar but the javadocs describe slightly
>> different meanings for them...
>>
>> *@VisibleForTesting* was created in Geode several years ago to mean that
>> the method is either only for testing or the visibility of it was widened
>> (example: a private method might be widened to be package-private,
>> protected or public). The method might be used by product code, but it also
>> has widened scope specifically to allow tests to call it. The javadocs say:
>>
>> "Annotates a program element that exists, or is more widely visible than
>> otherwise necessary, only for use in test code.
>>
>> Introduced while mobbing with Michael Feathers. Name and javadoc borrowed
>> from Guava and AssertJ (both are Apache License 2.0)."
>>
>> *@TestOnly* started appearing when we added org.jetbrains.annotations
>> dependency earlier this year. It seems to indicate a method that is ONLY
>> used for tests (never called by product). The javadocs say:
>>
>> "A member or type annotated with TestOnly claims that it should be used
>> from testing code only.
>>
>> Apart from documentation purposes this annotation is intended to be used
>> by static analysis tools to validate against element contract violations.
>>
>> This annotation means that the annotated 

Re: Fw: [DISCUSS] Rebase and Squash Options on Github develop

2021-06-29 Thread Darrel Schneider
+1

From: Jens Deppe 
Sent: Tuesday, June 29, 2021 7:44 AM
To: dev@geode.apache.org 
Subject: Re: Fw: [DISCUSS] Rebase and Squash Options on Github develop

+1 For Naba’s proposal

From: Joris Melchior 
Date: Tuesday, June 29, 2021 at 7:40 AM
To: dev@geode.apache.org 
Subject: Re: Fw: [DISCUSS] Rebase and Squash Options on Github develop
+1 for Nabarun's proposed options.

From: Nabarun Nag 
Date: Monday, June 28, 2021 at 4:06 PM
To: Owen Nichols , dev@geode.apache.org 

Subject: Re: Fw: [DISCUSS] Rebase and Squash Options on Github develop
Just a clarification.

The options I am proposing to be kept in the PRs are:

  *   Squash and merge
  *   Rebase and merge

Regards,
Nabarun

From: Owen Nichols 
Sent: Monday, June 28, 2021 1:03 PM
To: Nabarun Nag ; dev@geode.apache.org 
Subject: Re: Fw: [DISCUSS] Rebase and Squash Options on Github develop

Sounds like a good idea. I can’t find any example of an intentional merge 
commit on develop…but plenty of accidental ones.

Upon releases we do a command line merge commit to the main trunk, which should 
still be fine as this proposal only applies to PRs.

I agree with still allowing rebase commits. I have seen several PRs use them to 
good effect.

You can disable merge commits through .asf.yaml, so changing the setting is as 
easy as opening a PR (and, if we don’t like it, reverting is just as easy)


---
Sent from Workspace ONE 
Boxer

On June 28, 2021 at 12:38:50 PM PDT, Nabarun Nag  wrote:
Hi all,

Last year, we had a discussion on rebase and squash merge PRs to the develop 
branch,

  *   to have linear git history
  *   help with bisecting
  *   easier root cause analysis of failures
  *   easier to backport changes

However, many developers have reached out mentioning that sometimes they have 
clicked the merge button in a hurry when they meant to rebase/squash. Once 
clicked, there is no going back. All of them suggested restricting the GitHub 
settings on the develop branch to the rebase and squash merge options so that 
there is no room for such mistakes.

I would like to propose that we limit the GitHub merge options for develop PRs 
to the rebase and squash buttons.

Please do note that this is reversible and can be changed back, hence there is 
no harm in trying it out.


Regards
Nabarun Nag



From: Owen Nichols (Pivotal) 
Sent: Thursday, April 9, 2020 11:45 AM
To: dev@geode.apache.org 
Subject: Re: [REQUEST] Squash merge please

I’ve noticed quite a few PRs in the last week that were merged with “Merge” 
rather than “Squash and Merge”.

While the community consensus was to continue to allow all merge options, 
please try to default to “Squash and Merge” whenever you can to keep history as 
linear as possible. GitHub will save the last method you used in a cookie, 
which helps, but then it’s easy to miss when it resets itself back to the 
default of “Merge” some months later because you cleared cookies, changed 
browsers, etc.

> On Oct 22, 2019, at 5:12 PM, Nabarun Nag  wrote:
>
> Hi Geode Committers,
>
> A kind request for using squash commit instead of using merge.
> This will really help us in our bisect operations when a regression/flakiness 
> in the product is introduced. We can automate and go through fewer commits 
> faster, avoiding commits like "spotless fix" and "re-trigger precheck-in" or 
> other minor commits in the merged branch.
>
> Also, please use the commit format (it helps us know who worked on it and what 
> the history is):
> GEODE-<ticket number>: <summary>
>
> * explanation line 1
> * explanation line 2
>
> This is not a rule or anything, but a request to help out your fellow 
> committers in quickly detecting a problem.
>
> For inspiration, we can look into Apache Kafka / Spark where they have a 
> complete linear graph for their main branch HEAD [see attachment]
>
> Regards
> Naba.
>
>


Re: Cleaning up the codebase - use curly braces everywhere

2021-05-27 Thread Darrel Schneider
+1 thanks for working on this

From: Donal Evans 
Sent: Thursday, May 27, 2021 10:22 AM
To: dev@geode.apache.org 
Subject: Cleaning up the codebase - use curly braces everywhere

Hi Geode dev,

I've recently been looking at ways to improve code quality in small ways 
throughout the codebase, and as a starting point, I thought it would be good to 
make it so that we're consistently using curly braces for control flow 
statements everywhere, since this is something that's specifically called out 
in the Geode Code Style Guide wiki page[1] as one of the "more important 
points" of our code style.

IntelliJ has a "Run inspection by name..." feature that makes it possible to 
identify all places where curly braces aren't used for control flow statements, 
(which showed over 3300 occurrences in the codebase) and also allows them to be 
automatically inserted, making the fix relatively trivial. Since this PR will 
touch 640 files, I wanted to make sure to first check that this is something 
even worth doing, and, if there's agreement that it is, to give reviewers 
context on what the changes are, the motivation for them, and how they were 
made, to help with the review process.

The draft PR I have up[2] currently has no failing tests and can be marked as 
ready to review if there aren't any objections, and once it is, I'll try to 
coordinate with codeowners to get the minimal number of approvals required for 
a merge (it looks like only 6-7 reviewers are needed, though I'm sure that 
almost every code owner will be tagged as reviewers given the number of files 
touched).

If this idea is a success, I think it would be good to have a discussion about 
other low-hanging code improvements we could make using static analysis 
(unnecessary casts, unused variables, duplicate conditions etc.), and, once a 
particular inspection has been "fixed," possibly consider adding a check for it 
as part of the PR pre-checkin to make sure it's not reintroduced. All thoughts 
and feedback are very welcome.

Donal

[1] 
https://cwiki.apache.org/confluence/display/GEODE/Code+Style+Guide
[2] 
https://github.com/apache/geode/pull/6523


Re: Question about the write-buffer-size parameter when creating a disk store

2021-05-24 Thread Darrel Schneider
I also could not find any code that used the write-buffer-size when allocating 
a buffer.
I did find this method: Oplog.allocateWriteBuf
It seems like this would be the allocation of the disk store write buffer.
If one is not already allocated then this method checks a sysprop and then 
defaults to 32k.

return ByteBuffer.allocateDirect(Integer.getInteger("WRITE_BUF_SIZE", 32768));

It seems like this code should at least also ask the DiskStoreImpl (using 
Oplog.parent) what the write buffer size is if the sys prop is not set.
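
A minimal sketch of that suggestion (the method shape and the
diskStoreWriteBufferSize parameter are assumptions standing in for
Oplog.parent.getWriteBufferSize(); the sysprop name and 32k default come from the
line quoted above):

import java.nio.ByteBuffer;

class WriteBufferAllocationSketch {
  // Fallback order: WRITE_BUF_SIZE sysprop, then the disk store's configured
  // write-buffer-size, then the existing 32k default.
  static ByteBuffer allocateWriteBuf(int diskStoreWriteBufferSize) {
    Integer sysProp = Integer.getInteger("WRITE_BUF_SIZE");
    int size;
    if (sysProp != null) {
      size = sysProp;
    } else if (diskStoreWriteBufferSize > 0) {
      size = diskStoreWriteBufferSize;
    } else {
      size = 32768;
    }
    return ByteBuffer.allocateDirect(size);
  }
}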

From: Alberto Gomez 
Sent: Monday, May 24, 2021 8:51 AM
To: dev@geode.apache.org 
Subject: Question about the write-buffer-size parameter when creating a disk 
store

Hi,

According to the Geode documentation, it is possible to set the write buffer 
size by using --write-buffer-size when creating a disk store [1].

Nevertheless, looking at the code, I have not seen that setting a value for 
that parameter has any effect. Does anybody know if I am correct or if I am 
missing something?

Thanks in advance,

Alberto G.

[1] 
https://geode.apache.org/docs/guide/113/tools_modules/gfsh/command-pages/create.html


Re: Reminder to use draft mode

2021-05-06 Thread Darrel Schneider
+1 to Donal's comments

From: Donal Evans 
Sent: Thursday, May 6, 2021 11:44 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

+1 to Naba's PR flow described above.

Creating PRs in draft mode is almost always the best choice, as it prevents 
people from being tagged to review a set of changes that may change 
significantly due to test failures and only requires a single click to convert 
to the "ready to review" state - hardly a major inconvenience.

However, the real tricky question here seems to be "When should you move a PR 
from "Ready to review" back into draft mode?" I tend to agree with Jens that a 
flaky test failure by itself isn't enough to warrant putting a PR back into 
draft mode, as it's often possible to identify the failure as being due to an 
existing known bug and merge the PR knowing that your changes aren't the cause. 
We don't require that all PR tests are green before merging, just some of them, 
so it's reasonable to assume that we don't require all PR tests to be green 
before a PR is considered ready for review either.

Minor edits due to review comments (like spelling mistakes or minor code 
quality/style changes) also don't feel like they should cause a PR to be put 
back into draft mode, as while the contents of the PR may change because of 
them, it won't invalidate other in-progress reviews if it does, or 
significantly alter the nature of the PR.

For me, the bar for whether a PR should be put back into draft mode is if you 
know that its current state is not reflective of the final state that will be 
merged into develop. In general, the only time that should happen is if you've 
received review feedback that will require a change of approach or significant 
refactoring/additional code. It's the difference between "needs a little 
polish" and "needs more work," I think. Obviously, what counts as "significant" 
is entirely subjective, so this isn't much use as a hard and fast rule, but a 
rough guide might be that if a reviewer has requested changes that would 
invalidate or render obsolete/redundant any additional reviews that come in 
before those changes are applied, moving back to draft mode would probably be a 
good idea.

Donal

From: Nabarun Nag 
Sent: Thursday, May 6, 2021 10:22 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

I feel that Owen has a valid point and I myself feel that it is ok to start the 
PR in draft mode till the pre-check tests pass.

There has been this situation where,

  *   PR is created (reviewers are assigned)
  *   approved
  *   Tests fail
  *   code is changed
  *   no reviews
  *   code is merged

Hence code that is not reviewed has been merged

This way of doing work also has the following advantages:

  *   A reviewer does not have to review a code that causes tests to fail
  *   A reviewer does not have to review code twice before failure and then 
again after changing the code to fix the failure
  *   Unreviewed code post-test fixes do not get merged

I think this way of working saves a critical amount of time for engineers who 
review code.

This flow of PRs feels more efficient:


  *   Create PR in draft mode - no reviewers assigned
  *   Pre-checks fail
  *   change/fix code
  *   tests pass - all green
  *   convert PR to ready for review - reviewers assigned
  *   reviewers review

Regards
Naba




From: Owen Nichols 
Sent: Thursday, May 6, 2021 9:59 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

Given the lack of consensus, it sounds like it will not be possible to make any 
assumptions about a PR based on whether it is in Draft mode or not.  I will 
stop retriggering flaky checks or changing PRs to draft status.  My apologies 
for the inconvenience this has caused.

On 5/6/21, 9:47 AM, "Jens Deppe"  wrote:

I don’t think we can presume everyone has the same working style. For 
myself I’ll happily review a PR that has a failing check. I’m OK if it has some 
innocuous ‘housekeeping’ error or unrelated failure.

I don’t retrigger PR failures, for unrelated errors, just to ‘get to green’ 
– related, I don’t expect anyone to do that on my part either. It would be 
frustrating if I was about to merge something and someone retriggers a job. Yes 
I do merge if I’m 100% confident the failed check is unrelated. I don’t merge 
if any checks are still pending.

Perhaps this is just relevant to my current situation, but most of my PRs 
are module specific and so there is collaboration between my team and we 
typically know the state of our various PRs. I don’t feel like there is much 
need for any process around switching in and out of Draft mode. Much less for 
an ‘external’ contributor to make decisions on our behalf.

Has some situation arisen that is driving this? It feels like there is some 
underlying issue that isn’t being fully communicated.

--Jens

   

Re: Odg: Geode retry/acknowledge improvement

2021-04-30 Thread Darrel Schneider
In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put message send to fail with an exception writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happen then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.
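
For reference, a minimal sketch of starting a cache with that property
(conserve-sockets is a standard Geode distributed-system property; everything else
here is placeholder boilerplate):

import java.util.Properties;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

public class ConserveSocketsExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    // With conserve-sockets=false each thread uses its own sockets, so the reply to a
    // p2p message is read on the same socket that sent it (see the discussion above).
    props.setProperty("conserve-sockets", "false");
    Cache cache = new CacheFactory(props).create();
    cache.close();
  }
}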

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, on reception of the packets, sends an 
> ACK at the TCP level, and at that moment the proxy is restarted.
> This is the reason we don't see TCP retries.
>
> A similar problem (though not packet loss) can be reproduced on geode if, on an 
> existing connection, a TCP reset is received after a request is sent. In that 
> case, on reception of the reset the connection will be closed, and the thread 
> will get stuck while waiting on the reply.
> I will add reproduction steps to the ticket.
>
> 
> Šalje: Anthony Baker 
> Poslano: 19. travnja 2021. 22:54
> Prima: dev@geode.apache.org 
> Predmet: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a geode cluster in a kubernetes environment, and Istio sidecars 
>> are injected between cluster members.
>> While running traffic, if any Istio sidecar is restarted, a thread will get 
>> stuck indefinitely while waiting for the reply to a sent message.
>> It seems that due to the restarting of the proxy, messages are lost in some 
>> cases, and the sending side waits indefinitely for the reply.
>>
>> https://issues.apache.org/jira/browse/GEODE-9075
>>
>> My question is: what is your estimate of how much effort/work is needed to 
>> implement message retry/acknowledge logic in geode to solve this problem?
>>
>> BR,
>> Mario
>



Re: Stats deprecations?

2021-04-14 Thread Darrel Schneider
It might break something but it should not be too bad. Internal product 
features are not supposed to read stat values and make decisions based on them 
because geode allows stat storage to be disabled. Also all these callers of the 
deprecated methods will be internal. So most of the code that calls a gettor 
for a stat value is probably a test. Since we now store that stat as a 64-bit 
value (instead of 32-bit), it would probably be best if we didn't truncate when 
we read it, which is what happens now when getInt is called. It would be 
good to do this in its own PR since it will touch a lot of code.
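
As a small illustration of the truncation concern (the stat name here is
hypothetical):

import org.apache.geode.Statistics;

class StatReadSketch {
  // The stat is now stored as a 64-bit value: getInt would truncate anything above
  // Integer.MAX_VALUE, so callers should move to getLong.
  static long readOperationCount(Statistics stats) {
    return stats.getLong("operations"); // rather than stats.getInt("operations")
  }
}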

From: Mark Hanson 
Sent: Wednesday, April 14, 2021 4:13 PM
To: dev@geode.apache.org 
Subject: Re: Stats deprecations?

The one gotcha which I am totally fine with is that the actual calls like 
get will return long rather than int. Is that going to break anything?


On 4/14/21, 4:10 PM, "Dan Smith"  wrote:

Looks like those methods were deprecated in GEODE-6850. If I'm reading that 
correctly, there is no reason not to change the calls to incLong, getLong, etc. 
I'd say go for it.

-Dan

From: Mark Hanson 
Sent: Wednesday, April 14, 2021 3:43 PM
To: dev@geode.apache.org 
Subject: Stats deprecations?

Hi,

So I am making some stats changes that are pretty minor, but I was looking 
at the fact that we have a ton of deprecated stuff about incInt and getInt. Can 
we convert those to longs? This would mean changing storage and return types.

Thanks,
Mark



[Proposal] backport GEODE-8761 to support/1.14

2021-03-30 Thread Darrel Schneider
The previous fix for GEODE-8761 that was backported to support/1.14 had a 
problem.
So I'd like to backport the fix for it that can be found in this pull request: 
https://github.com/apache/geode/pull/6236


Re: [DISCUSS] removal of experimental Protobuf client/server interface

2021-03-23 Thread Darrel Schneider
I'm in favor of its removal. I was working on improving the geode thread 
monitor and found doing that on the protobuf code was much more complicated.

From: Bruce Schuchardt 
Sent: Tuesday, March 23, 2021 8:16 AM
To: dev@geode.apache.org 
Subject: [DISCUSS] removal of experimental Protobuf client/server interface

Hi folks,

We’ve had an experimental client/server interface in Geode that no-one to my 
knowledge is using.  We’re testing it with every build and are having to make 
changes to it to keep it up to date with the rest of the project.  The last 
change of substance to the geode-protobuf sub-project, for instance, was in 
2018 but that’s been followed by many incidental commits.

GEM-8997 was opened to have the sub-projects for this interface removed. I've 
prepared a pull request to remove it and would like to get consensus to move 
forward with that effort.


Re: [Proposal] Backport GEODE-9045 to support/1.14

2021-03-17 Thread Darrel Schneider
+1

From: Hale Bales 
Sent: Wednesday, March 17, 2021 11:42 AM
To: dev@geode.apache.org 
Subject: [Proposal] Backport GEODE-9045 to support/1.14

Hello,

 I am putting forward the proposal to backport GEODE-9045 (Rename Redis 
properties and error messages) to the support/1.14 branch.

What does GEODE-9045 do?

  *   It renames the redis-port property to 
compatible-with-redis-port
  *   It renames the redis-bind-address property to 
compatible-with-redis-bind-address
  *   It renames the redis-password property to 
compatible-with-redis-password
  *   It rewords error messages to remove usages of Redis, changing them to 
something along the lines of Compatible with Redis

These changes are low-risk as they are limited entirely to Geode’s 
Redis-compatibility subsystem and do not impact any other Geode code. This is 
simply a rename of properties and does not change any behavior.

Why do we need to backport these changes?

  *   These changes will make us compliant with Redis's copyright, 
which does not allow us to claim that we are Redis. This prevents us from 
having any customer-facing places that say Redis, or Geode Redis.

~ hale (they/them)


Re: Proposal to backport GEODE-9001 to 1.14

2021-03-15 Thread Darrel Schneider
+1

From: John Hutchison 
Sent: Monday, March 15, 2021 10:48 AM
To: dev@geode.apache.org 
Subject: Proposal to backport GEODE-9001 to 1.14

Hi All-

Prosing to include GEODE-9001 in the 1.14 release.

Rationale for proposal:

This story changes only documentation related files, and only files within the 
Geode-Redis module.
This story will provided updated documentation to reflect the most recent 
changes in the Geode-Redis module that are included in the 1.14 branch.

Thanks a lot,
John



Re: [Proposal] Backport GEODE-9029 - Initial support for Redis SLOWLOG command

2021-03-12 Thread Darrel Schneider
+1

From: Raymond Ingles 
Sent: Friday, March 12, 2021 4:29 PM
To: dev@geode.apache.org 
Subject: [Proposal] Backport GEODE-9029 - Initial support for Redis SLOWLOG 
command

Hello –

 Putting forward the proposal to backport GEODE-9029 (Redis SLOWLOG command 
support) to the support/1.14 branch.

What does GEODE-9029 do?

  *   It adds unit/integration/dunit tests for the Redis SLOWLOG command
  *   It moves the SLOWLOG command to the ‘Supported’ category

These changes are low-risk as they are limited entirely to Geode’s 
Redis-compatibility subsystem and do not impact any other Geode code.

The version of SLOWLOG is essentially a stub because (a) Geode does not track 
such data the way Redis does, and (b) if the command is not supported, some 
monitoring tools (like certain versions of the DataDog agent) can experience 
internal errors that prevent monitoring of other, fully supported statistics.

Why do we need to backport these changes?

  *   These changes will allow automated tools that monitor Redis 
health (e.g. DataDog) to also monitor the health of the Geode 
compatibility-with-Redis subsystem.
  *   If we don't backport these changes to 1.14.0 then use of Redis 
compatibility will be impaired or unacceptable for several users

Reference PR: 
https://github.com/apache/geode/pull/6131

Thanks,
Ray Ingles




Re: [PROPOSAL]: Backport redis-compatibility changes to 1.14 - GEODE-9021 and GEODE-9023

2021-03-12 Thread Darrel Schneider
+1

From: Jens Deppe 
Sent: Friday, March 12, 2021 2:13 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL]: Backport redis-compatibility changes to 1.14 - GEODE-9021 
and GEODE-9023

These changes relate to Redis-compatibility which will be enabled and 
non-experimental in 1.14

GEODE-9021: Avoid unnecessary buffer creation in redis Coder

  *   Provides performance improvements in terms of CPU and memory usage

GEODE-9023: Add sharding support to redis region

  *   This change introduces a partition resolver which will allow for future 
compatibility with redis cluster-aware clients. Having this in place now will 
reduce the complexity for future rolling upgrades when cluster support is fully 
provided.

Thanks
--Jens


Re: Proposal to back port GEODE-8938 to 1.14

2021-03-10 Thread Darrel Schneider
+1

From: John Hutchison 
Sent: Wednesday, March 10, 2021 8:46 AM
To: dev@geode.apache.org 
Subject: Proposal to back port GEODE-8938 to 1.14

Hi all,

Proposal to backport GEODE-8938.

Rationale:
GEODE-8864, which has previously been approved to be backported (and is marked 
as a release-blocker) relies on GEODE-8938.  GEODE-8938 is small, does not 
alter any feature code, and exists entirely within the compatible-with-Redis 
area of the code. Additionally, changing GEODE-8864 to not rely on 
GEODE-8938 would require altering a test.

Thanks a lot,
John


[PROPOSAL] backport GEODE-8761 to support/1.14

2021-03-09 Thread Darrel Schneider
GEODE-8761 causes geode to detect threads in a cache server that are stuck 
processing a client request. This can be very helpful in diagnosing hangs. The 
changes to make this happen are rather simple and I'd like to backport them to 
1.14.



Re: [Proposal] BackPort GEODE-8864 to 1.14

2021-03-08 Thread Darrel Schneider
+1

From: John Hutchison 
Sent: Monday, March 8, 2021 12:35 PM
To: dev@geode.apache.org 
Subject: [Proposal] BackPort GEODE-8864 to 1.14

Hi all,

I’m proposing to backport compatibility with the Redis HSCAN command to 1.14.
This functionality does not rely on code changes outside of the Geode-Redis
module, and would allow support for all Hash-related commands in the
compatible-with-Redis section of Geode.

Thanks,
John


Re: [Proposal] Backport GEODE-8886 to 1.14

2021-03-05 Thread Darrel Schneider
+1

From: Mark Hanson 
Sent: Friday, March 5, 2021 2:48 PM
To: dev@geode.apache.org 
Subject: [Proposal] Backport GEODE-8886 to 1.14

Hi All,

It was discovered that there was a bug with unprocessed events under certain
WAN conditions, specifically when old and new servers are vying for primary
status.
This fix will alleviate the problem in 1.14. The code that caused the problem 
is present in 1.14, hence the backport request.

Thanks,
Mark


[PROPOSAL] backport GEODE-8998 to 1.14

2021-03-04 Thread Darrel Schneider
I'm resending this request because my previous request was labelled as junk.

I would like to backport GEODE-8998 to 1.14. It fixes an NPE
that will cause the Geode cluster to not be able to do any
cluster messaging if thread monitoring is disabled.
This bug has never been released, so it would be nice
to keep it that way.

https://issues.apache.org/jira/browse/GEODE-8998



Re: [PROPOSAL]: Add GEODE-9000 as Geode 1.14.0 Blocker

2021-03-04 Thread Darrel Schneider
+1 I agree this should be fixed in 1.14

From: Ju@N 
Sent: Thursday, March 4, 2021 3:30 AM
To: dev@geode.apache.org 
Subject: [PROPOSAL]: Add GEODE-9000 as Geode 1.14.0 Blocker

Hello team,

I'd like to propose adding a newly created ticket, *GEODE-9000* [1], to the
*1.14.0 Blocker Board* [2].
The issue has been recently introduced and, long story short, it prevents
members from automatically reconnecting to a distributed system after a
full network split; usage of PERSISTENT regions makes things worse, as
other joining members might get stuck waiting to recover the latest data.
Best regards.

[1]: https://issues.apache.org/jira/browse/GEODE-9000
[2]: https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12336052

--
Ju@N


[PROPOSAL] backport GEODE-8998 to 1.14

2021-03-03 Thread Darrel Schneider
I would like to backport GEODE-8998 to 1.14. It fixes an NPE
that will cause the Geode cluster to not be able to do any
cluster messaging if thread monitoring is disabled.
This bug has never been released, so it would be nice
to keep it that way.

https://issues.apache.org/jira/browse/GEODE-8998


Re: [Proposal] Backport GEODE-8958 into 1.14.x, 1.13.x, 1.12.x branches

2021-03-03 Thread Darrel Schneider
+1

From: Xiaojian Zhou 
Sent: Wednesday, March 3, 2021 9:43 AM
To: dev@geode.apache.org 
Subject: Re: [Proposal] Backport GEODE-8958 into 1.14.x, 1.13.x, 1.12.x branches

+1

On 3/1/21, 11:30 PM, "Owen Nichols"  wrote:

That sounds a lot better than never expiring them if that does happen. I
think this would be good to include.

On 3/1/21, 2:41 PM, "Mark Hanson"  wrote:

I would like to backport GEODE-8958 into previous release branches to 
alleviate a problem with tombstones if timestamps become corrupt for some 
reason.

Thanks,
Mark




Re: Proposal: new Geode property CRITICAL_HEAP_PERCENTAGE

2021-02-26 Thread Darrel Schneider
I'm not sure what you mean when you say "adding a general 
critical-heap-percentage parameter". The old xsd allowed it to be configured 
using xml. gfsh start server supports it with --critical-heap-percentage. And 
external java apis exist for it as John pointed out. Are you considering adding 
a gemfire property for it? That could be done. You would need to define which 
of them would take precedence if it is specified multiple ways but we already 
do that for other configurable attributes. Or do you mean something else by 
"parameter"?

From: Raymond Ingles 
Sent: Friday, February 26, 2021 11:05 AM
To: dev@geode.apache.org 
Subject: Proposal: new Geode property CRITICAL_HEAP_PERCENTAGE

The Geode Redis Compatibility code is working to emulate the Redis eviction 
policy “noevict”; when memory fills up, the server refuses new write commands 
until deletion or expiration has cleared space.

Right now, part of the implementation is calling 
InternalResourceManager.setCriticalHeapPercentage() to ensure that 
LowMemoryExceptions are thrown before heap memory actually runs out. It makes 
sense to make this configurable rather than hardcoding the specific percentage, 
obviously.

Since this is a cache-wide setting (and probably useful to other components) we 
didn’t think it would necessarily be appropriate to add a redis-specific option 
(redis-critical-heap-percentage). What would the implications be of adding a 
general critical-heap-percentage parameter? Are there existing reasons that 
this hasn’t already been done? gfsh already supports these options when 
starting a member.

Does anyone have a strong preference (or aversion) to one or the other?
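
For reference, the existing knobs mentioned above look like this today. Below is
a minimal sketch of the Java API (the gfsh equivalent being
"start server --critical-heap-percentage=90"); the 90 percent value is just an
example:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class CriticalHeapExample {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();
        // Once heap usage crosses this threshold, writes start failing with
        // LowMemoryException until usage drops back below the threshold.
        cache.getResourceManager().setCriticalHeapPercentage(90.0f);
        cache.close();
      }
    }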


Re: PR process and etiquette

2020-10-29 Thread Darrel Schneider
+1 for adding a CONTRIBUTING.MD file

From: Sarah Abbey 
Sent: Thursday, October 29, 2020 2:07 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

Regarding knowing who to tag in a PR, because I am working on a very specific 
aspect of Geode, it was frustrating before I was a committer to not be able to 
add reviewers since I knew the handful of people that had enough context to 
review most of the PRs I submitted.  But it is also not hard for me to ask 
these people directly since I work with them every day.  I can imagine it would 
be very frustrating to know exactly who to request a review from, but not be 
able to add them and not be able to directly contact them or get their 
attention.  It would also be nice to be able to request reviews from people who 
are not committers, which we may not be able to change due to GitHub 
limitations. Some of my team members are not committers yet, but I still value 
their input/review of the code even if their review would not count as an 
official approving review. Or if we submit a PR that solves an issue raised by 
a non-committer, it would be nice to have them review it (if they want to/it 
makes sense for them to) to be sure the issue is addressed. Robert has 
mentioned that a workaround is to tag those users in a comment.

Since I only feel like I have significant context in one area of Geode, I 
usually scan the list of PRs for that area unless I'm explicitly tagged as a 
reviewer.  Sometimes, I read other PRs to gain knowledge or if I'm curious, but 
I don't usually add a review.

Regarding waiting until all commit checks are green, it depends on the PR for 
me.  I usually glance at a PR once I'm tagged as a reviewer or I see something 
that needs a review in my area of expertise.  If it looks like the PR still 
needs a lot of work and checks like build did not pass, then I typically wait 
to review it.  If most of the checks have passed, and checks that take a long 
time to run, like DUnit or Acceptance, haven't completed, I will often review 
it.  If certain checks that sometimes have flaky tests are not passing, like 
DUnit, and all other checks are green, I'll often look at those failures to see 
if they are related to the PR at all and check Jira to see if there has been an 
issue filed for them or not.  I'll still review the PR and make a comment about 
the flaky test.  If the failure seems related to the PR, I might still review 
it to see what might've caused the failure.

Whatever we do, our guide to contributing could definitely use an update.  We
might even consider putting a 
CONTRIBUTING.MD file directly in our GitHub repo.  I've found these guides 
useful when contributing to other open source projects.  I also like 
contributing to open source projects when my PR is reviewed timely (within a 
week, though I'm sure we all have different definitions of timely) and any 
feedback or discussion is constructive and kind.

Sarah

From: Donal Evans 
Sent: Thursday, October 29, 2020 3:41 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

As a (relatively) new committer, one of the more difficult aspects of the PR 
process is knowing who to tag as a reviewer. The last person to touch a class 
may not actually have the context or depth of knowledge needed for a thorough 
review if, say, their changes were just refactoring or code cleanup, and if you 
don't have the luxury of working directly with other committers, it's not 
always clear who is the "right" person to ask for review. Add to that the fear 
of overburdening the more knowledgeable committers with endless requests for 
review, and you can find yourself in a position where the reviews you do end up 
getting are somewhat perfunctory or only address surface-level issues rather 
than potential more serious and fundamental problems.

The reverse of this is also something I have struggled with as one of the newer 
and less knowledgeable members of the community, as I'll sometimes see a PR sat 
waiting for review that changes areas of the code that I don't know much about 
and, wanting to help out, make some comments or requests for changes related to 
things like test naming, code style or other mostly cosmetic issues. Once those 
have been addressed, I can approve the PR, but I know that I haven't really 
reviewed it to the standard necessary to have confidence that it's not going to 
break something. On the one hand, I want to be active and help ensure the 
quality of code 

Re: PR process and etiquette

2020-10-27 Thread Darrel Schneider
+1 to your idea of using "draft" mode until things are green. Something to be 
aware of is that if your pr branch has conflicts and it is in draft mode then 
your pr tests will not run and the pr page will not tell you that conflicts 
exist. If you see that the pr tests are not actually running and it is in draft 
mode then try merging develop to your pr branch and resolve the conflicts.

From: Owen Nichols 
Sent: Tuesday, October 27, 2020 6:03 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

+1 for using GitHub's draft status to indicate work-in-progress.

Many great suggestions here, however I generally prefer that we don't squash 
commits at any point except the final Squash and Merge to develop.  I find it 
insightful to see how the work evolved.  I also find that review comments may 
start coming in even before you are "ready" for review, and a squash or 
force-push "loses" those comments.

One thing I would like to see more of is PR summaries that explain *why* the 
change is being made, not just *what* is being changed.

Thanks Udo for looking for ways to make the community process work even better!

On 10/27/20, 5:41 PM, "Udo Kohlmeyer"  wrote:

Dear Apache Geode Devs,
It is really great going through all the PRs that have been submitted. As Josh 
Long is known to say: "I work for PRs".
Whilst going through some of the PRs I do see that there are many PRs that 
have multiple commits against the PR.
I know that the PR submission framework kicks off more testing than we do 
on our own local machines. It is extremely uncommon to submit a PR the first 
time and have all tests go green. Which means we invariably iterate over 
commits to make the build go green.
In this limbo time period, it is hard for any reviewer to know when the 
ticket is ready to be reviewed.
I want to propose that when submitting a PR, it is initially submitted as a 
DRAFT PR. This way, all tests can still be run and work can be done to make sure 
"green" is achieved. Once "green" status has been achieved, the draft can then 
be upgraded to a final PR by pressing the "Ready For Review" button. At this 
point all commits on the branch can then once again be squashed into a single 
commit.
Project committers will then know that the PR is in a state where it can
be reviewed for completeness and functionality.
In addition, it will be tremendously helpful if anyone submitting a PR
monitors their PR for activity. If there is no activity for a few days, please
feel free to ping the Apache Geode Dev community for review. If a review is
requested, please prioritize some time to address the feedback, as reviewers
spend time reviewing code and getting an understanding of what the code is doing.
If too much time goes by between review and addressing the review comments, 
not only does the reviewer lose context, possibly requiring them to spend time 
again to understand what the code was/is supposed to do, but also possibly lose 
interest, as the ticket has now become cold or dropped down the list of PRs.
There are currently many PRs that are in a cold state, where the time 
between review and response has been so long, that both parties (reviewer and 
submitter) have forgotten about the PR.
In the case that the reviews will take more time to address than expected, 
please downgrade the PR to draft status again. At this point, it does not mean 
that reviewers will not be able to help anymore, as you can request a reviewer 
to help with feedback and comments, until one feels that the PR is back in a 
state of final submission.
So, what I'm really asking from the Dev Community:
If you submit a PR, it would be great if you can nudge the 
community if there is no review on the PR. If feedback is provided on a PR, 
please address it as soon as possible. This not only will help the PR progress 
faster, but it will shorten the feedback loop.
Finally, start with a DRAFT PR and then upgrade to a final PR when 
you feel it is in a "good" state. If more work is required, it is ok to 
downgrade back to a draft, until one feels the PR is in a good state again.
Managing the PR process in this manner will be hugely beneficial for all
community members, as reviewers will know when a PR is ready to be reviewed.
Reviewers will also feel more engaged in this process, due to timely responses 
to their review comments. PR submitters will feel happier, in the knowledge 
that the code they spent so long meticulously crafting will possibly make it 
into the project.
Any thoughts?
--Udo




Re: [PROPOSAL] Backport GEODE-8608 to support 1.13, 1.12 branch

2020-10-14 Thread Darrel Schneider
+1

From: Anilkumar Gingade 
Sent: Wednesday, October 14, 2020 1:34 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Backport GEODE-8608 to support 1.13, 1.12 branch

+1
After the PR pipeline is completed.

-Anil.

On 10/14/20, 1:32 PM, "Xiaojian Zhou"  wrote:

Hi,

There’s a race in which StateFlush could hang when the target member is
shut down; GEODE-8608 fixes it. This fix is a patch to GEODE-8385.

The fix should be backported to all previous versions with GEODE-8385.

We are still waiting for prechecking to finish.

Regards
Xiaojian Zhou



Please give me access to debug concourse runs

2020-08-18 Thread Darrel Schneider
I would like to use "fly" to login to concourse and look at some test runs that 
are hanging. Could I be given access please?
Thanks


Re: TimeIntegrationTest is flaky

2020-08-13 Thread Darrel Schneider
Will do. Looks like an easy fix

From: Kirk Lund 
Sent: Wednesday, August 12, 2020 4:25 PM
To: dev@geode.apache.org 
Subject: TimeIntegrationTest is flaky

Since this is a new test, can we please prioritize fixing it?

https://issues.apache.org/jira/browse/GEODE-8379

java.lang.AssertionError:
Expecting:
 <0L>
to be greater than:
 <0L>
at
org.apache.geode.redis.internal.executor.server.TimeIntegrationTest.timeCommandRespondsWIthTwoValues(TimeIntegrationTest.java:57)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at
org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
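
For context only: the TIME reply is a [seconds, microseconds] pair, and the
microseconds component can legitimately be 0, so one plausible hardening of the
assertion could look like the sketch below. This is a guess at the shape of the
fix, not the actual GEODE-8379 patch, and it assumes the test talks to the
server through Jedis:

    import static org.assertj.core.api.Assertions.assertThat;
    import java.util.List;
    import redis.clients.jedis.Jedis;

    public class TimeAssertionSketch {
      public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
          List<String> time = jedis.time(); // [epoch seconds, microseconds]
          assertThat(time).hasSize(2);
          assertThat(Long.parseLong(time.get(0))).isGreaterThan(0L);          // seconds
          assertThat(Long.parseLong(time.get(1))).isGreaterThanOrEqualTo(0L); // micros may be 0
        }
      }
    }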

Re: [PROPOSAL] Backport GEODE-8423 to support/1.13

2020-08-12 Thread Darrel Schneider
+1

From: Sarah Abbey 
Sent: Wednesday, August 12, 2020 8:59 AM
To: dev@geode.apache.org 
Subject: [PROPOSAL] Backport GEODE-8423 to support/1.13

The documentation for the Redis API for Geode that will currently be published 
for 1.13 is not accurate and needs to be updated so potential users searching 
for information about Redis API for Geode will get accurate information.


[PROPOSAL] backport GEODE-6564 to support/1.13

2020-08-03 Thread Darrel Schneider
GEODE-6564 fixes a memory leak that has impacted users, so it would be good to
have it in 1.13. It is a simple, low-risk fix.


Re: [PROPOSAL] Postpone Geode 1.14

2020-07-30 Thread Darrel Schneider
+1

From: Alexander Murmann 
Sent: Tuesday, July 28, 2020 4:34 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL] Postpone Geode 1.14

Hi all,

As mentioned on the previous discuss thread, I propose to hold off cutting
1.14 until we have shipped 1.13.

Once we have shipped 1.13, we should discuss when we want to cut the 1.14
release. The actual ship date for Geode 1.13 is important information for
that conversation. Thus we cannot have that conversation before then.


Re: Proposal to bring GEODE-8016 to support branches

2020-05-11 Thread Darrel Schneider
+1

On Sun, May 10, 2020 at 10:05 PM Dick Cavender  wrote:

> +1
>
> On Sat, May 9, 2020 at 6:42 PM Owen Nichols  wrote:
>
> > Last week develop was successfully migrated from publishing -SNAPSHOT to
> > publishing -build.nnn, as discussed on the dev list.
> >
> > I propose that we bring the same change to support/1.13 and support/1.12
> > for consistency.  This will allow downstream projects to change over to the
> > new model “everywhere" rather than having to maintain support for both
> ways
> > and constantly try to remember which branches do it which way.
> >
> > The specific changes to be cherry-picked from develop are a4c8b <
> >
> https://github.com/apache/geode/commit/a4c8b9ed8bbea584f798164fa5308d236e9b6048
> >
> > and 39c52 <
> >
> https://github.com/apache/geode/commit/39c522e340196cb30d55d81d93c63028938cd782
> >.
> > I have prepared PR #5089  for
> > support/1.13 and PR #5090 
> for
> > support/1.12 due to the version number differences on each branch.
>


Re: Discussion - Change Public API Before Initial Release

2020-05-09 Thread Darrel Schneider
+1

On Fri, May 8, 2020 at 11:03 PM Dick Cavender  wrote:

> +1
>
> On Fri, May 8, 2020 at 7:59 PM Owen Nichols  wrote:
>
> > +1
> >
> > > On May 8, 2020, at 7:57 PM, Robert Houghton 
> > wrote:
> > >
> > > +1
> > >
> > > On Fri, May 8, 2020, 18:26 Jacob Barrett  wrote:
> > >
> > >> Hey Ya’ll,
> > >>
> > >> We have a new API going into 1.13 that has an inconsistency I want to
> > >> address before we are stuck with it. The class SniSocketFactory should be
> > >> renamed SniProxySocketFactory. The class in question produces objects of
> > >> type SniProxySocket. An "SNI socket" isn’t a thing but an SNI proxy is a
> > >> thing. The factory class that produces proxy socket factories (yes a meta
> > >> factory) ProxySocketFactories uses the “ProxySocket” name as well. It feels
> > >> very inconsistent with all the other classes that make up this API for
> > >> SniSocketFactory to not have Proxy in its name; it should be called
> > >> SniProxySocketFactory.
> > >>
> > >> To be very clear, this API has not made it out of development yet but
> is
> > >> in the 1.13 release branch. Can I get a few thumbs up to open an
> ticket
> > and
> > >> pick it into the 1.13 release branch before it goes out?
> > >>
> > >> Thanks,
> > >> Jake
> > >>
> > >>
> >
> >
>


Re: [DISCUSS] Geode Redis API Improvements

2020-05-07 Thread Darrel Schneider
Experimental features are subject to change or removal. No guarantees are
made of backwards compatibility.

On Thu, May 7, 2020 at 2:53 PM Donal Evans  wrote:

> Looks good to me. Fixing broken implementation and providing reliable HA
> can only be a good thing. My only concern is the backwards compatibility
> issue, but I don't know if we make any guarantees regarding that for
> experimental features, so it may be a non-issue.
>
> On Thu, May 7, 2020 at 8:35 AM Raymond Ingles  wrote:
>
> > Just a quick nudge - we set a deadline for comments for tomorrow. Anyone
> > had a chance to look this over yet? Thanks!
> >
> > On Thu, Apr 30, 2020 at 2:09 PM Raymond Ingles 
> wrote:
> >
> > > Hi all,
> > >
> > > The current Redis API support in Geode has been marked experimental
> for a
> > > couple years and has several bugs and inconsistencies compared to
> native
> > > Redis. We created a RFC for updating the Geode Redis API support, to
> > > address issues in the implementation, improve performance and add High
> > > Availability support.
> > >
> > > Please review and comment by 5/8/2020.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/GEODE/Geode+Redis+API+Improvements
> > >
> > > Thanks!
> > >
> >
>


Re: RFC - Logging to Standard Out

2020-01-08 Thread Darrel Schneider
+1

On Wed, Jan 8, 2020 at 12:39 PM Jacob Barrett  wrote:

> Please see RFC for Logging to Standard Out.
>
> https://cwiki.apache.org/confluence/display/GEODE/Logging+to+Standard+Out
>  >
>
> Please comment by 1/21/2020.
>
> Thanks,
> Jake
>
>


Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-13 Thread Darrel Schneider
The v2 management api allows regions to be created remotely in cluster
configuration and on running servers. But it does not allow you to create a
region on a client. Non-spring applications can use RegionFactory to create
the region on the client side.
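
As a concrete, minimal sketch of that client-side path: on a ClientCache the
factory is ClientRegionFactory, obtained from the cache itself. The locator
address and region name below are placeholders.

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;

    public class ClientRegionExample {
      public static void main(String[] args) {
        ClientCache clientCache = new ClientCacheFactory()
            .addPoolLocator("localhost", 10334)
            .create();
        // PROXY keeps no local state; all operations go to the servers.
        Region<String, String> region = clientCache
            .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
            .create("exampleRegion");
        region.put("key", "value");
        clientCache.close();
      }
    }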


On Fri, Dec 13, 2019 at 9:56 AM Dan Smith  wrote:

> +1 to adding a way to copy the RegionAttributes.
>
> BTW, I really wish this RegionFactory was an interface. I don't know if
> adding a copy constructor makes it harder to migrate to an interface later,
> but maybe just having this single public method might be better?
>
> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
>
> On Fri, Dec 13, 2019 at 9:35 AM Mark Hanson  wrote:
>
> > Hi Udo,
> >
> > I disagree. I believe the currently published best practice is to use
> > RegionFactory. As a part of the work I have been doing,  I have been
> > migrating code from the AttributesFactory pattern to the RegionFactory
> > pattern. To support that, I believe a copy constructor for RegionFactory
> is
> > necessary. And thus a createRegionFactory
> >
> > Hence my changes.
> >
> > Thanks,
> > Mark
> >
> > > On Dec 11, 2019, at 4:41 PM, Udo Kohlmeyer 
> > wrote:
> > >
> > > I think at this point I'd be looking at the new V2 Management API's for
> > Regions.
> > >
> > > I think any new "public" effort that we'd be adding to the product
> > should be done through the Management API's for Regions, rather than
> > exposing new public API's that in reality should not be made "public".
> > >
> > > --Udo
> > >
> > > On 12/11/19 3:53 PM, Mark Hanson wrote:
> > >> Basically the point is to allow a use to copy a RegionFactory because
> > under certain circumstances it is necessary. I found that when migrating
> > from AttributesFactory.
> > >>
> > >> Thanks,
> > >> Mark
> > >>
> > >>> On Dec 11, 2019, at 3:48 PM, Anthony Baker 
> wrote:
> > >>>
> > >>> Mark,
> > >>>
> > >>> Can you share how the API changes will help the user?
> > >>>
> > >>> Thanks,
> > >>> Anthony
> > >>>
> > >>>
> >  On Dec 11, 2019, at 2:57 PM, Mark Hanson 
> wrote:
> > 
> >  Hi All,
> > 
> >  There was a suggestion that since I am making a couple user visible
> > API changes that I might want to ask the dev list.
> > 
> >  Basically I was migrating code from AttributesFactory to
> > RegionFactory and hit a few snags along the way.
> > 
> >  Please see https://github.com/apache/geode/pull/4409 <
> > https://github.com/apache/geode/pull/4409> specifically Cache.java,
> > RegionFactory.java, and for extra credit GemfireCacheImpl.java
> > 
> >  I have commented at the relevant changes.
> > 
> >  Thanks,
> >  Mark
> >
> >
>


Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Darrel Schneider
Don't expose "InternalCache" on RegionFactory. You probably want it to be
"Cache".

On Wed, Dec 11, 2019 at 3:35 PM Mark Hanson  wrote:

>
> In Cache.java
>
> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
>
> In RegionFactory.java
>  + public RegionFactory(InternalCache cache, RegionFactory<K, V> regionFactory)
> and
> + public RegionFactory(RegionFactory<K, V> regionFactory)
>
> Lastly in GemfireCacheImpl.java
> + public <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory)
>
> Thanks,
> Mark
>
>
> > On Dec 11, 2019, at 3:25 PM, Dan Smith  wrote:
> >
> > I see a lot of PR comments on those PRs. What is the new API you added?
> >
> > -Dan
> >
> > On Wed, Dec 11, 2019 at 2:57 PM Mark Hanson  wrote:
> >
> >> Hi All,
> >>
> >> There was a suggestion that since I am making a couple user visible API
> >> changes that I might want to ask the dev list.
> >>
> >> Basically I was migrating code from AttributesFactory to RegionFactory
> and
> >> hit a few snags along the way.
> >>
> >> Please see https://github.com/apache/geode/pull/4409 <
> >> https://github.com/apache/geode/pull/4409> specifically Cache.java,
> >> RegionFactory.java, and for extra credit GemfireCacheImpl.java
> >>
> >> I have commented at the relevant changes.
> >>
> >> Thanks,
> >> Mark
>
>
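
Purely for illustration, given the proposed signatures above (from PR #4409;
not a released API), usage during an AttributesFactory-to-RegionFactory
migration might look roughly like this; the region names are made up:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.RegionFactory;
    import org.apache.geode.cache.RegionShortcut;

    public class RegionFactoryCopySketch {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();
        RegionFactory<String, String> base =
            cache.createRegionFactory(RegionShortcut.REPLICATE);
        base.setStatisticsEnabled(true);
        base.create("regionOne");

        // Proposed copy API (hypothetical here): start from the attributes
        // already configured on "base" instead of rebuilding them by hand.
        RegionFactory<String, String> copy = cache.createRegionFactory(base);
        copy.create("regionTwo");
        cache.close();
      }
    }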


Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Darrel Schneider
+1

On Thu, Dec 5, 2019 at 4:40 PM Dan Smith  wrote:

> Hi,
>
> I wrote up a proposal for deprecating of the singleton PoolManager in favor
> of a ClientCache scoped service. Please review and comment on the below
> proposal.
>
> I think this should address the issues that Spring Data Geode and friends
> had trying to mock Pools and remove the need for those projects to try to
> inject mock Pools into a Geode singleton.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/Replace+singleton+PoolManager+with+ClientCache+scoped+service
>
> Thanks,
> -Dan
>
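
For context, the pain point being addressed is the static lookup shown below.
This sketch shows only the current singleton API (no method name is assumed for
the proposed ClientCache-scoped replacement), and the locator address is a
placeholder:

    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.Pool;
    import org.apache.geode.cache.client.PoolManager;

    public class PoolLookupToday {
      public static void main(String[] args) {
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("localhost", 10334)
            .create();
        // Static, JVM-wide lookup; this is what makes Pools hard to mock in
        // frameworks such as Spring Data Geode.
        Pool defaultPool = PoolManager.find("DEFAULT");
        System.out.println(defaultPool.getLocators());
        cache.close();
      }
    }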


Re: bug fix needed for release/1.11.0

2019-11-07 Thread Darrel Schneider
+1 to include the fix

On Thu, Nov 7, 2019 at 8:54 AM Bruce Schuchardt 
wrote:

> The change passed in SSL benchmark testing and can be cherry-picked into
> the release branch.  The Benchmark job as a whole failed due to perf
> degradation with the Security Manager, but that's a different set of tests.
>
>
> On 11/6/19 3:57 PM, Helena Bales wrote:
> > +1 to cherry-picking the fix.
> >
> > The sha hasn't made it to benchmarks yet due to an issue with CI losing
> > resource refs that were needed to keep it moving through the pipeline.
> The
> > next commit is still about an hour away from triggering benchmarks.
> > In my manual benchmarking of this change, I found that it resolved the
> > issue with SSL and passed the benchmarks. Obviously we still need to
> > confirm that it works through the main pipeline, but I feel confident
> that
> > it will pass the benchmark job.
> >
> > Thanks,
> > Helena Bales (they/them)
> >
> > On Wed, Nov 6, 2019 at 9:28 AM Mark Hanson  wrote:
> >
> >> Any other votes? I have 2 people in favor.
> >>
> >> Voting will close at noon.
> >>
> >> Thanks,
> >> Mark
> >>
> >>> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt 
> >> wrote:
> >>> The fix for this problem is in the CI pipeline today:
> >>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
> >>> On 11/5/19 10:49 AM, Owen Nichols wrote:
>  +1 for bringing this fix to release/1.11.0 (after it has passed
> >> Benchmarks on develop)
> > On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt <
> bschucha...@pivotal.io>
> >> wrote:
> > The PR for GEODE-6661 introduced a problem in SSL communications that
> >> needs to be fixed.  It changed SSL handshakes to use a temporary buffer
> >> that's discarded when the handshake completes, but sometimes this buffer
> >> contains application data that must be retained.  This seems to be
> causing
> >> our Benchmark SSL test failures in CI.
> > I'm preparing a fix.  We can either revert the PR for GEODE-6661 on
> >> that branch or cherry-pick the correction when it's ready.
> >>
>


Re: [vote/discuss]Override stressNewTest for Pull Request #4250?

2019-10-31 Thread Darrel Schneider
+1

On Thu, Oct 31, 2019 at 4:16 PM Jinmei Liao  wrote:

> +1
>
> On Thu, Oct 31, 2019, 3:30 PM Xiaojian Zhou  wrote:
>
> > I'm curious to see the new stressNew test result too.
> >
> > On Thu, Oct 31, 2019 at 3:26 PM Owen Nichols 
> wrote:
> >
> > > I’ve retriggered StressNew <
> > >
> >
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/4758
> > >
> > > with a temporarily-increased timeout of 12 hours so we can see how long
> > it
> > > would actually take, to have some data point whether to propose a
> > permanent
> > > timeout increase or whether breaking up into multiple PRs should be
> > the
> > > standard way to get around this.
> > >
> > > > On Oct 31, 2019, at 2:52 PM, Donal Evans  wrote:
> > > >
> > > > +1 to allowing this PR to be merged, although I'd lean strongly
> toward
> > > > facilitating this by temporarily increasing the timeout on the job to
> > > allow
> > > > it to actually pass rather than a manual override of the
> StressNewTest.
> > > >
> > > > The fact that it's passed over 7000 times without failing is pretty
> > > strong
> > > > evidence that it's not a flaky test, which is what StressNewTest is
> > > > supposed to catch, so there doesn't seem to be any risk associated
> with
> > > > circumventing it in this case, but if there's a feasible solution
> that
> > > > doesn't involve "cheating" or ignoring the test job, then that would
> be
> > > > preferable.
> > > >
> > > > - Donal
> > > >
> > > > On Thu, Oct 31, 2019 at 2:04 PM Jason Huynh 
> wrote:
> > > >
> > > >> Greetings,
> > > >>
> > > >> We have a pull request (https://github.com/apache/geode/pull/4250)
> > > that is
> > > >> running into a problem with stressNewTest.  Mostly the tests that
> are
> > > being
> > > >> run are RollingUpgrade tests that take quite a bit of time to run
> the
> > > full
> > > >> suite.  Because these tests are added/modified, the stressNewTest
> > > doesn't
> > > >> have enough time to complete the run because it runs them N(50)
> number
> > > of
> > > >> times.
> > > >>
> > > >> However what has completed is 7400 tests and none of them have
> failed:
> > > >>
> > > >>
> > >
> >
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4250/test-results/repeatTest/1572546653/
> > > >>
> > > >> We would like to get this fix in before branching the next release,
> > but
> > > are
> > > >> unable to due to stressNewTest gating the merge button.  I know we
> > have
> > > >> another thread about overrides etc, and maybe this is a data point,
> > but
> > > >> this isn't meant to discuss that.
> > > >>
> > > >> Would everyone be able to agree to allow someone to manually
> override
> > > and
> > > >> merge this commit in (title of PR and reviews pending)?
> > > >>
> > >
> > >
> >
>


Re: [DISCUSS] Tweak to branch protection rules

2019-10-29 Thread Darrel Schneider
+1

On Tue, Oct 29, 2019 at 6:08 PM Owen Nichols  wrote:

> +1 …this has already bitten me a few times
>
> > On Oct 29, 2019, at 6:01 PM, Dan Smith  wrote:
> >
> > Hi all,
> >
> > It seems we've configured our branch protection rules such that pushing a
> > change to a PR that has been approved invalidates the previous approval.
> >
> > I think we should turn this off - it looks like it's an optional feature.
> > We should trust people to rerequest reviews if needed. Right now this is
> > adding busywork for people to reapprove minor changes (Fixing merge
> > conflicts, spotless, etc.)
> >
> > If you all agree I'll ask infra to uncheck "Dismiss stale pull request
> > approvals when new commits are pushed." in our branch protection rules.
> >
> > -Dan
>
>


Re: [DISCUSS] Blocking merge button in PR

2019-10-22 Thread Darrel Schneider
I would advise against including "StressNewTestOpenJDK11".
I have had trouble with this one when doing something as simple as a method
rename.
Because that method was called from an old test, StressNewTest ran that old
test many times. Not all of our current tests can handle being rerun many
times. If all you are trying to do in a pull request is something simple
you should not be required to rework a test to survive StressNewTest.
If StressNewTest was changed to only run brand new test files then I would
be okay with it gating a PR merge.

On Mon, Oct 21, 2019 at 4:41 PM Bruce Schuchardt 
wrote:

> I'd still like to see the PR rerun tool or an equivalent available to
> everyone.  Sure, you can push an empty commit, but that reruns everything;
> the PR tool lets you rerun only the checks that failed.
>
> On 10/21/19 3:04 PM, Ernest Burghardt wrote:
> > +1
> > @Naba thanks for seeing this through!
> >
> > On Mon, Oct 21, 2019 at 1:51 PM Nabarun Nag  wrote:
> >
> >> Thank you all for all the valuable feedback. Robert was kind enough to
> >> check the technical feasibility of the integration of Github Branch
> >> Protection Rules and its compatibility with our concourse CI checks and
> we
> >> are satisfied with the settings provided and the initial tests that
> Robert
> >> did with a demo geode branch. Please find attached the screenshot of the
> >> settings that will be sent to INFRA for enabling it on the Apache Geode
> >> repository.
> >>
> >> Regards
> >> Nabarun
> >>
> >> On Mon, Oct 21, 2019 at 12:21 PM Aaron Lindsey 
> >> wrote:
> >>
> >>> +1
> >>>
> >>> - Aaron
> >>>
> >>>
> >>> On Mon, Oct 21, 2019 at 11:53 AM Udo Kohlmeyer  wrote:
> >>>
>  BIG +1 (Yes, I'm changing my -1)
> 
>  @Naba, thank you for the offline chat. It seems that the proposal of
>  Github enforcing our good practices is a great option.
> 
>  2 merged PR's without a full green pipeline, since 18 Oct 2019,
> shocking
>  and disgusting!!
> 
>  I would just like to reiterate to ALL committers, you have a
>  responsibility towards the project to commit/merge only the best code
>  possible. If things break, fix them. Don't merge and hope that it goes
>  away and someone else fixes it.
> 
>  I really don't want to support a mechanism where the project has
> become
>  so prescriptive and restrictive, all because we have a few committers
>  who will not follow the established and agreed processes. BUT, once
>  again, the masses are impacted due to a few.
> 
>  Thank you Naba for following this through.
> 
>  --Udo
> 
>  On 10/21/19 11:05 AM, Nabarun Nag wrote:
> > @Karen
> >
> > Thank you so much for the feedback. I understand that committers must trust
> > each other and I agree entirely with that. This proposal is just preventing
> > us from making mistakes. It's just guard rails. As you can see in the email
> > chain, this mistake was made multiple times and has led to a lot of
> > engineers giving up their time to detect and fix this issue. We still trust
> > our committers; we are just enabling some features to avoid time being
> > wasted on avoidable mistakes and to use that time to improve Geode. I hope
> > I can persuade you to change your opinion.
> >
> > @Owen
> > Thank you for accepting the baby step proposal. Yes, let's see in
> some
>  time
> > if this is successful. We can always undo this if needed.
> >
> > @Rest
> > - Blaming people etc. will be very detrimental to the community, I do
> >>> not
> > propose that.
> > - This proposal was not just my idea but from feedback from lot of
> > developers who contribute to Geode on a frequent basis.
> > - It is very* unfair *to the engineers to go and investigate and find
> >>> out
> > what caused the failure and then send emails etc, their time can be
> >>> used
>  in
> > doing something more valuable.
> > - This is not something new, we have had this issue for over 3 years
> >>> now,
> > and maintaining this as the status quo is not healthy. Let us try
> this
> > solution, the committers, and developers who contribute on a regular
>  basis
> > will immediately see if it works or does not work and can revert it
> if
>  this
> > does not.
> > - There is a problem and this is an attempt at the solution. If it
> >>> does
>  not
> > work, we can revert it. For the sake of all the developers who are in
>  pain
> > and helping them spend the time on more productive tasks, let us just
>  give
> > this proposal a chance. If there are any issues we can revert it.
> >
> >
> > *Reiterating the proposal:*
> > Github branch protection rule for :
> > - at least one review
> > - Passing build, unit and stress test.
> >
> >
> > In our opinion, no committer would want 

Re: Off-heap support deactivation

2019-10-02 Thread Darrel Schneider
You can change it but only while the servers hosting that region are
stopped. In the pre-cluster-config world, users did this by stopping the
servers, editing their cache.xml (and altering the disk-store) and then
starting the servers. But as my previous email said, it can be hard to do
with gfsh.

On Wed, Oct 2, 2019 at 1:44 AM Alberto Bustamante Reyes
 wrote:

> thanks for the detailed explanation Darrel.
>
> There is something I did not get: "alter disk-store" is used to align the
> region info the disk store has when a region attribute is changed. But
> off-heap support is an attribute that cannot be changed on a region after
> it is created. So then, what is the utility of being able to run "alter
> disk-store" with "--off-heap" parameter if that is something that will not
> change in the region?
>
> 
> De: Darrel Schneider 
> Enviado: lunes, 30 de septiembre de 2019 17:51
> Para: dev@geode.apache.org 
> Asunto: Re: Off-heap support deactivation
>
> You can specify this setting at the time you create the region. Geode does
> not have support for changing it on a region that already exists. Only a
> few region attributes can be changed on a region that currently exists (see
> the AttributesMutator API). So how is your region getting created? I think
> it is probably from cluster configuration. So what you would need to do is
> get the definition stored in cluster configuration. I don't think the gfsh
> alter region command will let you change this attribute (alter region uses
> AttributesMutator). So you either need to delete the current definition and
> then create it again or you need to edit the current definition manually.
> Using gfsh to destroy and create is the cleanest solution, but that will
> also blow away the data you currently have persisted.
> To change it manually you can use gfsh export to get your cluster config as
> xml, edit the xml to change the offheap boolean property on the region, and
> then use gfsh import to load the xml you edited. This requires that the
> servers are restarted.
> If you are not using cluster config (I think you should be) then this is
> actually easier. You either just edit your cache.xml file and restart the
> server that is using it or you just change your code's use of RegionFactory
> to create the region differently.
>
> The whole alter disk-store thing is just an optimization. The region
> attributes stored in the disk-store for a persistent region do not
> determine how the region is configured. The cluster-config/xml/apis that
> create the region do that. When a disk-store is initially loaded it does
> not yet know how the regions are configured. But it creates some temporary
> maps that are used later once the region is created. If the attributes
> stored in the disk-store match those on the region configuration then the
> region initialization will be faster and use less memory. So basically if
> you do have a persistent region and then change how it is configured, if
> you also then alter how it is configured on the disk-store you next restart
> will recover faster. If you don't do the alter disk-store the first
> recovery will be slower but the actual region config will be stored again
> in the disk-store and subsequent recoveries will be fast.
>
> On Mon, Sep 30, 2019 at 8:28 AM Alberto Bustamante Reyes
>  wrote:
>
> > Hi all,
> >
> > Is it possible to change the off-heap support of a region once it is
> > created? The idea I got from documentation is that it is possible to do
> it
> > if the region is persistent, as the off-heap flag of the region can be
> > changed using "alter disk-store".
> >
> > I have run the following example to check it: with two servers, I created
> > a partition persistent region, with off-heap set to true. Then I
> > deactivated the off-heap support by using alter disk-store, as described
> in
> > documentation. But I have observed that if I run "describe region", the
> > off-heap flag is still set to true. And if I populate entries, the values
> > are stored in the off-heap memory.
> >
> > Did I misunderstood the documentation or I did something wrong?
> >
> > Thanks in advance,
> >
> > Alberto B.
> >
> >
> > PD: I wrote down the steps I followed in the following gist:
> > https://gist.github.com/alb3rtobr/e1fcf4148fe46f2e7b9e02a2e458624c
> >
>
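
To make the steps above concrete: with cluster configuration, the xml round
trip uses gfsh "export cluster-configuration" and "import cluster-configuration"
(editing the region's off-heap attribute in between, then restarting the
servers). Without cluster configuration, the code path is simply recreating the
region with the flag flipped; a rough sketch, with the region name made up:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.RegionFactory;
    import org.apache.geode.cache.RegionShortcut;

    public class OffHeapRegionSketch {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();
        RegionFactory<String, String> factory =
            cache.createRegionFactory(RegionShortcut.PARTITION_PERSISTENT);
        // Off-heap cannot be mutated on a live region; it has to be set when
        // the region is (re)created.
        factory.setOffHeap(false);
        factory.create("exampleRegion");
        cache.close();
      }
    }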


Re: Off-heap support deactivation

2019-09-30 Thread Darrel Schneider
You can specify this setting at the time you create the region. Geode does
not have support for changing it on a region that already exists. Only a
few region attributes can be changed on a region that currently exists (see
the AttributesMutator API). So how is your region getting created? I think
it is probably from cluster configuration. So what you would need to do is
get the definition stored in cluster configuration. I don't think the gfsh
alter region command will let you change this attribute (alter region uses
AttributesMutator). So you either need to delete the current definition and
then create it again or you need to edit the current definition manually.
Using gfsh to destroy and create is the cleanest solution, but that will
also blow away the data you currently have persisted.
To change it manually you can use gfsh export to get your cluster config as
xml, edit the xml to change the offheap boolean property on the region, and
then use gfsh import to load the xml you edited. This requires that the
servers are restarted.
If you are not using cluster config (I think you should be) then this is
actually easier. You either just edit your cache.xml file and restart the
server that is using it or you just change your code's use of RegionFactory
to create the region differently.

The whole alter disk-store thing is just an optimization. The region
attributes stored in the disk-store for a persistent region do not
determine how the region is configured. The cluster-config/xml/apis that
create the region do that. When a disk-store is initially loaded it does
not yet know how the regions are configured. But it creates some temporary
maps that are used later once the region is created. If the attributes
stored in the disk-store match those on the region configuration then the
region initialization will be faster and use less memory. So basically, if
you have a persistent region and then change how it is configured, and you
also then alter how it is configured on the disk-store, your next restart
will recover faster. If you don't do the alter disk-store, the first
recovery will be slower but the actual region config will be stored again
in the disk-store and subsequent recoveries will be fast.

On Mon, Sep 30, 2019 at 8:28 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> Is it possible to change the off-heap support of a region once it is
> created? The idea I got from documentation is that it is possible to do it
> if the region is persistent, as the off-heap flag of the region can be
> changed using "alter disk-store".
>
> I have run the following example to check it: with two servers, I created
> a partition persistent region, with off-heap set to true. Then I
> deactivated the off-heap support by using alter disk-store, as described in
> documentation. But I have observed that if I run "describe region", the
> off-heap flag is still set to true. And if I populate entries, the values
> are stored in the off-heap memory.
>
> Did I misunderstood the documentation or I did something wrong?
>
> Thanks in advance,
>
> Alberto B.
>
>
> PD: I wrote down the steps I followed in the following gist:
> https://gist.github.com/alb3rtobr/e1fcf4148fe46f2e7b9e02a2e458624c
>


Re: Backward compatibility issue in 1.10

2019-09-19 Thread Darrel Schneider
I agree that it is not needed in 1.10 since this interface is not meant to
be implemented by users

On Thu, Sep 19, 2019 at 5:25 AM Jacob Barrett  wrote:

> Good find. I agree that it’s fine since it’s not a user implemented
> interface.
>
> > On Sep 19, 2019, at 1:20 AM, Alberto Bustamante Reyes
>  wrote:
> >
> > Hi,
> >
> > During PR review of GEODE-6871 it was found that GEODE-5222 introduced a
> backward compatibility issue by adding a new method to a public interface
> without providing a default implementation.
> > According to comments in the PR, although the impacted interface
> (DiskStoreMXBean) is public, it should not be implemented by applications,
> so the risk of breaking backward compatibility is low, but it exists.
> >
> > Do you think this issue should be fixed in 1.10?
> >
> > BR/
> >
> > Alberto
>
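
As a general illustration of the compatibility concern (the interface and
method names below are made up, not the actual DiskStoreMXBean change): adding
the new method with a default body is what keeps classes that implemented the
interface against an older release compiling and running.

    // Generic illustration; the names are invented for the example.
    public interface ExampleMXBean {
      long getTotalBytes();

      // Added in a later release. Because it has a default implementation,
      // existing implementors that predate the method remain valid.
      default long getTotalBytesOnDisk() {
        return 0L;
      }
    }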


Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-10 Thread Darrel Schneider
I think it would be good for stop server to confirm in some way that the
server has stopped before returning.

On Tue, Sep 10, 2019 at 3:08 PM Mark Hanson  wrote:

> Hello All,
>
> I would like to propose that we make the gfsh “stop server” command
> synchronous. It is causing some issues with some tests, as the rest of the
> calls are blocking; stop, on the other hand, returns immediately.
> This causes issues as shown in GEODE-7017 specifically.
>
> GEODE-7017 CI failure:
> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
> startupReportsOnlineOnlyAfterRedundancyRestored
> https://issues.apache.org/jira/browse/GEODE-7017 <
> https://issues.apache.org/jira/browse/GEODE-7017>
>
>
> What do people think?
>
> Thanks,
> Mark


Re: [DISCUSS] RFC - Move membership code to a separate gradle sub-project

2019-09-05 Thread Darrel Schneider
+1 I added some comments on the wiki

On Thu, Sep 5, 2019 at 9:34 AM Kirk Lund  wrote:

> *interfaces -> was supposed to be "implementations"
>
> On Thu, Sep 5, 2019 at 9:33 AM Kirk Lund  wrote:
>
> > +1 The planned subprojects look good. Thanks for clarifying the
> > goals/anti-goals in the RFC, especially the anti-goal to not make it
> > pluggable with different interfaces.
> >
> > On Thu, Sep 5, 2019 at 8:52 AM Aaron Lindsey 
> wrote:
> >
> >> +1 — I'm happy to see us move toward better testability for the
> membership
> >> code!
> >>
> >> I also left my "+1" on the wiki page comments. I noticed that the
> >> "Lightweight RFC Process" document states that we're supposed to have
> >> discussions on the [DISCUSS] thread:
> >>
> >> In addition a [DISCUSS] email should be send to the dev mailing list.
> >> > Discussion will take place in the email thread.
> >> >
> >>
> >> Can we clarify how to have discussions on proposals? Personally, I don't
> >> care either way as long as it's clear how the process is supposed to
> work.
> >>
> >> - Aaron
> >>
> >>
> >> On Wed, Sep 4, 2019 at 4:04 PM Bruce Schuchardt  >
> >> wrote:
> >>
> >> > Thanks to folks who put in review comments today.  We're shooting for
> >> > closing this RFC this week.
> >> >
> >> >
> >> > On 8/30/19 3:50 PM, Dan Smith wrote:
> >> > > Hi all,
> >> > >
> >> > > We added the following RFC to the wiki about moving Geode's
> membership
> >> > > system to a separate gradle sub-project. Please review and comment
> by
> >> > > 9/6/2019.
> >> > >
> >> > > https://cwiki.apache.org/confluence/x/WRB4Bw
> >> > >
> >> > > Thanks!
> >> > > -Dan
> >> > >
> >> >
> >>
> >
>


[DISCUSSION] should the DISTRIBUTED_NO_ACK Scope be deprecated?

2019-08-29 Thread Darrel Schneider
Geode currently allows creating non-partitioned regions with a
DISTRIBUTED_NO_ACK Scope. This causes distributed operations on the region
not to wait for acknowledgement but to just send the op and assume the
remote member received it.
Currently gfsh gives you no way to create a region with DISTRIBUTED_NO_ACK. It
always uses DISTRIBUTED_ACK.
Partitioned regions do not support no-ack.
Does anyone know of a reason why it should not be deprecated so that it can
be removed in some future release?
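
For reference, since gfsh cannot produce it, the only way to get this scope
today is through the API or cache.xml; a minimal Java sketch (the region name
is illustrative):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.RegionFactory;
    import org.apache.geode.cache.RegionShortcut;
    import org.apache.geode.cache.Scope;

    public class NoAckRegionSketch {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();
        RegionFactory<String, String> factory =
            cache.createRegionFactory(RegionShortcut.REPLICATE);
        // Distributed operations are sent without waiting for acknowledgement.
        factory.setScope(Scope.DISTRIBUTED_NO_ACK);
        factory.create("noAckRegion");
        cache.close();
      }
    }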


Re: [DISCUSS] what region types to support in the new management rest api

2019-08-21 Thread Darrel Schneider
Given the current types that gfsh supports, I don't see that it will require
a lot of validation. We have the redundancy level to validate for the
REDUNDANT types, and we have HEAP_LRU and OVERFLOW to validate. That is it.
The only other thing is that the partition-only attributes should be rejected
when creating a replicate region. I think we should just support the types "gfsh 
create region" does.
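
For comparison, the Java API already exposes both forms; a short sketch (the
region names are illustrative):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.PartitionAttributesFactory;
    import org.apache.geode.cache.RegionShortcut;

    public class RedundancySketch {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // Shortcut form: PARTITION_REDUNDANT implies one redundant copy.
        cache.createRegionFactory(RegionShortcut.PARTITION_REDUNDANT).create("regionA");

        // Explicit form: redundant copies can be set to 0 through 3 directly.
        PartitionAttributesFactory<String, String> paf = new PartitionAttributesFactory<>();
        paf.setRedundantCopies(2);
        cache.<String, String>createRegionFactory(RegionShortcut.PARTITION)
            .setPartitionAttributes(paf.create())
            .create("regionB");

        cache.close();
      }
    }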

On Tue, Aug 20, 2019 at 7:50 PM Jinmei Liao  wrote:

> Maybe to clarify what I said before (which was certainly unclear ), I am
> not saying we only allow users to create limited set of types of region, we
> COULD still support all of them, but we should not limit ourselves to the
> current ways of doing things. Currently, a flat model of region
> configuration which holds a type and all the possible attributes any type
> of region can have, could lead to these problems:
> 1. the type may be contradictory/redundant to a particular property:
>   for example, the type could "PARTITION_REDUNDANT", but the
> redundantCopies is set to be 0. And the type could be "PARTITION_OVERFLOW",
> but the eviction action is set to be "LOCAL_DESTROY". To avoid this, we
> will need to do a lot of front end validation in order to accept this
> configuration, which is possible to do, but is it really necessary? Should
> we somehow limit the type to something that really indicate the "type"
> instead of just a shortcut that actually sets a set of attributes.
>
> 2. the object can hold lots of unnecessary attributes that only pertains to
> a particular type
> for example, redundantCopies doesn't really apply to replicate regions, but
> it's there for you to configure.
>
> Just some more food for thought.
>
>
>
>
> On Tue, Aug 20, 2019 at 7:34 PM Charlie Black  wrote:
>
> > Yes it is common for 0,1 and 2.   3 enters into gray space of is the cost
> > of redundancy worth it.
> >
> > So voting for exposing the number of copies to be the same as Apache
> Geode
> > Java API.
> >
> > Charlie
> >
> > On Tue, Aug 20, 2019 at 11:38 AM Darrel Schneider  >
> > wrote:
> >
> > > The shortcuts support partitioned regions with 0 and 1 redundant
> copies.
> > Is
> > > redundancies greater than 1 common enough for the rest management api
> to
> > > support it?
> > >
> > > On Tue, Aug 20, 2019 at 11:27 AM Jacob Barrett 
> > > wrote:
> > >
> > > > +1 to Alexander’s statement.
> > > >
> > > > Also, initial revisions need not be feature parity. For us on the
> > common
> > > > use cases. It’s sounds like an advanced use case to have proxy
> regions
> > on
> > > > the server so focus on the common partitioned and replicated first
> for
> > > the
> > > > initial release.
> > > >
> > > > -jake
> > > >
> > > > > On Aug 20, 2019, at 11:07 AM, Alexander Murmann <
> amurm...@apache.org
> > >
> > > > wrote:
> > > > >
> > > > > Hey folks, I want to make sure that any other's product's roadmaps
> > have
> > > > no
> > > > > impact on any decisions we make about Apache Geode.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Tue, Aug 20, 2019 at 10:45 AM Darrel Schneider <
> > > dschnei...@pivotal.io
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Is "group" support on the PCC roadmap or is the plan for the
> members
> > > of
> > > > a
> > > > >> cluster to always be uniform?
> > > > >>
> > > > >>> On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao 
> > > wrote:
> > > > >>>
> > > > >>> So, sound like we still need to support *PROXY types. It's OK to
> > drop
> > > > >>> support for LOCAL* region types in management rest API?
> > > > >>>
> > > > >>> Also, regarding existing region shortcuts, we are also
> > experimenting
> > > > >> using
> > > > >>> different object types to represent different types of region,
> for
> > > > >> example,
> > > > >>> redundantCopies property should only exists in partition regions.
> > > > Instead
> > > > >>> of having a flat object that could have a type of any of these
> > values
> > > > and
> > > > >>> holds all sorts of properties that may/ma

Re: [DISCUSS] what region types to support in the new management rest api

2019-08-20 Thread Darrel Schneider
gfsh create region currently does not support "distributed-no-ack" nor
"global". I did not find in jira a feature request for gfsh to support
these. So I think it would be safe for the Geode Management REST API to
also not support those scopes.


On Tue, Aug 20, 2019 at 12:10 PM Kirk Lund  wrote:

> Here's my 2cents: The Geode Management REST API should definitely support
> "group" such that creation of a region may target zero, one, or more
> groups.
>
> On Tue, Aug 20, 2019 at 10:45 AM Darrel Schneider 
> wrote:
>
> > Is "group" support on the PCC roadmap or is the plan for the members of a
> > cluster to always be uniform?
> >
> > On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao  wrote:
> >
> > > So, sound like we still need to support *PROXY types. It's OK to drop
> > > support for LOCAL* region types in management rest API?
> > >
> > > Also, regarding existing region shortcuts, we are also experimenting
> > using
> > > different object types to represent different types of region, for
> > example,
> > > redundantCopies property should only exists in partition regions.
> Instead
> > > of having a flat object that could have a type of any of these values
> and
> > > holds all sorts of properties that may/may not make sense for that
> type,
> > > should just have a factory method that given these region shortcuts, we
> > > would return a specific region object that's determined by this type?
> > >
> > > On Tue, Aug 20, 2019 at 8:15 AM Jens Deppe  wrote:
> > >
> > > > Currently, when deployed to the cloud (aka PCC) there is no ability
> > for a
> > > > user to group members thus it is also not possible to create regions
> > (via
> > > > gfsh at least) that are separated by groups. Typically one would
> > create a
> > > > PROXY region against one group and the PARTITION region against
> another
> > > > group. However, without the ability to assign groups, that is not
> > > possible.
> > > >
> > > > --Jens
> > > >
> > > > On Tue, Aug 20, 2019 at 7:46 AM Michael Stolz 
> > wrote:
> > > >
> > > > > I know that lots of folks use PROXY regions on the server side to
> > host
> > > > > logic associated with the region, but I think they always do that
> in
> > > > > conjunction with server groups so that the proxy is on some of the
> > > server
> > > > > and the same region containing data is on others. Given the way
> > > cache.xml
> > > > > works they might not even bother with the server groups, but I'm
> not
> > > > sure.
> > > > >
> > > > > I think we should carry forward the existing shortcuts and not go
> > > > backward
> > > > > to the separate attributes.
> > > > >
> > > > > --
> > > > > Mike Stolz
> > > > > Principal Engineer, Pivotal Cloud Cache
> > > > > Mobile: +1-631-835-4771
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Aug 19, 2019 at 7:59 PM Darrel Schneider <
> > > dschnei...@pivotal.io>
> > > > > wrote:
> > > > >
> > > > > > Keep in mind that the context of the regions in question is the
> > > > cluster.
> > > > > So
> > > > > > these regions would be created on servers.
> > > > > > So, for example, does anyone see a need to create PROXY regions
> on
> > > the
> > > > > > server? Even if we did not support them on the server, they would
> > > still
> > > > > be
> > > > > > supported on clients.
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 19, 2019 at 4:26 PM Jinmei Liao 
> > > wrote:
> > > > > >
> > > > > > > Region type (in another word Region shortcut) defines a set of
> > > > > attributes
> > > > > > > for a region. These are the list of region types we have:
> > > > > > >
> > > > > > > LOCAL,
> > > > > > > LOCAL_PERSISTENT,
> > > > > > > LOCAL_HEAP_LRU,
> > > > > > > LOCAL_OVERFLOW,
> > > > > > > LOCAL_PERSISTENT_OVERFLOW,
> > > > > > >
> > > > > > > PARTITION,
> > > > > > > P

Re: [DISCUSS] what region types to support in the new management rest api

2019-08-20 Thread Darrel Schneider
By far the most common region "scope" is distributed-ack. We think LOCAL
scope is hardly ever used.
Partitioned regions ignore the scope and automatically use
"distributed-ack".

Should the default scope of "distributed-no-ack" be supported by the rest
management api for replicates?
Should the scope of "global" be supported by the rest management api for
replicates?

On Tue, Aug 20, 2019 at 11:38 AM Darrel Schneider 
wrote:

> The shortcuts support partitioned regions with 0 and 1 redundant copies.
> Is redundancies greater than 1 common enough for the rest management api to
> support it?
>
> On Tue, Aug 20, 2019 at 11:27 AM Jacob Barrett 
> wrote:
>
>> +1 to Alexander’s statement.
>>
>> Also, initial revisions need not be feature parity. For us on the common
>> use cases. It’s sounds like an advanced use case to have proxy regions on
>> the server so focus on the common partitioned and replicated first for the
>> initial release.
>>
>> -jake
>>
>> > On Aug 20, 2019, at 11:07 AM, Alexander Murmann 
>> wrote:
>> >
>> > Hey folks, I want to make sure that any other's product's roadmaps have
>> no
>> > impact on any decisions we make about Apache Geode.
>> >
>> > Thanks!
>> >
>> > On Tue, Aug 20, 2019 at 10:45 AM Darrel Schneider <
>> dschnei...@pivotal.io>
>> > wrote:
>> >
>> >> Is "group" support on the PCC roadmap or is the plan for the members
>> of a
>> >> cluster to always be uniform?
>> >>
>> >>> On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao 
>> wrote:
>> >>>
>> >>> So, sound like we still need to support *PROXY types. It's OK to drop
>> >>> support for LOCAL* region types in management rest API?
>> >>>
>> >>> Also, regarding existing region shortcuts, we are also experimenting
>> >> using
>> >>> different object types to represent different types of region, for
>> >> example,
>> >>> redundantCopies property should only exists in partition regions.
>> Instead
>> >>> of having a flat object that could have a type of any of these values
>> and
>> >>> holds all sorts of properties that may/may not make sense for that
>> type,
>> >>> should just have a factory method that given these region shortcuts,
>> we
>> >>> would return a specific region object that's determined by this type?
>> >>>
>> >>>> On Tue, Aug 20, 2019 at 8:15 AM Jens Deppe 
>> wrote:
>> >>>>
>> >>>> Currently, when deployed to the cloud (aka PCC) there is no ability
>> >> for a
>> >>>> user to group members thus it is also not possible to create regions
>> >> (via
>> >>>> gfsh at least) that are separated by groups. Typically one would
>> >> create a
>> >>>> PROXY region against one group and the PARTITION region against
>> another
>> >>>> group. However, without the ability to assign groups, that is not
>> >>> possible.
>> >>>>
>> >>>> --Jens
>> >>>>
>> >>>> On Tue, Aug 20, 2019 at 7:46 AM Michael Stolz 
>> >> wrote:
>> >>>>
>> >>>>> I know that lots of folks use PROXY regions on the server side to
>> >> host
>> >>>>> logic associated with the region, but I think they always do that in
>> >>>>> conjunction with server groups so that the proxy is on some of the
>> >>> server
>> >>>>> and the same region containing data is on others. Given the way
>> >>> cache.xml
>> >>>>> works they might not even bother with the server groups, but I'm not
>> >>>> sure.
>> >>>>>
>> >>>>> I think we should carry forward the existing shortcuts and not go
>> >>>> backward
>> >>>>> to the separate attributes.
>> >>>>>
>> >>>>> --
>> >>>>> Mike Stolz
>> >>>>> Principal Engineer, Pivotal Cloud Cache
>> >>>>> Mobile: +1-631-835-4771
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Aug 19, 2019 at 7:59 PM Darrel Schneider <
>> >>> dschnei...@pivotal.io>
>> >>>>> wrote:
>> >>

Re: [DISCUSS] what region types to support in the new management rest api

2019-08-20 Thread Darrel Schneider
The shortcuts support partitioned regions with 0 and 1 redundant copies. Is
redundancies greater than 1 common enough for the rest management api to
support it?

On Tue, Aug 20, 2019 at 11:27 AM Jacob Barrett  wrote:

> +1 to Alexander’s statement.
>
> Also, initial revisions need not be feature parity. For us on the common
> use cases. It’s sounds like an advanced use case to have proxy regions on
> the server so focus on the common partitioned and replicated first for the
> initial release.
>
> -jake
>
> > On Aug 20, 2019, at 11:07 AM, Alexander Murmann 
> wrote:
> >
> > Hey folks, I want to make sure that any other's product's roadmaps have
> no
> > impact on any decisions we make about Apache Geode.
> >
> > Thanks!
> >
> > On Tue, Aug 20, 2019 at 10:45 AM Darrel Schneider  >
> > wrote:
> >
> >> Is "group" support on the PCC roadmap or is the plan for the members of
> a
> >> cluster to always be uniform?
> >>
> >>> On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao  wrote:
> >>>
> >>> So, sound like we still need to support *PROXY types. It's OK to drop
> >>> support for LOCAL* region types in management rest API?
> >>>
> >>> Also, regarding existing region shortcuts, we are also experimenting
> >> using
> >>> different object types to represent different types of region, for
> >> example,
> >>> redundantCopies property should only exists in partition regions.
> Instead
> >>> of having a flat object that could have a type of any of these values
> and
> >>> holds all sorts of properties that may/may not make sense for that
> type,
> >>> should just have a factory method that given these region shortcuts, we
> >>> would return a specific region object that's determined by this type?
> >>>
> >>>> On Tue, Aug 20, 2019 at 8:15 AM Jens Deppe  wrote:
> >>>>
> >>>> Currently, when deployed to the cloud (aka PCC) there is no ability
> >> for a
> >>>> user to group members thus it is also not possible to create regions
> >> (via
> >>>> gfsh at least) that are separated by groups. Typically one would
> >> create a
> >>>> PROXY region against one group and the PARTITION region against
> another
> >>>> group. However, without the ability to assign groups, that is not
> >>> possible.
> >>>>
> >>>> --Jens
> >>>>
> >>>> On Tue, Aug 20, 2019 at 7:46 AM Michael Stolz 
> >> wrote:
> >>>>
> >>>>> I know that lots of folks use PROXY regions on the server side to
> >> host
> >>>>> logic associated with the region, but I think they always do that in
> >>>>> conjunction with server groups so that the proxy is on some of the
> >>> server
> >>>>> and the same region containing data is on others. Given the way
> >>> cache.xml
> >>>>> works they might not even bother with the server groups, but I'm not
> >>>> sure.
> >>>>>
> >>>>> I think we should carry forward the existing shortcuts and not go
> >>>> backward
> >>>>> to the separate attributes.
> >>>>>
> >>>>> --
> >>>>> Mike Stolz
> >>>>> Principal Engineer, Pivotal Cloud Cache
> >>>>> Mobile: +1-631-835-4771
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Aug 19, 2019 at 7:59 PM Darrel Schneider <
> >>> dschnei...@pivotal.io>
> >>>>> wrote:
> >>>>>
> >>>>>> Keep in mind that the context of the regions in question is the
> >>>> cluster.
> >>>>> So
> >>>>>> these regions would be created on servers.
> >>>>>> So, for example, does anyone see a need to create PROXY regions on
> >>> the
> >>>>>> server? Even if we did not support them on the server, they would
> >>> still
> >>>>> be
> >>>>>> supported on clients.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Aug 19, 2019 at 4:26 PM Jinmei Liao 
> >>> wrote:
> >>>>>>
> >>>>>>> Region type (in another word Region shortcut) defines a set of
> >>>>> attributes
> >>>>>>

Re: [DISCUSS] what region types to support in the new management rest api

2019-08-20 Thread Darrel Schneider
Is "group" support on the PCC roadmap or is the plan for the members of a
cluster to always be uniform?

On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao  wrote:

> So, sound like we still need to support *PROXY types. It's OK to drop
> support for LOCAL* region types in management rest API?
>
> Also, regarding existing region shortcuts, we are also experimenting using
> different object types to represent different types of region, for example,
> redundantCopies property should only exists in partition regions. Instead
> of having a flat object that could have a type of any of these values and
> holds all sorts of properties that may/may not make sense for that type,
> should just have a factory method that given these region shortcuts, we
> would return a specific region object that's determined by this type?
>
> On Tue, Aug 20, 2019 at 8:15 AM Jens Deppe  wrote:
>
> > Currently, when deployed to the cloud (aka PCC) there is no ability for a
> > user to group members thus it is also not possible to create regions (via
> > gfsh at least) that are separated by groups. Typically one would create a
> > PROXY region against one group and the PARTITION region against another
> > group. However, without the ability to assign groups, that is not
> possible.
> >
> > --Jens
> >
> > On Tue, Aug 20, 2019 at 7:46 AM Michael Stolz  wrote:
> >
> > > I know that lots of folks use PROXY regions on the server side to host
> > > logic associated with the region, but I think they always do that in
> > > conjunction with server groups so that the proxy is on some of the
> server
> > > and the same region containing data is on others. Given the way
> cache.xml
> > > works they might not even bother with the server groups, but I'm not
> > sure.
> > >
> > > I think we should carry forward the existing shortcuts and not go
> > backward
> > > to the separate attributes.
> > >
> > > --
> > > Mike Stolz
> > > Principal Engineer, Pivotal Cloud Cache
> > > Mobile: +1-631-835-4771
> > >
> > >
> > >
> > > On Mon, Aug 19, 2019 at 7:59 PM Darrel Schneider <
> dschnei...@pivotal.io>
> > > wrote:
> > >
> > > > Keep in mind that the context of the regions in question is the
> > cluster.
> > > So
> > > > these regions would be created on servers.
> > > > So, for example, does anyone see a need to create PROXY regions on
> the
> > > > server? Even if we did not support them on the server, they would
> still
> > > be
> > > > supported on clients.
> > > >
> > > >
> > > > On Mon, Aug 19, 2019 at 4:26 PM Jinmei Liao 
> wrote:
> > > >
> > > > > Region type (in another word Region shortcut) defines a set of
> > > attributes
> > > > > for a region. These are the list of region types we have:
> > > > >
> > > > > LOCAL,
> > > > > LOCAL_PERSISTENT,
> > > > > LOCAL_HEAP_LRU,
> > > > > LOCAL_OVERFLOW,
> > > > > LOCAL_PERSISTENT_OVERFLOW,
> > > > >
> > > > > PARTITION,
> > > > > PARTITION_REDUNDANT,
> > > > > PARTITION_PERSISTENT,
> > > > > PARTITION_REDUNDANT_PERSISTENT,
> > > > > PARTITION_OVERFLOW,
> > > > > PARTITION_REDUNDANT_OVERFLOW,
> > > > > PARTITION_PERSISTENT_OVERFLOW,
> > > > > PARTITION_REDUNDANT_PERSISTENT_OVERFLOW,
> > > > > PARTITION_HEAP_LRU,
> > > > > PARTITION_REDUNDANT_HEAP_LRU,
> > > > >
> > > > > REPLICATE,
> > > > > REPLICATE_PERSISTENT,
> > > > > REPLICATE_OVERFLOW,
> > > > > REPLICATE_PERSISTENT_OVERFLOW,
> > > > > REPLICATE_HEAP_LRU,
> > > > >
> > > > > REPLICATE_PROXY,
> > > > > PARTITION_PROXY,
> > > > > PARTITION_PROXY_REDUNDANT,
> > > > >
> > > > > In region management rest api, especially in PCC world, we are
> > > wondering
> > > > > 1) should we allow users to create LOCAL* regions through
> management
> > > rest
> > > > > api?
> > > > > 2) should we allow users to create *PROXY regions through
> management
> > > rest
> > > > > api?
> > > > > 3) for the rest of the PARTITION* and REPLICATE* types, should we
> > > strive
> > > > to
> > > > > keep the region type list the same as before, or only keep the type
> > as
> > > > > REPLICATE/PARTITION, but use other properties like "redundantCopy"
> > and
> > > > > "evictionAction" to allow different permutations of region
> > attributes?
> > > > >
> > > > > comments appreciated!
> > > > > --
> > > > > Cheers
> > > > >
> > > > > Jinmei
> > > > >
> > > >
> > >
> >
>
>
> --
> Cheers
>
> Jinmei
>


Re: [DISCUSS] what region types to support in the new management rest api

2019-08-19 Thread Darrel Schneider
Keep in mind that the context of the regions in question is the cluster. So
these regions would be created on servers.
So, for example, does anyone see a need to create PROXY regions on the
server? Even if we did not support them on the server, they would still be
supported on clients.


On Mon, Aug 19, 2019 at 4:26 PM Jinmei Liao  wrote:

> Region type (in another word Region shortcut) defines a set of attributes
> for a region. These are the list of region types we have:
>
> LOCAL,
> LOCAL_PERSISTENT,
> LOCAL_HEAP_LRU,
> LOCAL_OVERFLOW,
> LOCAL_PERSISTENT_OVERFLOW,
>
> PARTITION,
> PARTITION_REDUNDANT,
> PARTITION_PERSISTENT,
> PARTITION_REDUNDANT_PERSISTENT,
> PARTITION_OVERFLOW,
> PARTITION_REDUNDANT_OVERFLOW,
> PARTITION_PERSISTENT_OVERFLOW,
> PARTITION_REDUNDANT_PERSISTENT_OVERFLOW,
> PARTITION_HEAP_LRU,
> PARTITION_REDUNDANT_HEAP_LRU,
>
> REPLICATE,
> REPLICATE_PERSISTENT,
> REPLICATE_OVERFLOW,
> REPLICATE_PERSISTENT_OVERFLOW,
> REPLICATE_HEAP_LRU,
>
> REPLICATE_PROXY,
> PARTITION_PROXY,
> PARTITION_PROXY_REDUNDANT,
>
> In region management rest api, especially in PCC world, we are wondering
> 1) should we allow users to create LOCAL* regions through management rest
> api?
> 2) should we allow users to create *PROXY regions through management rest
> api?
> 3) for the rest of the PARTITION* and REPLICATE* types, should we strive to
> keep the region type list the same as before, or only keep the type as
> REPLICATE/PARTITION, but use other properties like "redundantCopy" and
> "evictionAction" to allow different permutations of region attributes?
>
> comments appreciated!
> --
> Cheers
>
> Jinmei
>


Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Darrel Schneider
Originally we just had a single instance of CachePerfStats per jvm. So all
the region related stats were always rolled up onto the single
CachePerfStats. Later we added RegionPerfStats so that users could see what
was happening on a per region basis. When RegionPerfStats was added it was
made to extend CachePerfStats but no new stats were added to it.

I think we now have some stats that only make sense on a Cache (like
"regions" and "partitionedRegions") but we also have them on
RegionPerfStats. I thought these used to always return zero on the region
but I just looked at the implementation and it looks like they just return
1 or 0 on the region (depending on if it is partitioned or not). The help
text says it will be the number of regions on the cache (so the help is
correct for CachePerfStats but wrong for RegionPerfStats). I would be okay
with us dropping the stats that only make sense at the cache level from the
per region stats.

Something I could not tell from your diagram is whether you are cleaning this up.
Does CacheStats also have everything that is on RegionStats? Or will it now
just have the stats that are unique to a cache?

If you change the internal names (like CachePerfStats -> CacheStats) keep
in mind that you should use the same type name when calling "createType"
(in this case "CachePerfStats") for backwards compatibility.
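
To make the question concrete, here is a rough sketch of one possible shape
for the proposed hierarchy (hypothetical names and methods; whether CacheStats
also exposes the region-level stats is exactly the open question above):

    // Hypothetical sketch only; not the current Geode internals.
    interface RegionStats {
      void incEntryCount(long delta);
      long getEntryCount();
    }

    interface PartitionedRegionStats extends RegionStats {
      void incBucketCount(int delta);
    }

    interface CacheStats {
      // Open question: does this also include everything on RegionStats,
      // or only the stats that are unique to a cache (e.g. "regions")?
      void incRegions(int delta);
      int getRegions();
    }

    // Each interface would be backed by a *StatsImpl class. For backward
    // compatibility the implementation should keep registering its statistics
    // type under the existing name, e.g. createType("CachePerfStats", ...),
    // even if the Java class is renamed to CacheStatsImpl.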

On Thu, Jul 11, 2019 at 9:45 AM Mark Hanson  wrote:

> See my comments inline..
>
> > On Jul 11, 2019, at 9:36 AM, Darrel Schneider 
> wrote:
> >
> > Currently we do not make visible per bucket stats. Is the above proposal
> > just to change the internal implementation but the stats visible in tools
> > like vsd are unchanged?
> >
> As with my previous e-mail exchange with Jake, I would like to back burner
> per bucket stats. If it turns out to be a problem, I will bring it back
> before the group.
>
> My goal is these are internal changes. I would see it as a problem
> requiring re-evaluation, if this were a meaningful breaking change. E.g.
> breaks vsd, changes public API
> An important note would be that fixing a bug in a stat, is not a
> meaningful breaking change.
>
> > Currently we have an internal CachePerfStats which the internal
> > RegionPerfStats extends. Does your CacheStats replace CachePerfStats and
> > your RegionStats replace RegionPerfStats?
> >
>
> You are 100% correct.
>
> > Currently we have an internal PartitionedRegionStats class. Does
> > your PartitionedRegionStats replace it?
> >
>
> Yes. I considered that under the “RegionPerfStats” umbrella.
>
> > Are your "*Stats" interfaces and your "*StatsImpl" classes?
>
> Yes.
>
> >
> > On Thu, Jul 11, 2019 at 9:29 AM Mark Hanson  wrote:
> >
> >> It depends to be honest. This is just a plan. I would hope to roll them
> >> up, but I can see a case where you have buckets on different servers,
> that
> >> would seem to necessitate that.
> >>
> >>> On Jul 11, 2019, at 9:26 AM, Jacob Barrett 
> wrote:
> >>>
> >>> There isn’t currently a partition stat instance per bucket. Are you
> >> saying you’re making that a thing now?
> >>>
> >>>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
> >>>>
> >>>> Correct.
> >>>>
> >>>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider  >
> >> wrote:
> >>>>>
> >>>>> Why would a PartitionedRegionStatsImpl contain more than one
> >> RegionStats?
> >>>>> Are these representing the local buckets?
> >>>>>
> >>>>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson 
> >> wrote:
> >>>>>>
> >>>>>> PartitionRegionStatsImpl can contain one to many RegionStats
> >>>>>>
> >>>>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
> >>>>>>>
> >>>>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
> >>>>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
> >>>>>> RegionStats?
> >>>>>>>
> >>>>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson  >>  >>>>>> mhan...@pivotal.io>> wrote:
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> As many of you may know our structure for our perf stats is not
> >> great. I
> >>>>>> would like to propose we refactor the code to have the following
> >>>>>> inheritance model, which Kirk and I came up with.
> >>>>>>>
> >>>>>>> It is my belief that fixing this will allow future features to be
> >>>>>> implemented in a much less painful way.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Mark
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >>
>
>


Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Darrel Schneider
Currently we do not make visible per bucket stats. Is the above proposal
just to change the internal implementation but the stats visible in tools
like vsd are unchanged?

Currently we have an internal CachePerfStats which the internal
RegionPerfStats extends. Does your CacheStats replace CachePerfStats and
your RegionStats replace RegionPerfStats?

Currently we have an internal PartitionedRegionStats class. Does
your PartitionedRegionStats replace it?

Are your "*Stats" interfaces and your "*StatsImpl" classes?

On Thu, Jul 11, 2019 at 9:29 AM Mark Hanson  wrote:

> It depends to be honest. This is just a plan. I would hope to roll them
> up, but I can see a case where you have buckets on different servers, that
> would seem to necessitate that.
>
> > On Jul 11, 2019, at 9:26 AM, Jacob Barrett  wrote:
> >
> > There isn’t currently a partition stat instance per bucket. Are you
> saying you’re making that a thing now?
> >
> >> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
> >>
> >> Correct.
> >>
> >>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider 
> wrote:
> >>>
> >>> Why would a PartitionedRegionStatsImpl contain more than one
> RegionStats?
> >>> Are these representing the local buckets?
> >>>
> >>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson 
> wrote:
> >>>>
> >>>> PartitionRegionStatsImpl can contain one to many RegionStats
> >>>>
> >>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
> >>>>>
> >>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
> >>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
> >>>> RegionStats?
> >>>>>
> >>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson   >>>> mhan...@pivotal.io>> wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> As many of you may know our structure for our perf stats is not
> great. I
> >>>> would like to propose we refactor the code to have the following
> >>>> inheritance model, which Kirk and I came up with.
> >>>>>
> >>>>> It is my belief that fixing this will allow future features to be
> >>>> implemented in a much less painful way.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>
> >>>>
> >>
>
>


Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Darrel Schneider
Why would a PartitionedRegionStatsImpl contain more than one RegionStats?
Are these representing the local buckets?

On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson  wrote:

> PartitionRegionStatsImpl can contain one to many RegionStats
>
> > On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
> >
> > Seems reasonable. I'm guessing that CachePerfImpl contains many
> RegionStats. Does PartitionRegionStatsImpl just contain a single
> RegionStats?
> >
> > On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson  mhan...@pivotal.io>> wrote:
> > Hi All,
> >
> > As many of you may know our structure for our perf stats is not great. I
> would like to propose we refactor the code to have the following
> inheritance model, which Kirk and I came up with.
> >
> > It is my belief that fixing this will allow future features to be
> implemented in a much less painful way.
> >
> > Thoughts?
> >
> > Thanks,
> > Mark
> >
>
>


Re: Unnecessary uses of final on local variables

2019-06-19 Thread Darrel Schneider
I find Bill's argument convincing: use final by default on everything, and
remove it when you have something that needs to be changed.
It is not in keeping with my current practice but I'm willing to change.
So I vote -1 to not using "final" on local variables.
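
A tiny illustration of the style being advocated (purely illustrative):

    // "final by default": every local starts final, and the modifier is
    // removed only when the variable genuinely needs to be reassigned.
    static int sum(final int[] values) {
      final int length = values.length; // never reassigned, so final
      int total = 0;                    // accumulator must change, so not final
      for (int i = 0; i < length; i++) {
        total += values[i];
      }
      return total;
    }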


On Wed, Jun 19, 2019 at 7:29 AM Anthony Baker  wrote:

> Just to confirm, the primary place where we make project decisions is on
> the dev@geode list.  Thanks!
>
> Anthony
>
>
> > On Jun 19, 2019, at 7:19 AM, Bill Burcham  wrote:
> >
> > I feel that a lot more
> > conversation is needed, outside email. On the other hand, this mailing
> list
> > is a fine place to deliberate. But to deliberate successfully on this
> > issue, via email, I think we'll have to break it down further, and we'll
> > have to be comfortable with the process taking a long time.
>
>


[DISCUSS] changing geode 32-bit counter stats to 64-bit

2019-06-07 Thread Darrel Schneider
We have had a couple of tickets that have problems with 32-bit counters
changing too fast and causing them to be hard to understand when they wrap
around (see GEODE-6425 and GEODE-6834). We have also had problems with some
stats being broken because they were changing the 32-bit one when they
should have been changing the 64-bit one (see GEODE-6776).
The current implementation has one array of values for the 32-bit stats and
another array of values for the 64-bit stats. We use indexes into these
arrays when changing a stat. Given an int "id" used to index these arrays,
we can not tell if we should be indexing the 32-bit array or 64-bit array.
The first 32-bit stat for a type will have an id of 0 and the first 64-bit
stat on that type will also have an id of 0. But our current implementation
has the same type of value in both arrays (LongAdder
see: StripedStatisticsImpl fields intAdders and longAdders). So if we
simply change our implementation to have a single array, then the ids will
no longer be overloaded.

Changing this internal implementation also improves backward compatibility.
Currently when we change one of our counters from 32-bit to 64-bit it is
possible we would break existing applications that are using the Statistics
interface. It has methods on it that allow stats to be read given an id:
getInt(int id) and getLong(int id). If they are currently reading a 32-bit
stat using getInt(id) and we change that stat to be 64-bit (like we did in
GEODE-6425) then they will now be reading the wrong stat when they call
getInt(id). If they used getInt(String) or getInt(StatisticDescriptor) they
will be okay. But if we make this simple change to have 32-bit and 64-bit
stats stored in the same array then getInt(id) will do the right thing when
we change a stat from 32-bit to 64-bit.

Does anyone see a problem with making this change?

After we do it I think we should change all of our counters to 64-bit since
they are always stored in a LongAdder anyway and we should deprecate the
methods that support 32-bit stats.
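
To illustrate the id overloading: today the 32-bit and 64-bit ids each start
at 0 in their own array, so an id alone does not tell you which array to
index. A simplified sketch (not the actual StripedStatisticsImpl code) of the
proposed single-array storage:

    import java.util.concurrent.atomic.LongAdder;

    // Simplified sketch of single-array stat storage; one slot per stat,
    // whether it is declared as 32-bit or 64-bit.
    class SingleArrayStatistics {
      private final LongAdder[] values;

      SingleArrayStatistics(int statCount) {
        values = new LongAdder[statCount];
        for (int i = 0; i < statCount; i++) {
          values[i] = new LongAdder();
        }
      }

      void incLong(int id, long delta) {
        values[id].add(delta);
      }

      // With one array, getInt(id) and getLong(id) read the same slot, so an
      // existing getInt(id) call still reads the intended stat if that stat is
      // later widened to 64-bit (subject to truncation of large values).
      int getInt(int id) {
        return (int) values[id].sum();
      }

      long getLong(int id) {
        return values[id].sum();
      }
    }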


what is the best way to update a geode pull request

2019-05-31 Thread Darrel Schneider
Something I have noticed is that often, when I have requested changes to a
pull request, the changes are force pushed as a single commit to the PR. I
would actually prefer that the changes show up as a new commit on the PR
instead of everything being rebased into one commit. That makes the history
of the PR easier to follow and makes it easy to see what has changed since
the previous review. What do others think? Have we done something that makes
contributors think the pull request has to be a single commit? I know the
initial pull request is supposed to be, but from then on I'd prefer that we
wait to squash until we merge it to develop.


Re: Changing external methods to no longer throw UnsupportedOperationException

2019-05-23 Thread Darrel Schneider
This particular method, Region.getStatistics, when called on a client does
not send a message to the server.
So it returns a stats object describing the region on the client.
So this would not change the behaviour on a client running an older version
of geode.


On Thu, May 23, 2019 at 1:00 PM Owen Nichols  wrote:

> +1
>
> I see no semantic difference between adding a new method vs implementing a
> stub that previously threw UnsupportedOperationException
>
> > On May 23, 2019, at 12:56 PM, Udo Kohlmeyer  wrote:
> >
> > +1 to implementing this method.
> >
> > There is no plausible reason NOT to implement this.
> >
> > --Udo
> >
> > On 5/23/19 12:44, Dan Smith wrote:
> >> +1 to implementing this method in a minor release.
> >>
> >> I'm with Jake on this one. Every bug we fix changes the behavior of the
> >> product in some small way. This seems like a behavior change for the
> better
> >> - I can't picture a use case where someone is actually *relying* on this
> >> method throwing UnsupportedOperationException.
> >>
> >> I suppose someone might write a test that this feature is not supported
> -
> >> but it seems like the only reason to do that would be to detect when
> geode
> >> starts supporting getStatistics, so they'd probably be happy to see
> >> getStatistics start working!
> >>
> >>
> >> -Dan
> >>
> >> On Thu, May 23, 2019 at 10:31 AM Anilkumar Gingade  >
> >> wrote:
> >>
> >>> I agree, this may not look like the usecase that one would be using or
> >>> depending. Going with the backward compatibility requirement this will
> be
> >>> breaking that contract.
> >>> Again, based on the scenario and usecases, there could be exceptions.
> I am
> >>> trying to see if the versioning support that's used to keep the
> backward
> >>> compatibility contract can be used here.
> >>>
> >>> On Thu, May 23, 2019 at 10:17 AM Jacob Barrett 
> >>> wrote:
> >>>
> >>>> But what application is going to legitimately call this method and
> expect
> >>>> that it throw an exception? What would be the function of that usage?
> >>>>
> >>>> If you assume that calling this method under these conditions had no
> >>> value
> >>>> and would therefor never have been called then one could argue that
> >>>> implementing this method is adding a feature. It adds a case where one
> >>>> could legitimately call this method under new conditions.
> >>>>
> >>>>> On May 23, 2019, at 10:06 AM, Anilkumar Gingade  >
> >>>> wrote:
> >>>>> As this changes the behavior on the existing older application; it
> >>> seems
> >>>> to
> >>>>> break the backward compatibility requirements.
> >>>>> We use client versions to keep the contracts/behavior same for older
> >>>>> client; can we do the same here.
> >>>>>
> >>>>> -Anil.
> >>>>>
> >>>>>
> >>>>> On Thu, May 23, 2019 at 8:33 AM Darrel Schneider <
> >>> dschnei...@pivotal.io>
> >>>>> wrote:
> >>>>>
> >>>>>> Is it okay, in a minor release, to implement Region.getStatistics
> for
> >>>>>> partitioned regions? See GEODE-2685. The current behavior is for it
> to
> >>>>>> always throw UnsupportedOperationException. I doubt that any
> >>>> application is
> >>>>>> depending on that behavior but it could be. I know we have seen
> >>> changes
> >>>>>> like this in the past break the tests of other products that are
> >>>> layered on
> >>>>>> top of Geode.
> >>>>>> Should this type of change be considered one that breaks backwards
> >>>>>> compatibility?
> >>>>>>
> >>>>
>
>


Changing external methods to no longer throw UnsupportedOperationException

2019-05-23 Thread Darrel Schneider
Is it okay, in a minor release, to implement Region.getStatistics for
partitioned regions? See GEODE-2685. The current behavior is for it to
always throw UnsupportedOperationException. I doubt that any application is
depending on that behavior but it could be. I know we have seen changes
like this in the past break the tests of other products that are layered on
top of Geode.
Should this type of change be considered one that breaks backwards
compatibility?
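
For context, from an application's point of view the behavior in question
looks roughly like this (sketch; assumes "region" is a partitioned region
with statistics enabled):

    import org.apache.geode.cache.CacheStatistics;
    import org.apache.geode.cache.Region;

    // Today this prints the catch-block message for partitioned regions;
    // after GEODE-2685 it would return a CacheStatistics object instead.
    static void printLastAccessed(Region<?, ?> region) {
      try {
        CacheStatistics stats = region.getStatistics();
        System.out.println("last accessed: " + stats.getLastAccessedTime());
      } catch (UnsupportedOperationException e) {
        System.out.println("getStatistics not supported for this region");
      }
    }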


Re: [DISCUSS] reduce PR checks to JDK11 only

2019-05-20 Thread Darrel Schneider
I think it would be a mistake to have just the main pipeline catch
analyze-serializable failures. It is easy to accidentally cause these
failures so I think we want them caught early. They are also not flaky so
if broken then they will consistently fail. I think this would cause
someone extra work in tracking down the change, backing it out, and
resubmitting the PR once fixed.
I would vote for getting these checks back into PR checks as quickly as
possible. That is probably "revert PR 3598". Then someone who knows these
tests well can propose the best way to improve them. But after reverting
you could also resubmit it with something like #2 or #3 that runs the
existing tests that require JDK 8 but not all the other tests.

On Sun, May 19, 2019 at 2:20 AM Owen Nichols  wrote:

> Correct, analyze-serializables test is currently skipped under JDK11.
> Ideally it would be re-written to test what is actually serialized, not the
> flawed assumption that no change in compiled bytecode size means nothing
> changed...
>
> This also brings up a good question worth discussing: is the goal of PR
> checks to catch all of the same issues as the main pipeline, or is it ever
> ok to run some tests only in the main pipeline (for example, PR checks have
> never included Windows tests).
>
> So which is the preferred course of action?
> 1) restore ALL JDK8 tests to PR checks (i.e. revert PR 3598)
> 2) restore just the one JDK8 test job that includes analyze-serializables
> to the PR checks
> 3) create a tailored subset of JDK8-specific tests to include in PR checks
> 4) just let the main pipeline catch analyze-serializables failures?
> 5) bite the bullet and fix analyze-serializables to test serialization in
> a more meaningful way that works on both JDK8 and 11
>
> -Owen
>
> > On May 17, 2019, at 1:21 PM, Darrel Schneider 
> wrote:
> >
> > One problem with doing this, I think, is that we have some tests that are
> > disabled unless run on 8. They are the analyze serializables tests. I'm
> not
> > sure of the details but I think the "golden" checksums these tests
> compare
> > to were generated from the byte codes from the jdk 8 class files. The
> byte
> > codes are different for jdk 11. So by pull requests runs only happening
> on
> > jdk 11 we will lose coverage. These tests catch if changed the
> serializable
> > format of classes. I think if the "golden" checksums were regenerated
> with
> > jdk 11 then these tests could be enabled when run on jdk 11. Others on
> the
> > dev list may have more of the details.
> >
> > On Thu, May 16, 2019 at 5:31 PM Owen Nichols 
> wrote:
> >
> >> I’ve created a PR for this, please give it a +1:
> >> https://github.com/apache/geode/pull/3598
> >>
> >>> On May 16, 2019, at 11:12 AM, Anilkumar Gingade 
> >> wrote:
> >>>
> >>> Make sense to me...Looking at the probability of breaking specific to,
> >> jdk8
> >>> and jdk11 through a commit.
> >>>
> >>>
> >>> On Wed, May 15, 2019 at 6:09 PM Owen Nichols 
> >> wrote:
> >>>
> >>>> Currently every PR commit triggers both JDK8 and JDK11 versions of
> each
> >>>> test job.  I propose that we can eliminate the JDK8 version of each
> >> check.
> >>>> In the extremely rare case where a code change breaks on Java 8 but
> >> works
> >>>> fine on Java 11, it would still be caught by the main pipeline (just
> as
> >>>> Windows failures are caught only in the main pipeline).
> >>>>
> >>>> The only tangible effect today of running both JDK8 and JDK11 tests in
> >> PR
> >>>> pipeline is twice the chance to encounter possible flaky failures
> >> (usually
> >>>> unrelated to the commit itself).
> >>>>
> >>>>
> >>>>
> >>
> >>
>
>


Re: [DISCUSS] reduce PR checks to JDK11 only

2019-05-17 Thread Darrel Schneider
One problem with doing this, I think, is that we have some tests that are
disabled unless run on 8. They are the analyze serializables tests. I'm not
sure of the details but I think the "golden" checksums these tests compare
to were generated from the byte codes from the jdk 8 class files. The byte
codes are different for jdk 11. So with pull request runs only happening on
jdk 11 we will lose coverage. These tests catch it if we change the serializable
format of classes. I think if the "golden" checksums were regenerated with
jdk 11 then these tests could be enabled when run on jdk 11. Others on the
dev list may have more of the details.

On Thu, May 16, 2019 at 5:31 PM Owen Nichols  wrote:

> I’ve created a PR for this, please give it a +1:
> https://github.com/apache/geode/pull/3598
>
> > On May 16, 2019, at 11:12 AM, Anilkumar Gingade 
> wrote:
> >
> > Make sense to me...Looking at the probability of breaking specific to,
> jdk8
> > and jdk11 through a commit.
> >
> >
> > On Wed, May 15, 2019 at 6:09 PM Owen Nichols 
> wrote:
> >
> >> Currently every PR commit triggers both JDK8 and JDK11 versions of each
> >> test job.  I propose that we can eliminate the JDK8 version of each
> check.
> >> In the extremely rare case where a code change breaks on Java 8 but
> works
> >> fine on Java 11, it would still be caught by the main pipeline (just as
> >> Windows failures are caught only in the main pipeline).
> >>
> >> The only tangible effect today of running both JDK8 and JDK11 tests in
> PR
> >> pipeline is twice the chance to encounter possible flaky failures
> (usually
> >> unrelated to the commit itself).
> >>
> >>
> >>
>
>


Re: How to publish client stats on server

2019-04-16 Thread Darrel Schneider
setPoolStatisticInterval does cause the client to send some of its stats
periodically to the server. But the server puts this information into
MBeans and does not write them to the server's statistic archive. This is
why the javadocs on setPoolStatisticInterval refer to "gfmon".
Your best bet, as Dan and Anthony pointed out, is to have your client
generate its own gfs file. You can do this by configuring the geode/gemfire
properties as documented. You should find documentation about setting the
following properties: statistic-sampling-enabled, statistic-sample-rate,
statistic-archive-file, enable-time-statistics.
One of the ways you can set these properties is using
ClientCacheFactory.set(String, String)
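
A minimal sketch of a client that writes its own statistics archive (the
property names are the documented ones; the file name and locator are just
examples):

    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;

    public class ClientStatsExample {
      public static void main(String[] args) {
        ClientCache cache = new ClientCacheFactory()
            .set("statistic-sampling-enabled", "true")
            .set("statistic-sample-rate", "1000")             // milliseconds
            .set("statistic-archive-file", "client-stats.gfs")
            .set("enable-time-statistics", "true")
            .addPoolLocator("localhost", 10334)               // example locator
            .create();
        // ... run the client workload; the .gfs file can then be opened in VSD
        cache.close();
      }
    }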

On Tue, Apr 16, 2019 at 6:29 AM Alberto Bustamante Reyes
 wrote:

> Hi Geode community,
>
> Im trying to run a simple test to check how the client stats are published
> on the server, but I have not been able to do it.
>
> The server is started with the statistic sampling enabled, and in the
> client I set the sending interval with setPoolStatisticInterval, but when I
> open the stats file with VSD, I cannot see any "ClientStat" there. I was
> expecting to see the stats "Client-to-Server Messaging Performance
> (ClientStats)".
>
> I have checked the code and these two actions (setting time interval and
> stats sampling) are the two conditions that enable the publishing of the
> client stats, if Im not wrong.
>
> What am I missing? Thanks in advance.
>
> BR/
>
> Alberto
>


Re: About operating system statistics

2019-04-15 Thread Darrel Schneider
The code you have found predates the open source geode project. That code
requires a native, platform-dependent jar to be distributed. We decided
that geode would be pure java. The code that collects Linux stats is pure
java. So I think the code you have found for Solaris, Windows, and OSX has
been "abandoned" and can be deleted.


On Mon, Apr 15, 2019 at 6:54 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> I have read in the documentation that the operating system statistics are
> only available for Linux systems, but I have found in the code the
> correspondent classes for OSX, Solaris & Windows stats (package
> org.apache.geode.internal.statistics.platform).
>
> Are these classes being used? Should they be included in the documentation
> or deleted? (the OSX ones are almost empty, so Im not sure if someone is
> working on it or its an 'abandonded feature')
>
> Thanks in advance,
>
> BR/
>
> Alberto
>


Re: [Proposal] adding a new type of PdxInstance

2019-01-15 Thread Darrel Schneider
Dan, we still want a "class name" but it no longer has to be an actual name
of a java class.
It can now be just a logical "type name". For two PdxInstances to be equal
they need to have the same class name.
So we really do just want a flag that says: never deserialize this
PdxInstance.
Anil suggested a "setDeserializable(boolean)" and "isDeserializable()" in
place of stable.

On Tue, Jan 15, 2019 at 1:21 PM Dan Smith  wrote:

> If I understand this right, you are talking about a way to create a
> PdxInstance that has no corresponding java class. How about just a
> RegionService.createPdxInstanceFactory() method that doesn't take a
> classname, and therefore has no corresponding java class? It seems a
> PdxInstances without a class is a more fundamental PdxInstance. A
> PdxInstance with a java classname on it is just an extension of the
> classless version.
>
> I agree what Udo is talking about - giving the user better control of
> *when* there value is deserialized to a java object - is also valuable, but
> a separate feature.
>
> -Dan
>
> On Tue, Jan 15, 2019 at 1:09 PM Darrel Schneider 
> wrote:
>
> > Even before the JSON pdx support we had internal support for a
> PdxInstance
> > that deserializes as a PdxInstance.
> > This is just adding an external api for that already existing internal
> > feature. So it is pretty simple to do if we can figure out how to name
> it.
> >
> >
> > On Tue, Jan 15, 2019 at 11:18 AM Galen O'Sullivan  >
> > wrote:
> >
> > > I suspect Udo is remembering something we both had to deal with, which
> is
> > > that the lack of values to get/put PDXInstances on Regions make some
> > > patterns difficult. In internal code, we have to set some thread-locals
> > to
> > > get serialized values out, and in general, I think that setting
> > > pdx-read-serialized is a violation of the contract you'd expect from
> the
> > > type signature of get, put, etc. Having a separate API for serialized
> > > objects, and possibly region-level configuration, makes a lot more
> sense.
> > > You could even have the non-PDX get fail on regions that are set to
> only
> > > use PDX-serialized objects for everything.
> > >
> > > We already have something like a PdxInstance that always deserializes
> to
> > a
> > > PdxInstance -- have a look at the __GEMFIRE_JSON mess that we use for
> > JSON.
> > > However you end up doing the new PDXInstance stuff, I strongly suggest
> > > using the new solution for JSON objects.
> > >
> > > -Galen
> > >
> > > On Tue, Jan 15, 2019 at 10:49 AM Darrel Schneider <
> dschnei...@pivotal.io
> > >
> > > wrote:
> > >
> > > > I like the idea of adding support to the region configuration that
> lets
> > > > users control how it stores the data. But even if we did that, and
> you
> > > are
> > > > correct that it would be much more work, I don't think it would
> address
> > > > this issue or remove the value of a PdxInstance that always
> > deserializes
> > > to
> > > > a PdxInstance. So I'd like this proposal to stay focused on
> PdxInstance
> > > and
> > > > not get side tracked. PdxInstances can be used outside of regions
> (for
> > > > example arguments to functions).
> > > >
> > > > I'd like to see a separate proposal about being able to configure
> how a
> > > > region stores its data. I could be wrong, but I think that proposal
> > would
> > > > focus on the values, not the keys. Storing keys as serialized data is
> > > > tricky because you need to come up with a equals and hashCode and if
> > > those
> > > > are going to be done based on a sequence of serialized bytes then you
> > > > really need to understand your serialization code and make sure that
> > > > "equal" objects always have the same serialized form.
> > > >
> > > >
> > > > On Tue, Jan 15, 2019 at 10:38 AM Udo Kohlmeyer 
> wrote:
> > > >
> > > > > Darrel, thank you for this.
> > > > >
> > > > > I would like to propose a counter-proposal.
> > > > >
> > > > > Instead of introducing another PDXInstance type, why don't we
> improve
> > > > > the serialization framework itself? I know my proposal is most
> likely
> > > > > going to take a little more effort than adding a new type, but I
> > > believe
>

Re: [Proposal] adding a new type of PdxInstance

2019-01-15 Thread Darrel Schneider
Even before the JSON pdx support we had internal support for a PdxInstance
that deserializes as a PdxInstance.
This is just adding an external api for that already existing internal
feature. So it is pretty simple to do if we can figure out how to name it.


On Tue, Jan 15, 2019 at 11:18 AM Galen O'Sullivan 
wrote:

> I suspect Udo is remembering something we both had to deal with, which is
> that the lack of values to get/put PDXInstances on Regions make some
> patterns difficult. In internal code, we have to set some thread-locals to
> get serialized values out, and in general, I think that setting
> pdx-read-serialized is a violation of the contract you'd expect from the
> type signature of get, put, etc. Having a separate API for serialized
> objects, and possibly region-level configuration, makes a lot more sense.
> You could even have the non-PDX get fail on regions that are set to only
> use PDX-serialized objects for everything.
>
> We already have something like a PdxInstance that always deserializes to a
> PdxInstance -- have a look at the __GEMFIRE_JSON mess that we use for JSON.
> However you end up doing the new PDXInstance stuff, I strongly suggest
> using the new solution for JSON objects.
>
> -Galen
>
> On Tue, Jan 15, 2019 at 10:49 AM Darrel Schneider 
> wrote:
>
> > I like the idea of adding support to the region configuration that lets
> > users control how it stores the data. But even if we did that, and you
> are
> > correct that it would be much more work, I don't think it would address
> > this issue or remove the value of a PdxInstance that always deserializes
> to
> > a PdxInstance. So I'd like this proposal to stay focused on PdxInstance
> and
> > not get side tracked. PdxInstances can be used outside of regions (for
> > example arguments to functions).
> >
> > I'd like to see a separate proposal about being able to configure how a
> > region stores its data. I could be wrong, but I think that proposal would
> > focus on the values, not the keys. Storing keys as serialized data is
> > tricky because you need to come up with a equals and hashCode and if
> those
> > are going to be done based on a sequence of serialized bytes then you
> > really need to understand your serialization code and make sure that
> > "equal" objects always have the same serialized form.
> >
> >
> > On Tue, Jan 15, 2019 at 10:38 AM Udo Kohlmeyer  wrote:
> >
> > > Darrel, thank you for this.
> > >
> > > I would like to propose a counter-proposal.
> > >
> > > Instead of introducing another PDXInstance type, why don't we improve
> > > the serialization framework itself? I know my proposal is most likely
> > > going to take a little more effort than adding a new type, but I
> believe
> > > it is less of a work around.
> > >
> > > MY proposal is to have the PDX serialization configuration be a little
> > > more explicit. In the sense that a user can define serialization
> details
> > > down to the Region.Key or Region.Value level.
> > >
> > > Why would we possibly have a "one size fits all" approach? Could one
> > > have a setup where serialization configuration is stored on a per
> region
> > > basis. Maybe in some cases we want to deserialize the key and in some
> > > cases we don't want to. In some regions we want to leave the value in
> > > serialized form and in others we don't. The point is, why limit to a
> > > single flag.
> > >
> > > --Udo
> > >
> > > On 1/15/19 10:17, Darrel Schneider wrote:
> > > > As part of GEODE-6272 we realized we need a way to use a PdxInstance
> as
> > > key
> > > > for a Region entry. The problem with the current PdxInstance behavior
> > is
> > > > that in some members the key may be seen as a PdxInstance and in
> others
> > > > seen as an instance of a domain class. This inconsistency can lead to
> > > > problems, in particular with partitioned regions because of the key's
> > > hash
> > > > code being used to determine the bucket. You can read more about this
> > > here:
> > > >
> > >
> >
> https://geode.apache.org/docs/guide/17/developing/data_serialization/using_pdx_region_entry_keys.html
> > > >
> > > > What we want is a new type of PdxInstance that will nev

Re: [Proposal] adding a new type of PdxInstance

2019-01-15 Thread Darrel Schneider
I like the idea of adding support to the region configuration that lets
users control how it stores the data. But even if we did that, and you are
correct that it would be much more work, I don't think it would address
this issue or remove the value of a PdxInstance that always deserializes to
a PdxInstance. So I'd like this proposal to stay focused on PdxInstance and
not get side tracked. PdxInstances can be used outside of regions (for
example arguments to functions).

I'd like to see a separate proposal about being able to configure how a
region stores its data. I could be wrong, but I think that proposal would
focus on the values, not the keys. Storing keys as serialized data is
tricky because you need to come up with an equals and hashCode, and if those
are going to be done based on a sequence of serialized bytes then you
really need to understand your serialization code and make sure that
"equal" objects always have the same serialized form.


On Tue, Jan 15, 2019 at 10:38 AM Udo Kohlmeyer  wrote:

> Darrel, thank you for this.
>
> I would like to propose a counter-proposal.
>
> Instead of introducing another PDXInstance type, why don't we improve
> the serialization framework itself? I know my proposal is most likely
> going to take a little more effort than adding a new type, but I believe
> it is less of a work around.
>
> MY proposal is to have the PDX serialization configuration be a little
> more explicit. In the sense that a user can define serialization details
> down to the Region.Key or Region.Value level.
>
> Why would we possibly have a "one size fits all" approach? Could one
> have a setup where serialization configuration is stored on a per region
> basis. Maybe in some cases we want to deserialize the key and in some
> cases we don't want to. In some regions we want to leave the value in
> serialized form and in others we don't. The point is, why limit to a
> single flag.
>
> --Udo
>
> On 1/15/19 10:17, Darrel Schneider wrote:
> > As part of GEODE-6272 we realized we need a way to use a PdxInstance as
> key
> > for a Region entry. The problem with the current PdxInstance behavior is
> > that in some members the key may be seen as a PdxInstance and in others
> > seen as an instance of a domain class. This inconsistency can lead to
> > problems, in particular with partitioned regions because of the key's
> hash
> > code being used to determine the bucket. You can read more about this
> here:
> >
> https://geode.apache.org/docs/guide/17/developing/data_serialization/using_pdx_region_entry_keys.html
> >
> > What we want is a new type of PdxInstance that will never deserialize to
> a
> > domain class. It will always be a PdxInstance. This can safely be used
> as a
> > Region key since PdxInstance implements equals and hashCode. It can also
> be
> > used in other contexts when you just want some structured data with well
> > defined fields but never need to deserialize that data to a domain class.
> >
> > We are trying to figure out what to call this new type of PdxInstance.
> > Currently the pull request for GEODE-6272 has them named as "stable"
> > because they do not change form; they are always a PdxInstance. Another
> > suggestion was not to name them but add a boolean parameter to the method
> > that creates a PdxInstanceFactory named "forcePDXEveryWhere". Internally
> we
> > have some code that has a boolean named "noDomainClass". I'd prefer we
> come
> > up with a name instead of using boolean parameters. In the Java world you
> > label fields that can't change "final" and in the object world you call
> > objects that can't change "immutable". Would either of these be better
> than
> > "stable"? Any other ideas for what we could calls this new type of
> > PdxInstance?
> >
>


[Proposal] adding a new type of PdxInstance

2019-01-15 Thread Darrel Schneider
As part of GEODE-6272 we realized we need a way to use a PdxInstance as a key
for a Region entry. The problem with the current PdxInstance behavior is
that in some members the key may be seen as a PdxInstance and in others
seen as an instance of a domain class. This inconsistency can lead to
problems, in particular with partitioned regions because of the key's hash
code being used to determine the bucket. You can read more about this here:
https://geode.apache.org/docs/guide/17/developing/data_serialization/using_pdx_region_entry_keys.html

What we want is a new type of PdxInstance that will never deserialize to a
domain class. It will always be a PdxInstance. This can safely be used as a
Region key since PdxInstance implements equals and hashCode. It can also be
used in other contexts when you just want some structured data with well
defined fields but never need to deserialize that data to a domain class.

We are trying to figure out what to call this new type of PdxInstance.
Currently the pull request for GEODE-6272 has them named as "stable"
because they do not change form; they are always a PdxInstance. Another
suggestion was not to name them but add a boolean parameter to the method
that creates a PdxInstanceFactory named "forcePDXEveryWhere". Internally we
have some code that has a boolean named "noDomainClass". I'd prefer we come
up with a name instead of using boolean parameters. In the Java world you
label fields that can't change "final" and in the object world you call
objects that can't change "immutable". Would either of these be better than
"stable"? Any other ideas for what we could calls this new type of
PdxInstance?
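
For reference, a minimal sketch of building a PdxInstance with the existing
PdxInstanceFactory API and using it as a Region key (it assumes a Cache named
"cache" and a value named "order"; the field and class names are made up, and
nothing here shows the proposed new type, since naming it is what this thread
is about):

    PdxInstanceFactory factory = cache.createPdxInstanceFactory("example.OrderKey");
    factory.writeInt("customerId", 42);
    factory.writeString("orderNumber", "A-1001");
    PdxInstance key = factory.create();

    Region<PdxInstance, Object> orders = cache.getRegion("orders");
    orders.put(key, order); // PdxInstance supplies equals and hashCode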


creating Thread, ThreadFactory, or Executor instances in geode

2018-10-05 Thread Darrel Schneider
When a thread runs, if it throws an exception that is not caught, then that
exception will just silently kill your thread. This can make it very hard
to diagnose what is happening. To remedy this situation, geode introduced a
LoggingThreadGroup class a long time ago. As long as you created one of
these groups and put your thread in it, the uncaught exception would be
logged to the geode logger.
One of the problems with LoggingThreadGroup was that it kept a static
collection of all of its instances. In some places this led to memory
leak problems. Geode also had many places in its code that forgot to use
it and instead used some other JDK API that created a thread without a
geode LoggingThreadGroup.
Today fixes for GEODE-5780 and GEODE-5783 were done that remove
LoggingThreadGroup.
Now if you want to create a thread, use LoggingThread. You can either
directly create an instance of it or subclass it.
If you want a ThreadFactory, use LoggingThreadFactory.
If you want an Executor, use LoggingExecutors (which even has support for
a work-stealing pool that will log).

All of these classes register, with any thread they create, a singleton
instance of LoggingUncaughtExceptionHandler that will log uncaught
exceptions to a Logger. If you need to create a thread some other way and
cannot use LoggingThread, LoggingThreadFactory, or LoggingExecutors, then
you should at least register your thread instance with
LoggingUncaughtExceptionHandler by calling its "setOnThread(Thread)" method.
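
A minimal sketch of both cases (the constructor signature shown for
LoggingThread is from memory, so check the class; doWork and doOtherWork are
placeholders):

    // Preferred: LoggingThread registers the uncaught exception handler for you.
    Thread worker = new LoggingThread("MyWorker", () -> doWork());
    worker.start();

    // If a thread must be created some other way, at least attach the handler.
    Thread other = new Thread(() -> doOtherWork());
    LoggingUncaughtExceptionHandler.setOnThread(other);
    other.start();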

Let me know if you have any questions about how to create threads in geode.


Re: [DISCUSS] test code style (particularly logging)

2018-09-20 Thread Darrel Schneider
For simple single threaded tests System.out would do the job.
For a multi-threaded test I have found the logging framework to be helpful
because of the thread id and the timestamps.
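
For example (a sketch; "key" is a placeholder and LogService is one way a test
can get a log4j Logger):

    private static final Logger logger = LogService.getLogger();

    // In a multi-threaded test this line comes out with a timestamp and the
    // thread name, which a plain System.out.println does not give you:
    logger.info("async put completed for key {}", key);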


On Thu, Sep 20, 2018 at 1:50 PM Dale Emery  wrote:

> As long as the stdout is available in the test results, I’m more than
> happy to avoid coupling the tests to the product logging code.
>
> > On Sep 20, 2018, at 1:39 PM, Galen O'Sullivan 
> wrote:
> >
> > I was reviewing a PR recently and noticed that we have some test code
> that
> > uses Logger (or LogWriter). If I understand correctly, anything logged to
> > stdout will be included in the test output, and anything logged in a
> DUnit
> > VM will be logged with the appropriate VM number prepended in the output.
> > Because logging is coupled to product code, I propose we log all test
> > output to standard out (using System.out.println or such). I also propose
> > we add this to the Geode style guide.
> >
> > Thoughts/questions/objections?
> >
> > Thanks,
> > Galen
>
>
> —
> Dale Emery
> dem...@pivotal.io
>
>


Re: [geode-dev] Gfsh Start Server option "disable-exit-when-out-of-memory"

2018-08-08 Thread Darrel Schneider
You can go ahead and improve the help text immediately.

On Wed, Aug 8, 2018 at 10:00 AM Kenneth Howe  wrote:

> +1 to deprecating the existing option and removing it in 2.0. The default
> then will be to retain the addition of the appropriate -X JVM argument
> unless/until we choose to implement Kirk’s first suggestion.
>
> > On Aug 8, 2018, at 9:53 AM, Kirk Lund  wrote:
> >
> > We might want to:
> >
> > 1) introduce disable-default-jvm-arguments and implement it to ensure
> that
> > GFSH is not adding any -X args when this option is enabled
> >
> > 2) deprecate disable-exit-when-out-of-memory and then remove it in 2.0
> >
> > On Tue, Aug 7, 2018 at 3:59 PM, Helena Bales  wrote:
> >
> >> Hello Geode devs,
> >>
> >> There is an option on the start server Gfsh command that, according to
> the
> >> help message, "Prevents the JVM from exiting when an OutOfMemoryError
> >> occurs." When this option is *not* set, a JVM argument gets added,
> depending
> >> on which JVM you are using, to kill the JVM when an OutOfMemoryError
> >> occurs. While setting the option does prevent those arguments from being
> >> set, it does not keep the JVM alive after an OutOfMemoryError occurs.
> >>
> >> Since this option does not do what its help message or name promises, I
> >> would like to propose that we either remove it or rename it to make its
> >> name more closely match its behaviour.
> >>
> >> ~Helena Bales
> >>
>
>


Re: DISCUSS: Refactor test source set for integrationTest and distributedTest

2018-06-27 Thread Darrel Schneider
what about "testUtilities" instead of "commonTest"?

On Tue, Jun 26, 2018 at 3:36 PM, Jacob Barrett  wrote:

> I'd like to suggest that we refactor our current test source set, which
> contains unit, integration, and distributed tests, into distinct source
> sets: test, integrationTest, and distributedTest. These source sets would
> replace the use of the primary categories UnitTest, IntegrationTest and
> DistributedTest.
>
> The catalyst for this change is an issue that Gradle's test runner doesn't
> pre-filter categories when launching tests, so if the tests are launched in
> separate JVMs or Docker containers, like the :distributedTest task, the cost of
> spinning up those resources is realized only to immediately exit without
> running any test for all test classes in the module. Switching to separate
> source sets for each category would remove the need to filter on category
> and only tests in the corresponding source set would get executed in their
> external JVM or Docker container. An example of this issue is
> geode-junit:distributedTest task, which forks all test classes in separate
> JVMs but never actually runs any tests since there are no DistributedTest
> tagged tests.
>
> The secondary effect is a way to isolate dependencies in each of the
> source sets. Unit tests in the test set would not have the dependencies needed
> for integration tests or distributed tests, so if you accidentally tried
> to import classes from those frameworks you would get a compiler failure.
> Likewise, integration tests would not include distributed test framework
> dependencies. Any shared test classes like mocks, dummies, fakes, etc. could
> be shared in a commonTest source set, which would not contain any tests
> itself.
>
> The proposed structure would look like this:
>
> test/ - only contains unit tests.
> integrationTest/ - only contains integration style tests.
> distributedTest/ - only includes DUnit based tests.
> commonTest/ - includes classes commonly shared between the test categories.
> Does not contain any tests.
>
> Thoughts?
>
> -Jake
>


Re: Simple serialize/deserialize test using DataSerializers

2018-03-07 Thread Darrel Schneider
BlobHelper ends up calling DataSerializer.writeObject. The nice thing about
BlobHelper is you can
call org.apache.geode.internal.util.BlobHelper.serializeToBlob(Object)
and org.apache.geode.internal.util.BlobHelper.deserializeBlob(byte[])
without needing to write any of your own code to create the input and
output streams.
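
A round trip then looks roughly like this (a sketch based on the test quoted
below, with exception handling left to the test's throws clause):

    byte[] bytes = BlobHelper.serializeToBlob(config);
    assertThat(bytes).isNotNull().isNotEmpty();

    Object copy = BlobHelper.deserializeBlob(bytes);
    assertThat(copy).isNotSameAs(config).isInstanceOf(PartitionRegionConfig.class);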


On Wed, Mar 7, 2018 at 11:40 AM, Dan Smith  wrote:

> DataSerializer.writeObject, if you want a public API.
>
> -Dan
>
> On Wed, Mar 7, 2018 at 11:20 AM, Anthony Baker  wrote:
>
> > BlobHelper?
> >
> > > On Mar 7, 2018, at 10:13 AM, Kirk Lund  wrote:
> > >
> > > Does anyone know what Geode API(s) I should use instead of Apache Geode
> > > SerializationUtils to change the following test to use Geode
> > > DataSerializers?
> > >
> > > @Test
> > > public void serializesAndDeserializes() throws Exception {
> > >  PartitionRegionConfig config = new PartitionRegionConfig(prId, path,
> > > partitionAttributes, scope, evictionAttributes, regionIdleTimeout,
> > > regionTimeToLive, entryIdleTimeout, entryTimeToLive, gatewaySenderIds);
> > >  byte[] bytes = SerializationUtils.serialize(config);
> > >  assertThat(bytes).isNotNull().isNotEmpty();
> > >
> > > assertThat(SerializationUtils.deserialize(bytes)).isNotSameAs(config).
> > isInstanceOf(PartitionRegionConfig.class);
> > > }
> >
> >
>


Re: Handling files and user.dir in tests

2018-01-08 Thread Darrel Schneider
Should geode have a single method it uses to create File instances?
That way the code that determines the parent dir could be in one place.
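
Something along these lines (GeodeFiles and the "geode.working.dir" property
are made up for illustration; neither exists in geode today):

    import java.io.File;

    public final class GeodeFiles {
      private GeodeFiles() {}

      // Every place in the product that builds a File would go through here,
      // so a test could redirect all file artifacts with a single property.
      public static File create(String child) {
        String parent = System.getProperty("geode.working.dir",
            System.getProperty("user.dir"));
        return new File(parent, child);
      }
    }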


On Mon, Jan 8, 2018 at 11:26 AM, Patrick Rhomberg 
wrote:

> The ClusterStartupRule, LocatorStarterRule, and ServerStarterRule rules all
> have withTempWorkingDir methods that should take care of this, as well.
> These temporary directories should be removed as part of the rules @After
>  invocation.
>
> On Mon, Jan 8, 2018 at 10:23 AM, Jinmei Liao  wrote:
>
> > +1 for always using absolute path in our product code.
> >
> > Also the server/locator launchers should be able to take a working-dir as
> > parameter to store all the stat/logs/config files.
> >
> > On Fri, Jan 5, 2018 at 3:49 PM, Kirk Lund  wrote:
> >
> > > The from should be:
> > >
> > > this.viewFile = new File("locator" + server.getPort() +
> "view.dat");
> > >
> > > On Fri, Jan 5, 2018 at 3:48 PM, Kirk Lund  wrote:
> > >
> > > > Just to help facilitate the discussion, here's a pull request that
> > > changes
> > > > GMSLocator from:
> > > >
> > > > this.viewFile = new File(System.getProperty("user.dir"),
> > "locator" +
> > > > server.getPort() + "view.dat");
> > > >
> > > > ...to:
> > > >
> > > > this.viewFile = new File(System.getProperty("user.dir"),
> > "locator" +
> > > > server.getPort() + "view.dat");
> > > >
> > > > To allow the new test LocatorViewFileTemporaryFolderDUnitTest to
> > > redirect
> > > > the locator view dat file to a JUnit TemporaryFolder.
> > > >
> > > > The only other way I can think of to this is to introduce a new Geode
> > > > property for "current-directory" which a test could specify. That
> > > property
> > > > value would have to be pushed down to every class that creates a new
> > > File.
> > > >
> > > > Pull request: https://github.com/apache/geode/pull/1243
> > > >
> > > > On Fri, Jan 5, 2018 at 3:32 PM, Kirk Lund  wrote:
> > > >
> > > >> Any calls such as:
> > > >>
> > > >> File file = new File("myfile");
> > > >>
> > > >> ...results in creating a file in the current working directory of
> > > >> IntelliJ or Gradle when executing the test.
> > > >>
> > > >> I previously made a change to Gfsh so that tests can pass in a
> parent
> > > >> directory which will be used instead. This allowed me to change all
> of
> > > the
> > > >> Gfsh tests to write to a JUnit TemporaryFolder.
> > > >>
> > > >> This allows us to clean up ALL file artifacts produced from a test
> and
> > > >> also allows us to avoid file-based pollution from one test to
> another.
> > > >>
> > > >> I'd like to propose that we either always pass a parent directory
> > into a
> > > >> component that produces file artifacts or change all of our code
> from
> > > using
> > > >> the constructor File(String pathname) to using the constructor
> > > File(String
> > > >> parent, String child).
> > > >>
> > > >> That 2nd approach would involve changing lines like this:
> > > >>
> > > >> File file = new File("myfile");
> > > >>
> > > >> ...to:
> > > >>
> > > >> File file = new File(System.getProperty("user.dir"), "myfile");
> > > >>
> > > >> Then if you add this line to a test:
> > > >>
> > > >> System.setProperty("user.dir",
> > > >> temporaryFolder.getRoot().getAbsolutePath());
> > > >>
> > > >> ...you're able to redirect all file artifacts to the JUnit
> > > >> TemporaryFolder.
> > > >>
> > > >> If we don't make this change to product code, then I really don't
> > think
> > > >> we should be manipulating "user.dir" in ANY of our tests because the
> > > >> results are rather broken.
> > > >>
> > > >> If we don't like using "user.dir" then we could devise our own Geode
> > > >> system property for "the current working directory." Honestly, I
> don't
> > > care
> > > >> what system property we use or don't use, but I really want to have
> > ALL
> > > >> file artifacts properly cleaned and deleted after every test. And I
> > > really
> > > >> want to prevent file-based test pollution.
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Cheers
> >
> > Jinmei
> >
>


how to get a good commit comment when using gitbox to merge a pull request

2017-12-14 Thread Darrel Schneider
I noticed that the git log message for a pull request I just merged was
lacking what I expected it to pick up from the single revision in the pull
request. It ended up with a message like this:

Merge pull request #1165 from dschneider-pivotal/feature/GEODE-4091

GEODE-4091: add ThreadFactory for evictor

The threads created to do background eviction scanning
will now be named "LRUListWithAsyncSortingThreadNNN".
The will now be daemon threads.
They now have an unhandled exception handler.
Their group is named "LRUListWithAsyncSorting Threads".

I wanted it to look like this:

GEODE-4091: add ThreadFactory for evictor

The threads created to do background eviction scanning
will now be named "LRUListWithAsyncSortingThreadNNN".
The will now be daemon threads.
They now have an unhandled exception handler.
Their group is named "LRUListWithAsyncSorting Threads".

I added the last five lines to it myself because I saw they were missing.
Did my adding those mess this up?
Am I correct that we don't want that first line ("Merge pull request #...")?

Thanks for any help


Re: Default branch for geode should be 'develop'

2017-12-11 Thread Darrel Schneider
+1

On Mon, Dec 11, 2017 at 1:40 PM, Jens Deppe  wrote:

> Currently, github has the default branch for apache/geode set to 'master'.
> (which is why you need to point the relevant branch to 'develop' for every
> new PR).
>
> Do we agree that this should be set to 'develop'?
>
> (I've messaged with some of the Infra folks and nothing appears to have
> changed recently so this is a bit weird).
>
> I'll create a Jira once there is agreement.
>
> --Jens
>


Re: [DISCUSS] FunctionAdapter incompatible serialVersionUID

2017-11-29 Thread Darrel Schneider
+1 to not removing deprecated apis in minor releases.
The semver policy Alexander describes seems reasonable.
In the past, if we had something deprecated for a long time we felt free to
remove it whenever we wanted, but I think the semver policy is a better way
to decide when we are free to remove deprecated external apis.


On Wed, Nov 29, 2017 at 8:50 AM, Alexander Murmann 
wrote:

> Even though the class is deprecated, you should be able to go from one
> minor version to another without having to worry about anything breaking.
> The point of semver is to provide information about things breaking or not
> without having to read the changelog. If we remove APIs in a minor version
> because they were previously deprecated we break that contract. Semver.org
> recommends that the functionality should be marked as deprecated in a minor
> release
> and then removed as part of a major release.
>
> On Tue, Nov 28, 2017 at 7:03 PM, Jacob Barrett 
> wrote:
>
> > Since the class was deprecated and is technically only there for pre-1.0
> > compatibly then the behavior of this class should be consistent with the
> > pre-1.0 version. This will break 1.0 to 1.3 but anyone coding to a
> post-1.0
> > version should not be using this deprecated class.
> >
> >
> > > On Nov 28, 2017, at 5:22 PM, Jason Huynh  wrote:
> > >
> > > *With your proposal 1.0 - 1.3 users would have to modify their source
> > > code on the client and the server for the function, correct?*
> > >
> > > If they start a new geode server 1.4+ and happened to extend
> > > functionAdapter (it was deprecated in 1.0) then they would have to
> > > recompile their client to not use functionAdapter.
> > >
> > > This should only affect users that extend FunctionAdapter and execute
> > > functions by serializing them to the server from the client.  If they
> > > execute by id it should not run into this problem...
> > >
> > >> On Tue, Nov 28, 2017 at 5:20 PM Jason Huynh 
> wrote:
> > >>
> > >> Dan, yeah, the suggested change in the stack overflow answer does work
> > and
> > >> I was able to put an if with the exact serialVersionUid before posting
> > the
> > >> proposal, but it is pretty hacky and may affect another class that
> > somehow
> > >> generated the same uid.  I can make that change too but I'd prefer not
> > to
> > >> have to maintain it moving forward...
> > >>
> > >>
> > >>
> > >>> On Tue, Nov 28, 2017 at 5:09 PM Dan Smith  wrote:
> > >>>
> > >>> I agree I don't think we can get rid of FunctionAdapter until the
> next
> > >>> major version.
> > >>>
> > >>> I was thinking FunctionAdapter is rather widely used, but then I'm
> > >>> surprised no one has hit this yet.
> > >>>
> > >>> All of the options kinda suck here - either pre 1.0 users have a
> > >>> compatibility issue or 1.0-1.3 users do. With your proposal 1.0 - 1.3
> > >>> users would have to modify their source code on the client and the
> > >>> server for the function, correct?
> > >>>
> > >>> If we got really fancy we could actually ignore the serialVersionUUID
> > for
> > >>> this class like this - https://stackoverflow.com/a/1816711/2813144.
> > But
> > >>> it's pretty messy.
> > >>>
> > >>> -Dan
> > >>>
> > >>> On Tue, Nov 28, 2017 at 1:59 PM, Alexander Murmann <
> > amurm...@pivotal.io>
> > >>> wrote:
> > >>>
> >  Anil, I am not sure I am following. I think FunctionAdapter already is
> >  deprecated. Isn't it? Anthony is right though that we shouldn't
> remove
> >  anything customer facing unless we are doing a major release.
> > Otherwise
> > >>> we
> >  are violating the contract provided by semantic versioning.
> > 
> >  On Tue, Nov 28, 2017 at 1:52 PM, Anilkumar Gingade <
> > aging...@pivotal.io
> > 
> >  wrote:
> > 
> > > I haven't seen many uses of FunctionAdapter; if its not used much,
> I
> >  think
> > > we should deprecate this...
> > >
> > > It only provided default implementation for few of the methods;
> this
> >  could
> > > be added in the docs/release notes to help application to move to
> >  function
> > > implementation.
> > >
> > > -Anil.
> > >
> > >
> > > On Tue, Nov 28, 2017 at 12:40 PM, Anthony Baker  >
> >  wrote:
> > >
> > >> I think we should wait for a major release to remove API’s.  If we
> >  broke
> > > a
> > >> public API, we should fix that IMO.
> > >>
> > >> Anthony
> > >>
> > >>
> > >>> On Nov 28, 2017, at 11:40 AM, Patrick Rhomberg <
> > >>> prhomb...@pivotal.io
> > >
> > >> wrote:
> > >>>
> > >>> +1 to removing a long-deprecated class from the Geode side.
> > >>>
> > >>> On Tue, Nov 28, 2017 at 8:04 AM, Bruce Schuchardt <
> > >> bschucha...@pivotal.io>
> > >>> wrote:
> > >>>
> >  

Re: Rename DM and DistributionManager

2017-11-09 Thread Darrel Schneider
+1

On Thu, Nov 9, 2017 at 12:18 PM, Kirk Lund  wrote:

> I'd like to do some renaming to use better class names including:
>
> GEODE-3965: Rename DistributionManager to ClusterDistributionManager and DM
> to DistributionManager
>
> The results would be:
>
> * interface = DistributionManager
> * impl for cluster = ClusterDistributionManager
> * impl for loner = LonerDistributionManager
>
> Please let me know if anyone feels strongly against this rename.
>
> Thanks!
> Kirk
>


Re: Internal package name structure

2017-10-09 Thread Darrel Schneider
+1 for #2

On Mon, Oct 9, 2017 at 12:35 PM, Kirk Lund  wrote:

> Sounds good.
>
> Right now the User API for Statistics is in org.apache.geode (where it's
> been since before my time) but we could deprecate it there and move it to
> org.apache.geode.statistics.
>
> On Mon, Oct 9, 2017 at 12:28 PM, Udo Kohlmeyer 
> wrote:
>
> > @Kirk, I believe that internal Geode modules can be handled with approach
> > #2, e.g "org.apache.geode.logging.internal"...
> >
> > I believe that we should think modules with public interfaces... even if
> > the only consumers of those interfaces is another Geode module. How else
> > would we achieve complete modularity or componentization?
> > Statistics can still have 99% of its implementation in the "internal"
> > package space, but surely we would still have a public interface that
> would
> > be exposed.
> >
> > To me it makes more sense to start thinking of modules and the
> > services/interfaces they expose, rather than is it internal to Geode.
> >
> > --Udo
> >
> >
> > On Mon, Oct 9, 2017 at 11:50 AM, Kirk Lund  wrote:
> >
> > > The real reason we have both is because we have some internal
> components
> > > that don't have any corresponding User API (currently).
> > >
> > > For example, we have org.apache.geode.internal.logging and
> > > org.apache.geode.internal.statistics but neither of these have a
> > > non-internal package.
> > >
> > > Do we want to start creating non-internal packages for things like
> > logging
> > > even if there are no classes in the non-internal package?
> > >
> > > On Mon, Oct 9, 2017 at 10:55 AM, Dan Smith  wrote:
> > >
> > > > +1 for #2
> > > >
> > > > We will need to be careful refactoring existing code if classes are
> > sent
> > > > over the wire.
> > > >
> > > > -Dan
> > > >
> > > > On Mon, Oct 9, 2017 at 10:36 AM, Udo Kohlmeyer 
> wrote:
> > > >
> > > > > Hi there Geode devs,
> > > > >
> > > > > Whilst going through the code base I found that we have 2 differing
> > > > > approaches of how we classify (or package structures) internal
> code.
> > > > >
> > > > > The first is org.apache.geode.internal.module and the other is
> > > > > org.apache.geode.module.internal.
> > > > >
> > > > > Can anyone explain the difference to me and which one is the
> > preferred
> > > > > mechanism. I vote for approach 2, where the internal package is
> > > > defined
> > > > > within the module/functional area.
> > > > >
> > > > > --Udo
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Kindest Regards
> > -
> > *Udo Kohlmeyer* | *Snr Solutions Architect* |*Pivotal*
> > *Mobile:* +61 409-279-160 | ukohlme...@pivotal.io
> > 
> > www.pivotal.io
> >
>


Re: Review Request 62180: refactor away GemfireCacheImpl.getInstance from lucene function

2017-09-07 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62180/#review184922
---



This review is confusing. It requires the fix for GEODE-3557, which currently
has its own PR (see https://github.com/apache/geode/pull/763).
So why did you submit this review? It looks like you just grabbed one line from
that PR since it also has the change.
I just wanted to make clear that you should not check in this code review because
in your code base DM.getCache() does not exist.

- Darrel Schneider


On Sept. 7, 2017, 5:43 p.m., xiaojian zhou wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62180/
> ---
> 
> (Updated Sept. 7, 2017, 5:43 p.m.)
> 
> 
> Review request for geode, Barry Oglesby and Dan Smith.
> 
> 
> Bugs: GEODE-3557
> https://issues.apache.org/jira/browse/GEODE-3557
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> use dm.getCache() instead
> 
> 
> Diffs
> -
> 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/DestroyLuceneIndexMessage.java
>  9eada5b20 
> 
> 
> Diff: https://reviews.apache.org/r/62180/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> xiaojian zhou
> 
>



Re: Review Request 62164: GEODE-3566 Moving a bucket during rebalancing does not update overflow stats

2017-09-07 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62164/#review184876
---


Ship it!




Ship It!

- Darrel Schneider


On Sept. 7, 2017, 9:56 a.m., Lynn Gallinat wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62164/
> ---
> 
> (Updated Sept. 7, 2017, 9:56 a.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Eric Shu, and 
> Nick Reich.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> We now update the stats in a member that lost a bucket due to rebalancing
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/BucketRegion.java 
> 465a3dd 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/BucketRebalanceStatRegressionTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/62164/diff/1/
> 
> 
> Testing
> ---
> 
> Precheckin
> 
> 
> Thanks,
> 
> Lynn Gallinat
> 
>



Re: Review Request 61895: GEDOE-3516: TXManagerImpl.tryResume call may add itself again into the waiting thread queue

2017-09-06 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61895/#review184709
---


Ship it!




Ship It!

- Darrel Schneider


On Sept. 6, 2017, 11:42 a.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61895/
> ---
> 
> (Updated Sept. 6, 2017, 11:42 a.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Lynn Gallinat, 
> and Nick Reich.
> 
> 
> Bugs: GEODE-3516
> https://issues.apache.org/jira/browse/GEODE-3516
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Remove the thread from waiting thread queue after successfully resumed the 
> transaction
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/TXManagerImpl.java 
> a0a4d7c 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/TXManagerImplJUnitTest.java
>  a2c1e70 
> 
> 
> Diff: https://reviews.apache.org/r/61895/diff/3/
> 
> 
> Testing
> ---
> 
> precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>



Re: [Discuss] Building objects with the Factory pattern or the Builder pattern. Adding the fluent model?

2017-09-06 Thread Darrel Schneider
In the geode java apis you would create a CacheFactory (or
ClientCacheFactory), configure it fluently, and then call create at the
end. So even though we call them factories in the geode java apis, they use
the Builder pattern and also support the fluent model in that you could do
this:
  ClientCache cache = new  ClientCacheFactory().set("propName",
"propValue").addPoolLocator("addr", 10678).create();

 Also in java the DistributedSystem is hidden under the Cache. So you add
config to your CacheFactory and when you create it, it will also connect
the DistributedSystem. It used to be that in java you had to connect the
DistributedSystem first and then the Cache. Since the DistributedSystem is
never used apart from a Cache, I think this simplification was helpful to
users.

On Wed, Sep 6, 2017 at 10:13 AM, Mark Hanson  wrote:

> Problem:
>
> To fluent and builder model, or not to fluent and builder model? In the
> native client we use the factory model of generating objects, and we are
> wondering if a move to the fluent model and the builder pattern would be
> better.
>
>
> Background:
>
> In designing the various types of allocators for our objects, we have
> considered whether or not to use the builder and fluent patterns instead of
> the Factory pattern. The current code base is written following the factory
> pattern, but it seems that the Builder pattern is becoming more popular.
> Further, people are also using the fluent model.
>
> Discussion:
>
> The argument for the Builder pattern is that it allows greater
> specification before the actual creation of the object. For example, Factory
> often focuses on the attributes after the creation of the object. The Builder
> pattern allows one to set the attributes before the creation of the object,
> allowing a more specific object to be generated. Currently code is written to
> the Factory pattern.
>
> Consider the following case of connecting to a cache.
>
> Our current pattern (Factory)
>
> CacheFactoryPtr cacheFactory = CacheFactory::createCacheFactory();
>
> CachePtr cache = cacheFactory->create();
>
> cache->getDistributedSystem().Credentials(“emma”, “pizza”);
>
> cache->getDistributedSystem().connect();
>
> API Used:
>
> bool DistributedSystem::Credentials(String, String)
>
> void DistributedSystem::connect()
>
> Cache CacheFactory::create()
>
> Builder pattern
>
> CacheFactory cf = new CacheFactory();
>
> cf.Locator(“192.168.0.5”, 10334);
>
> cf.Credentials(“emma”, “pizza”);
>
> Cache cache = cf.Connect();
>
> API Used:
>
> bool Locator(String, String)
>
> bool Credentials(String, String)
>
> Cache Connect()
>
>
> Next up, we think the real direction lies in further incorporating the
> fluent model.
>
>
>
> Fluent model
>
> Cache cache = new CacheFactory()->Locator(“192.168.0.5",
> 10334).Credentials(“emma”, “pizza”).Connect();
>
> API Used:
>
> CacheFactory Locator(String, String)
>
> CacheFactory Credentials(String, String)
>
> Cache Connect()
>
> Do people see value in heading this direction? Sufficient value to rewrite
> the API for Geode Native for example?
>
>
>
> Conclusion
>
> We need input to decide the future direction. What do people think???
>


Re: Review Request 61895: GEDOE-3516: TXManagerImpl.tryResume call may add itself again into the waiting thread queue

2017-08-25 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61895/#review183871
---




geode-core/src/main/java/org/apache/geode/internal/cache/TXManagerImpl.java
Line 1302 (original), 1302 (patched)
<https://reviews.apache.org/r/61895/#comment259934>

Let's not make this package-protected for unit test access. Instead keep it
private and add an accessor for the unit tests. Also try not to expose the
whole map to unit tests. For example, in this case the unit test just needs to
check that no threads are still waiting for a particular txid.



geode-core/src/main/java/org/apache/geode/internal/cache/TXManagerImpl.java
Lines 1346 (patched)
<https://reviews.apache.org/r/61895/#comment259935>

Instead of fixing this in the finally block we decided it would be best to 
keep this thread from adding itself multiple times to the queue. Also we think 
other race conditions exist in which the while loop that calls tryResume does 
not recheck if the tx still exists. Also it seems like a race exists in which 
suspend unparks a tryResume call that ends up not resuming because of the 
timeout but does not poll the queue again and unpark the next waiter.


- Darrel Schneider


On Aug. 25, 2017, 9:53 a.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61895/
> ---
> 
> (Updated Aug. 25, 2017, 9:53 a.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Lynn Gallinat, 
> and Nick Reich.
> 
> 
> Bugs: GEODE-3516
> https://issues.apache.org/jira/browse/GEODE-3516
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Remove the thread from waiting thread queue after successfully resumed the 
> transaction
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/TXManagerImpl.java 
> a0a4d7c 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/TXManagerImplJUnitTest.java
>  a2c1e70 
> 
> 
> Diff: https://reviews.apache.org/r/61895/diff/2/
> 
> 
> Testing
> ---
> 
> precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>



Re: Review Request 61895: GEDOE-3516: TXManagerImpl.tryResume call may add itself again into the waiting thread queue

2017-08-24 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61895/#review183799
---



Would it be possible to add a unit test for this fix?

- Darrel Schneider


On Aug. 24, 2017, 1:13 p.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61895/
> ---
> 
> (Updated Aug. 24, 2017, 1:13 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Lynn Gallinat, 
> and Nick Reich.
> 
> 
> Bugs: GEODE-3516
> https://issues.apache.org/jira/browse/GEODE-3516
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Remove the thread from waiting thread queue after successfully resumed the 
> transaction
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/TXManagerImpl.java 
> a0a4d7c 
> 
> 
> Diff: https://reviews.apache.org/r/61895/diff/1/
> 
> 
> Testing
> ---
> 
> precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>



Re: Nightly build failures caused by attempted use of default ports

2017-08-24 Thread Darrel Schneider
I like the sound of #1:
1) use Docker for AcceptanceTest and IntegrationTest targets?

Does anyone know of a downside to running these tests in docker?

On Wed, Aug 23, 2017 at 12:06 PM, Jared Stewart  wrote:

> I think we just need to have AcceptanceTest (and possibly IntegrationTest)
> run inside Docker like DistributedTest already does.
>
> - Jared.
>
> > On Aug 23, 2017, at 11:32 AM, Anilkumar Gingade 
> wrote:
> >
> >>> 1) use Docker for AcceptanceTest and IntegrationTest targets?
> > To be clear, the failing tests are only in Acceptance Test and
> Integration
> > Tests? And distributed tests are not seeing this issue as they are
> running
> > in docker now. If moving to docker addresses this issue, my vote is for
> > moving to docker; this makes all the tests run in a similar environment.
> >
> >>> 2) not test default ports?
> > If the product supports default port; we need to have test for
> that...Most
> > of the early product evaluation is done with default port...
> >
> > -Anil.
> >
> >
> > On Wed, Aug 23, 2017 at 11:15 AM, Kirk Lund  wrote:
> >
> >> The following nightly build failures are tests that are testing default
> >> ports which are failing because the port is not available.
> >>
> >> Should we:
> >>
> >> 1) use Docker for AcceptanceTest and IntegrationTest targets?
> >>
> >> 2) not test default ports?
> >>
> >> 3) use a hacky System property to force Geode to think that some other
> port
> >> is the default port?
> >>
> >> 4) some other solution?
> >>
> >> testGet – org.apache.geode.rest.internal.web.RestServersJUnitTest
> >> a few seconds
> >> testServerStartedOnDefaultPort –
> >> org.apache.geode.rest.internal.web.RestServersJUnitTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedWhenConnected_server_dir –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedWhenConnected_server_pid –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedEvenWhenNotConnected_server_dir –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedEvenWhenNotConnected_server_pid –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> onlineStatusCommandShouldSucceedWhenConnected_server_name –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedWhenConnected_locator_dir –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> offlineStatusCommandShouldSucceedWhenConnected_locator_pid –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> onlineStatusCommandShouldSucceedWhenConnected_locator_name –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> onlineStatusCommandShouldSucceedWhenConnected_locator_port –
> >> org.apache.geode.management.internal.cli.shell.
> >> GfshExitCodeStatusCommandsTest
> >> a few seconds
> >> statusLocatorSucceedsWhenConnected –
> >> org.apache.geode.management.internal.cli.commands.
> >> StatusLocatorRealGfshTest
> >>
>
>


Re: Review Request 61852: GEODE-3507 actualRedundantCopies stat in org.apache.geode.internal.cache.PartitionedRegionStats can go negative

2017-08-23 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61852/#review183626
---


Ship it!




Ship It!

- Darrel Schneider


On Aug. 23, 2017, 9:55 a.m., Lynn Gallinat wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61852/
> ---
> 
> (Updated Aug. 23, 2017, 9:55 a.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Eric Shu, and 
> Nick Reich.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
The PartitionedRegionRedundancyTracker class can still hold a -1 to do its
work, but will now not allow -1 to be written to stats
> 
> 
> Diffs
> -
> 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionRedundancyTracker.java
>  38ef61b 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/PartitionedRegionRedundancyTrackerTest.java
>  0917835 
> 
> 
> Diff: https://reviews.apache.org/r/61852/diff/1/
> 
> 
> Testing
> ---
> 
> Precheckin (all green)
> I repeatedly ran one of the tests that reproduced this problem and it passed 
> 370 times.
> I ran selected partitioned region bts (totaling 536 PR tests) and the only 
> failures were auto-diagnosed.
> 
> 
> Thanks,
> 
> Lynn Gallinat
> 
>



Re: Review Request 61722: GEODE-3047 Atomic creation flag is not set if the region is recoverd from the disk.

2017-08-22 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61722/#review183507
---


Ship it!




Ship It!

- Darrel Schneider


On Aug. 18, 2017, 9:58 a.m., anilkumar gingade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61722/
> ---
> 
> (Updated Aug. 18, 2017, 9:58 a.m.)
> 
> 
> Review request for geode, Darrel Schneider, Eric Shu, Lynn Gallinat, and Nick 
> Reich.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> While creating a bucket region, to satisfy the redundancy copies the bucket
> regions are created on all available nodes. The initialization (setting
> persistentIDs) of these buckets is done after creating buckets on all the
> nodes. This introduced a race condition for the bucket regions which are
> recovered from the disk to exchange their old id with the peer node,
> resulting in a ConflictingPersistentDataException.
> 
> The changes are done so that the regions' persistent ids are set as soon as
> they are created/initialized.
> 
> 
> Diffs
> -
> 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/BucketPersistenceAdvisor.java
>  367bb75e9 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/BucketPersistenceAdvisorTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61722/diff/1/
> 
> 
> Testing
> ---
> 
> added new unit test.
> run PartitionedRegionSingleHopDUnitTest.testClientMetadataForPersistentPrs 
> 1400 times
> 
> 
> Thanks,
> 
> anilkumar gingade
> 
>



Re: Need to know complete execution time for geode-core Junit tests

2017-08-07 Thread Darrel Schneider
For test004StartServerFailsFastOnMissingGemFirePropertiesFile I think this
test just needs to be improved for Windows.
The test has an assertion that the failure message contains a certain
string. On Windows the message contains "C:" and "\" which the test does
not expect. Instead of the test treating "cacheXmlPathName" as a constant I
think it should create an instance of File using cacheXmlPathName and then
make sure the message contains the result of calling getAbsolutePath on
that File.
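
Roughly this (a sketch; "output" stands for whatever String the test asserts
against):

    File expectedFile = new File(cacheXmlPathName);
    // getAbsolutePath yields "C:\...\cache.xml" on Windows and "/.../cache.xml"
    // elsewhere, so the same assertion works on both.
    assertTrue(output.contains(expectedFile.getAbsolutePath()));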


On Sun, Aug 6, 2017 at 11:54 PM, Dinesh Akhand  wrote:

> Hi Team,
>
> When we are running test on windows : we are getting below error
> We ran : gradlew precheckin
>
>
> rg.apache.geode.management.internal.cli.commands.
> LauncherLifecycleCommandsDUnitTest > 
> test003StartServerFailsFastOnMissingCacheXmlFile
> FAILED
> java.lang.AssertionError:
> Warning: The Geode cache XML file C:\path\to\missing\cache.xml could
> not be found.
>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.geode.management.internal.cli.commands.
> LauncherLifecycleCommandsDUnitTest.test003StartServerFailsFastOnM
> issingCacheXmlFile(LauncherLifecycleCommandsDUnitTest.java:502)
>
> org.apache.geode.management.internal.cli.commands.
> LauncherLifecycleCommandsDUnitTest > 
> test004StartServerFailsFastOnMissingGemFirePropertiesFile
> FAILED
> java.lang.AssertionError:
> Warning: The Geode properties file C:\path\to\missing\gemfire.properties
> could not be found.
>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at
>
> Is there any path we need to set in windows to make it run.
>
> Thanks,
> Dinesh Akhand
>
>
>
>
>
>
>
>
>
>
>
> 
> ---
> Found suspect string in log4j at line 546
>
> -Original Message-
> From: Dan Smith [mailto:dsm...@pivotal.io]
> Sent: Tuesday, July 25, 2017 9:59 PM
> To: dev@geode.apache.org
> Subject: Re: Need to know complete execution time for geode-core Junit
> tests
>
> What target are you running? It's a bit confusing because there are
> actually 3 different sets of tests. These times a really rough because I
> don't have a good run in front of me, but this should give you an idea.
>
> "True" unit tests, runs in 1-2 minutes like Jens Said:
> ./gradlew geode-core:test
>
> Single VM integration tests - runs in maybe 1.5 hours:
> ./gradlew geode-core:integrationTest
>
> Multiple VM integration tests - runs in maybe 5 hours?
> .gradlew geode-core:distributedTest
>
> Run all of the tests
> ./gradlew geode-core:precheckin
>
> -Dan
>
> On Tue, Jul 25, 2017 at 8:34 AM, Anthony Baker  wrote:
>
> > You might want to get a few thread dumps to see if there is a hung test.
> > Also, make sure you have sufficient RAM.
> >
> > Anthony
> >
> > > On Jul 25, 2017, at 1:13 AM, Dinesh Akhand 
> wrote:
> > >
> > > Hi Team,
> > >
> > > I trigger the geode-core Junit tests . it keeping running from last
> > > 3
> > hour.
> > > Can you please let me know how much time it take completion of Junit
> > test in Geode-core.
> > > I am using geode 1.1.1
> > >
> > >
> > > Thanks,
> > > Dinesh Akhand
> > > This message and the information contained herein is proprietary and
> > confidential and subject to the Amdocs policy statement,
> > >
> > > you may review at https://www.amdocs.com/about/email-disclaimer <
> > https://www.amdocs.com/about/email-disclaimer>
> >
> >
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement,
>
> you may review at https://www.amdocs.com/about/email-disclaimer <
> https://www.amdocs.com/about/email-disclaimer>
>


Re: Review Request 61281: GEODE-3379 Geode transaction may commit on a secondary bucket after bucket rebalance

2017-08-02 Thread Darrel Schneider

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61281/#review182006
---


Ship it!




Ship It!

- Darrel Schneider


On Aug. 1, 2017, 2:41 p.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61281/
> ---
> 
> (Updated Aug. 1, 2017, 2:41 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, Lynn Gallinat, 
> and Nick Reich.
> 
> 
> Bugs: GEODE-3379
> https://issues.apache.org/jira/browse/GEODE-3379
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> The current geode commit method only holds the primary bucket lock when the
> txRegion is a primary bucket. However, in between the tx operation and commit,
> a rebalance could cause the primary bucket to move. In this case, commit should
> detect that the primary bucket has been moved and throw
> TransactionDataRebalancedException.
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/TXState.java 
> 2c8c28b 
>   geode-core/src/test/java/org/apache/geode/disttx/PRDistTXDUnitTest.java 
> 77bf740 
>   
> geode-core/src/test/java/org/apache/geode/disttx/PRDistTXWithVersionsDUnitTest.java
>  9ab5145 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/execute/PRTransactionDUnitTest.java
>  6578baa 
> 
> 
> Diff: https://reviews.apache.org/r/61281/diff/1/
> 
> 
> Testing
> ---
> 
> Ongoing precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>


