Re: Volunteers to be JIRA/confluence/mailing list admins?

2021-07-28 Thread Alberto Bustamante Reyes
Hi,

I could help as confluence/jira admin.

Alberto B.

Get Outlook for iOS

From: Dan Smith 
Sent: Wednesday, July 28, 2021 7:52:23 PM
To: dev@geode.apache.org 
Subject: Volunteers to be JIRA/confluence/mailing list admins?

Hi all,


We have a couple of admin/human spam filter jobs that I think could use a few 
more volunteers.


  *   Confluence/JIRA admins - we have a process where we grant permission to 
these resources to anyone who asks for access on the mailing list. This could 
be any committer, or really any contributor we are comfortable giving admin 
access to our confluence and/or JIRA projects.
  *   Mailing list moderators - this probably needs to be PMC members since you 
would moderate the private list.

I'd love to get some folks outside of the US time zones so we don't leave 
people outside the US waiting for a day if they need permission.

Any volunteers?

-Dan


"create region" cmd stuck on wan setup

2021-07-28 Thread Alberto Bustamante Reyes
Hi Geode devs,

I have been analyzing an issue that occurs in the following scenario:

1) I start two Geode clusters (cluster1 & cluster2) with one locator and two 
servers each.
Both clusters host a partitioned region called "testregion", which is 
replicated using a parallel gateway sender and a gateway receiver.
These are the gfsh files I have been using for creating the clusters: 
https://gist.github.com/alb3rtobr/e230623255632937fa68265f31e97f3a

2) I run a client connected to cluster2 performing operations on testregion.

3) cluster1 is stopped and all its persistent data is deleted. Then I create 
cluster1 again.

4) At this point, the command to create "testregion" gets stuck.


After checking the thread stack and the code, I found that the problem is the 
following.

This thread is trapped in an infinite loop waiting for a bucket primary 
election at "PartitionedRegion.waitForNoStorageOrPrimary":


"Function Execution Processor4" tid=0x55
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.11/java.lang.Object.wait(Native Method)
-  waiting on org.apache.geode.internal.cache.BucketAdvisor@28be7ae0
at app//org.apache.geode.internal.cache.BucketAdvisor.waitForPrimaryMember(BucketAdvisor.java:1433)
at app//org.apache.geode.internal.cache.BucketAdvisor.waitForNewPrimary(BucketAdvisor.java:825)
at app//org.apache.geode.internal.cache.BucketAdvisor.getPrimary(BucketAdvisor.java:794)
at app//org.apache.geode.internal.cache.partitioned.RegionAdvisor.getPrimaryMemberForBucket(RegionAdvisor.java:1032)
at app//org.apache.geode.internal.cache.PartitionedRegion.getBucketPrimary(PartitionedRegion.java:9081)
at app//org.apache.geode.internal.cache.PartitionedRegion.waitForNoStorageOrPrimary(PartitionedRegion.java:3249)
at app//org.apache.geode.internal.cache.PartitionedRegion.getNodeForBucketWrite(PartitionedRegion.java:3234)
at app//org.apache.geode.internal.cache.PartitionedRegion.shadowPRWaitForBucketRecovery(PartitionedRegion.java:10110)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:564)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:443)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:195)
at app//org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:183)
at app//org.apache.geode.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:1177)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3050)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2910)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2894)
at app//org.apache.geode.cache.RegionFactory.create(RegionFactory.java:773)


After "testregion" is created, the sender queue partitioned region is created. 
While that region's buckets are being recovered, the command is trapped in an 
infinite loop waiting for a primary bucket election at 
PartitionedRegion.waitForNoStorageOrPrimary.

This seems to be a known issue, because in 
PartitionedRegion.getNodeForBucketWrite there is the following comment before 
the call to waitForNoStorageOrPrimary (and the comment has been there since 
Geode's first commit!):

// Possible race with loss of redundancy at this point.
// This loop can possibly create a soft hang if no primary is ever selected.
// This is preferable to returning null since it will prevent obtaining the
// bucket lock for bucket creation.
return waitForNoStorageOrPrimary(bucketId, "write");

Any idea about why the primary bucket is not elected?

It seems the failure is related to the fact that "testregion" is receiving 
updates from the receiver before the "create region" command has finished. If 
the test is repeated without traffic on cluster2, or if I create cluster1's 
receiver after creating "testregion", the problem does not occur.

Is there any recommendation on the startup order of regions, senders and 
receivers for a scenario like the one described?
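
For reference, this is roughly the ordering that avoided the hang in my tests, 
expressed with the Java API instead of the gfsh scripts (a sketch only; the 
sender id, remote distributed-system id and ports are illustrative):

import java.util.Properties;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewaySender;

public class StartupOrderSketch {
  public static void main(String[] args) {
    Cache cache = new CacheFactory(new Properties()).create();

    // 1. Create the parallel sender and the user region first, so the shadow
    //    queue region can recover its buckets before any WAN traffic arrives.
    GatewaySender sender = cache.createGatewaySenderFactory()
        .setParallel(true)
        .create("sender-to-cluster2", 2); // id + remote distributed-system id

    cache.createRegionFactory(RegionShortcut.PARTITION_PERSISTENT)
        .addGatewaySenderId(sender.getId())
        .create("testregion");

    // 2. Only then create the gateway receiver, so the remote site cannot
    //    deliver events while "create region" is still in progress.
    cache.createGatewayReceiverFactory()
        .setStartPort(5000)
        .setEndPort(5500)
        .create();
  }
}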

Thanks in advance,

Alberto B.


About improving GemFireIOException messages

2021-06-17 Thread Alberto Bustamante Reyes
Hi Geode devs,

We received a comment from a Geode user who got an error during cache 
initialization and found the error message not very descriptive.
The message was the following:

""Cache initialization for GemFireCache[id = 1081136680; isClosing = false; 
isShutDownAll = false; created = Fri Jun 19 11:42:29 UTC 2020; server = false; 
copyOnRead = false; lockLease = 120; lockTimeout = 60] failed because: 
org.apache.geode.GemFireIOException: While starting cache server CacheServer on 
port=40404 client subscription config policy=none client subscription config 
capacity=1 client subscription config overflow directory=.""

The message was logged by GemFireCacheImpl as:

logger.error("Cache initialization for " + toString() + " failed because:", 
throwable);

As you can see, the "throwable" object was a GemFireIOException, which was 
thrown by CacheCreation.


GemFireIOException wraps other exceptions (I have checked the code: they are 
mostly IOExceptions, but also MessageTooLargeException, FileNotFoundException, 
SerializationException or plain Exception), so useful information about the 
cause of the exception can be hidden, depending on how meaningful the message 
passed to the constructor is:



public GemFireIOException(String message, Throwable cause) {
  super(message, cause);
}


With this constructor, the "cause" message is not included in the 
GemFireIOException message. I was thinking about changing the constructor to 
something like:


public GemFireIOException(String message, Throwable cause) {
  super(message + ((cause != null) ? " ( " + cause.getMessage() + " )" : ""),
      cause);
}
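
For illustration, with that change a wrapped cause would surface directly in 
the logged message (the cause and message below are made up):

import java.io.FileNotFoundException;

public class MessageConcatDemo {
  public static void main(String[] args) {
    String message = "While starting cache server CacheServer on port=40404";
    Throwable cause = new FileNotFoundException("cache.xml not found");

    // Same concatenation as the proposed constructor:
    String combined =
        message + ((cause != null) ? " ( " + cause.getMessage() + " )" : "");
    System.out.println(combined);
    // Prints:
    // While starting cache server CacheServer on port=40404 ( cache.xml not found )
  }
}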

What's your opinion about this?

Thanks in advance,

Alberto B.







RE: Reminder to use draft mode

2021-05-19 Thread Alberto Bustamante Reyes
Most probably you are already aware of it (and this is why it has not been 
changed in the geode repo), but it seems GitHub does not allow configuring 
draft PRs as the default option for a given repository.




From: Mark Hanson 
Sent: Friday, May 7, 2021 23:06
To: dev@geode.apache.org ; Blake Bender 
Subject: Re: Reminder to use draft mode

Correct me if I am wrong, but I think the basic consensus here is that the 
starting state for all PRs should be draft. We should change the default if we 
can... The sticky part is when to move a PR from standard back to draft. If we 
always start in draft mode, let's deal with that when it becomes a real 
problem. I don't think it will be a significant one if the consensus is we 
start with draft PRs.

Thanks,
Mark

On 5/7/21, 2:03 PM, "Mark Hanson"  wrote:

@Blake Bender The same goes for the geode code. The PR pipeline is *the* 
way to know if we broke something or not. Most people don't know how to run the 
individual tests.

Thanks,
Mark

On 5/7/21, 10:07 AM, "Blake Bender"  wrote:

+1 for draft mode as default.  I'm forever switching to it in 
geode-native already, because the most convenient way for us to get feedback on 
build/test status for all platforms is to run a change through CI, and the only 
way to do that is to submit it as a PR.

Thanks,

Blake


-----Original Message-----
    From: Alberto Bustamante Reyes 
Sent: Thursday, May 6, 2021 1:19 PM
To: dev@geode.apache.org
Subject: RE: Reminder to use draft mode

+1 to Mark's proposal of setting draft mode as default when creating 
PRs (I'm wondering if a new VOTE thread is needed to approve it)

And also +1 to Donal's comments.


From: Darrel Schneider 
Sent: Thursday, May 6, 2021 21:43
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

+1 to Donal's comments

From: Donal Evans 
Sent: Thursday, May 6, 2021 11:44 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

+1 to Naba's PR flow described above.

Creating PRs in draft mode is almost always the best choice, as it 
prevents people from being tagged to review a set of changes that may change 
significantly due to test failures and only requires a single click to convert 
to the "ready to review" state - hardly a major inconvenience.

However, the real tricky question here seems to be "When should you 
move a PR from "Ready to review" back into draft mode?" I tend to agree with 
Jens that a flaky test failure by itself isn't enough to warrant putting a PR 
back into draft mode, as it's often possible to identify the failure as being 
due to an existing known bug and merge the PR knowing that your changes aren't 
the cause. We don't require that all PR tests are green before merging, just 
some of them, so it's reasonable to assume that we don't require all PR tests 
to be green before a PR is considered ready for review either.

Minor edits due to review comments (like spelling mistakes or minor 
code quality/style changes) also don't feel like they should cause a PR to be 
put back into draft mode, as while the contents of the PR may change because of 
them, it won't invalidate other in-progress reviews if it does, or 
significantly alter the nature of the PR.

For me, the bar for whether a PR should be put back into draft mode is 
if you know that its current state is not reflective of the final state that 
will be merged into develop. In general, the only time that should happen is if 
you've received review feedback that will require a change of approach or 
significant refactoring/additional code. It's the difference between "needs a 
little polish" and "needs more work," I think. Obviously, what counts as 
"significant" is entirely subjective, so this isn't much use as a hard and fast 
rule, but a rough guide might be that if a reviewer has requested changes that 
would invalidate or render obsolete/redundant any additional reviews that come 
in before those changes are applied, moving back to draft mode would probably 
be a good idea.

Donal

From: Nabarun Nag 
Sent: Thursday, May 6, 2021 10:22 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

I feel that Owen has a valid point and I myself feel that it is ok to 
start the PR in draft mode till the pre-check tests pass.

There has been this situation where,

  *   PR is created (reviewers are assigned)
  *   approved
  *   Tests fail
  *   code is changed
  *   no reviews
  *   code is merged

Hence code that is not reviewed has been merged

RE: Reminder to use draft mode

2021-05-06 Thread Alberto Bustamante Reyes
+1 to Mark's proposal of setting draft mode as default when creating PRs (I'm 
wondering if a new VOTE thread is needed to approve it)

And also +1 to Donal's comments.


From: Darrel Schneider 
Sent: Thursday, May 6, 2021 21:43
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

+1 to Donal's comments

From: Donal Evans 
Sent: Thursday, May 6, 2021 11:44 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

+1 to Naba's PR flow described above.

Creating PRs in draft mode is almost always the best choice, as it prevents 
people from being tagged to review a set of changes that may change 
significantly due to test failures and only requires a single click to convert 
to the "ready to review" state - hardly a major inconvenience.

However, the real tricky question here seems to be "When should you move a PR 
from "Ready to review" back into draft mode?" I tend to agree with Jens that a 
flaky test failure by itself isn't enough to warrant putting a PR back into 
draft mode, as it's often possible to identify the failure as being due to an 
existing known bug and merge the PR knowing that your changes aren't the cause. 
We don't require that all PR tests are green before merging, just some of them, 
so it's reasonable to assume that we don't require all PR tests to be green 
before a PR is considered ready for review either.

Minor edits due to review comments (like spelling mistakes or minor code 
quality/style changes) also don't feel like they should cause a PR to be put 
back into draft mode, as while the contents of the PR may change because of 
them, it won't invalidate other in-progress reviews if it does, or 
significantly alter the nature of the PR.

For me, the bar for whether a PR should be put back into draft mode is if you 
know that its current state is not reflective of the final state that will be 
merged into develop. In general, the only time that should happen is if you've 
received review feedback that will require a change of approach or significant 
refactoring/additional code. It's the difference between "needs a little 
polish" and "needs more work," I think. Obviously, what counts as "significant" 
is entirely subjective, so this isn't much use as a hard and fast rule, but a 
rough guide might be that if a reviewer has requested changes that would 
invalidate or render obsolete/redundant any additional reviews that come in 
before those changes are applied, moving back to draft mode would probably be a 
good idea.

Donal

From: Nabarun Nag 
Sent: Thursday, May 6, 2021 10:22 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

I feel that Owen has a valid point and I myself feel that it is ok to start the 
PR in draft mode till the pre-check tests pass.

There has been this situation where,

  *   PR is created (reviewers are assigned)
  *   approved
  *   Tests fail
  *   code is changed
  *   no reviews
  *   code is merged

Hence code that is not reviewed has been merged

This way of doing work also has the following advantages:

  *   A reviewer does not have to review a code that causes tests to fail
  *   A reviewer does not have to review code twice before failure and then 
again after changing the code to fix the failure
  *   Unreviewed code post-test fixes do not get merged

I think this way of working saves a critical amount of time for engineers who 
review code.

This flow of PRs feels more efficient:


  *   Create PR in draft mode - no reviewers assigned
  *   PRechecks fail
  *   change/fix code
  *   tests pass - all green
  *   convert PR to ready for review - reviewers assigned
  *   reviewers review

Regards
Naba




From: Owen Nichols 
Sent: Thursday, May 6, 2021 9:59 AM
To: dev@geode.apache.org 
Subject: Re: Reminder to use draft mode

Given the lack of consensus, it sounds like it will not be possible to make any 
assumptions about a PR based on whether it is in Draft mode or not.  I will 
stop retriggering flaky checks or changing PRs to draft status.  My apologies 
for the inconvenience this has caused.

On 5/6/21, 9:47 AM, "Jens Deppe"  wrote:

I don’t think we can presume everyone has the same working style. For 
myself I’ll happily review a PR that has a failing check. I’m OK if it has some 
innocuous ‘housekeeping’ error or unrelated failure.

I don’t retrigger PR failures, for unrelated errors, just to ‘get to green’ 
– related, I don’t expect anyone to do that on my part either. It would be 
frustrating if I was about to merge something and someone retriggers a job. Yes 
I do merge if I’m 100% confident the failed check is unrelated. I don’t merge 
if any checks are still pending.

Perhaps this is just relevant to my current situation, but most of my PRs 
are module specific and so there is collaboration between my team and we 
typically know the state of our 

RE: [RFC PROPOSAL] Geode Command to replicate region data from one site to another connected via WAN

2021-04-26 Thread Alberto Bustamante Reyes
Apart from comments about the functionality, feedback about the RFC quality is 
welcome. Is it detailed enough? Are you missing something? It will help us 
improve future RFCs.

Thanks in advance,

Alberto B.

From: Alberto Gomez 
Sent: Thursday, April 22, 2021 16:51
To: dev@geode.apache.org 
Subject: [RFC PROPOSAL] Geode Command to replicate region data from one site to 
another connected via WAN

Hi all,

In the following link you can find a proposal to introduce a Geode command to 
replicate region data between sites connected via WAN.

https://cwiki.apache.org/confluence/display/GEODE/Geode+Command+to+replicate+region+data+from+one+site+to+another+connected+via+WAN

As per RFC guidelines, please comment in this mail thread.

Thanks,

Alberto G.


RE: CODEOWNERS vs CODEWATCHERS

2021-03-18 Thread Alberto Bustamante Reyes
Hi,

Is it possible to add contributors to the CODEWATCHERS file? Or is it strictly 
necessary to be a committer?

BR/

Alberto B.

From: Owen Nichols 
Sent: Friday, March 12, 2021 23:38
To: dev@geode.apache.org 
Subject: CODEOWNERS vs CODEWATCHERS

The Geode community has certainly become closer now that CODEOWNERS 
automatically adds a lot of people to the reviewer list for new PRs, but you 
may still feel like you’re missing out on PRs outside your area of expertise.

If you are a committer and there are additional code areas you’d like to be 
automatically added as an optional reviewer (as opposed to a binding 
codeowner), you can now do that through CODEWATCHERS.  Same file format, same 
process (submit a PR to make any changes).


RE: Geode Native Tooling

2021-02-25 Thread Alberto Bustamante Reyes
Very good news for the native client. Thanks for the improvement Jacob!

BR/

Alberto B.

From: Jacob Barrett 
Sent: Thursday, February 25, 2021 20:49
To: dev@geode.apache.org 
Subject: Re: Geode Native Tooling

Running the legacy tests on windows is still flaky. Windows!

> On Feb 25, 2021, at 9:46 AM, Mario Salazar de Torres wrote:
>
> Hi Jacob,
>
> No words to describe what a huge effort you put into this. I think it's 
> really going to ease verification work for everyone in the future.
> However, today I was running one PR on Concourse and I noticed that Windows 
> 2019 job was failing. Could it be that there are some details to polish for 
> that pipeline?
>
> BR,
> Mario.
> 
> From: Jacob Barrett 
> Sent: Thursday, February 25, 2021 6:04 PM
> To: dev@geode.apache.org 
> Subject: Geode Native Tooling
>
> Hey Geode Native devs,
>
> You may have noticed this new CI that looks a lot like the Java CI. PRs will 
> now be executed against this CI and several platforms. All unit and 
> integration tests are executed on all the platforms as well. It also takes 
> over the tasks of the old Travis CI of validating your sources for 
> formatting, license and static analysis for correctness.
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/geode-native-develop
>
> The clang-format and clang-tidy tools used by the CI have been upgraded to 
> clang 11. When checking locally, please make sure the clang-tidy and 
> clang-format detected by CMake are version 11. There are some slight 
> formatting differences between clang-format 11 and the previous version 6. 
> The new clang-tidy also detects things that version 6 did not detect.
>
>
> Thanks,
> Jake
>



RE: Different binding addresses for traffic & membership

2021-01-22 Thread Alberto Bustamante Reyes
Thanks for your answer, Dan! We have checked that setting "0.0.0.0" in 
DirectChannel and in GMSHealthMonitor seems to solve our problem. I created a 
PR to check if using "0.0.0.0" could have an impact, and all public test cases 
seem to work fine in that case (https://github.com/apache/geode/pull/5946).

The next step will be the implementation of an option to allow a user to set 
the "0.0.0.0" address in these classes.

BR/

Alberto Bustamante

From: Dan Smith 
Sent: Thursday, January 21, 2021 0:48
To: dev@geode.apache.org 
Subject: Re: Different binding addresses for traffic & membership

I've been looking into the code a little bit to see if this is possible. I'm 
not sure it is right now.

Here are some pointers on where to look. Most of the magic happens in 
JGroupsMessenger. JGroupsMessenger wraps jgroups, which we use for UDP 
messaging related to membership.

The first thing that happens is that JGroupsMessenger.init creates a jgroups 
configuration. It does some string replacement on the jgroups-config.xml file 
that is checked in. It puts the configured bind-address into that configuration.

When JGroupsMessenger.start() is called jgroups will bind to that address. 
Right after that, JGroupsMessenger calls establishLocalAddress, which takes the 
IP address that jgroups just bound to and creates our local MemberIdentifier.

Later in GMSJoinLeave.attemptToJoin, it sends that local address to the 
coordinator. Assuming the join is successful, the coordinator will send out a 
view that includes that MemberIdentifier.


I was really hoping that just setting a bind address of "0.0.0.0" would do the 
right thing in this case. But it looks like jgroups won't let you bind to that 
address. I don't currently see a way to get a different address in the 
MemberIdentifier than the one that jgroups is listening on right now.

Besides the UDP port that jgroups is listening on, there are a couple of other 
TCP ports used for peer-to-peer messaging. GMSHealthMonitor also starts 
listening on the same local address returned from jgroups. And the 
DirectChannel class I think also eventually ends up creating a server socket 
that listens on the same bind-address. That one might be ok with "0.0.0.0".
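
To make the wildcard semantics concrete, here is a tiny Geode-independent 
illustration (ephemeral ports; nothing else assumed): a socket bound to 
0.0.0.0 accepts connections on every interface, while one bound to a single 
address only sees traffic arriving on that address.

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class WildcardBindDemo {
  public static void main(String[] args) throws Exception {
    // Wildcard bind: reachable via any interface (pod IP, localhost, ...).
    try (ServerSocket all = new ServerSocket()) {
      all.bind(new InetSocketAddress(InetAddress.getByName("0.0.0.0"), 0));
      System.out.println("wildcard bound to " + all.getLocalSocketAddress());
    }

    // Specific bind: only reachable via this one address.
    try (ServerSocket loopbackOnly = new ServerSocket()) {
      loopbackOnly.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
      System.out.println("loopback bound to " + loopbackOnly.getLocalSocketAddress());
    }
  }
}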

-Dan

PS - there is a lot more information on membership on the wiki if it is 
helpful, but I don't think it gets into this level of detail about what address 
gets used - 
https://cwiki.apache.org/confluence/display/GEODE/Membership+Manager+and+Messaging.



From: Aaron Lindsey 
Sent: Wednesday, January 20, 2021 2:51 PM
To: dev@geode.apache.org 
Subject: Re: Different binding addresses for traffic & membership

> Is there any way to configure a bind address to be used only for membership?

To your first question, I asked around but I’m not aware of anything like what 
you are looking for. What you are describing does seem like it could become a 
common setup on Kubernetes, but I personally haven’t tried using Geode with 
Istio and Envoy. Please share what you learn!

> I thought that it will be interesting to take a look at how the membership 
> works (how the distributed system is created), to check if at some point I 
> could decouple how the value of "bind-address" parameter is used to configure 
> binding and to indicate other members that they can reach the new member at 
> that hostname. Any comment about what I should check first is welcome.

Maybe someone with more experience in the membership code could comment on this?

Aaron

> On Jan 20, 2021, at 9:07 AM, Alberto Bustamante Reyes 
>  wrote:
>
> It seems this is not a trending topic...  Let me share my approach for the 
> moment, maybe this will receive more comments:
>
> I thought it would be interesting to take a look at how the membership 
> works (how the distributed system is created), to check if at some point I 
> could decouple how the value of the "bind-address" parameter is used to configure 
> binding and to indicate to other members that they can reach the new member at 
> that hostname. Any comment about what I should check first is welcome.
>
> Thanks!
>
> BR/
>
> Alberto Bustamante
>
>
>
>
>
> 
> From: Alberto Bustamante Reyes 
> Sent: Tuesday, January 19, 2021 1:45
> To: dev@geode.apache.org 
> Subject: Different binding addresses for traffic & membership
>
> Hi geode-devs,
>
> I have a question related to Geode & Kubernetes:
> We would like to use Istio with Geode. For that, a sidecar container (Envoy) 
> has to be added in each Geode pod. That sidecar container intercepts and 
> handles all incoming and outgoing traffic for that pod. One of the 
> requirements set by Istio towards applications trying to integrate with it i

RE: Different binding addresses for traffic & membership

2021-01-20 Thread Alberto Bustamante Reyes
It seems this is not a trending topic...  Let me share my approach for the 
moment, maybe this will receive more comments:

I thought it would be interesting to take a look at how the membership 
works (how the distributed system is created), to check if at some point I 
could decouple how the value of the "bind-address" parameter is used to configure 
binding and to indicate to other members that they can reach the new member at 
that hostname. Any comment about what I should check first is welcome.

Thanks!

BR/

Alberto Bustamante





____
From: Alberto Bustamante Reyes 
Sent: Tuesday, January 19, 2021 1:45
To: dev@geode.apache.org 
Subject: Different binding addresses for traffic & membership

Hi geode-devs,

I have a question related to Geode & Kubernetes:
We would like to use Istio with Geode. For that, a sidecar container (Envoy) 
has to be added in each Geode pod. That sidecar container intercepts and 
handles all incoming and outgoing traffic for that pod. One of the requirements 
set by Istio towards applications trying to integrate with it is that the 
application listening ports need to be bound to either localhost or 0.0.0.0 
address (which listens on all interfaces).

Geode binds the locator and server traffic port by default to 0.0.0.0, but the 
membership ports are bound to the pod IP.
And with Envoy listening on the pod IP for incoming traffic and proxying 
everything towards localhost, applications binding to pod IPs won't receive any 
traffic.

We have tried using the "bind-address" parameter, but that doesn't work for our 
case. Geode binds the listening ports to the configured address, but it also 
shares that same address with other members in the system as the address to be 
used to reach it. If we configure that address to localhost, it just won't work.

Is there any way to configure a bind address to be used only for membership? I 
have not seen any configuration parameter or property that could be useful to 
solve this problem, maybe I missed it.

Thanks in advance,

BR/

Alberto Bustamante


Different binding addresses for traffic & membership

2021-01-18 Thread Alberto Bustamante Reyes
Hi geode-devs,

I have a question related to Geode & Kubernetes:
We would like to use Istio with Geode. For that, a sidecar container (Envoy) 
has to be added in each Geode pod. That sidecar container intercepts and 
handles all incoming and outgoing traffic for that pod. One of the requirements 
set by Istio towards applications trying to integrate with it is that the 
application listening ports need to be bound to either localhost or 0.0.0.0 
address (which listens on all interfaces).

Geode binds the locator and server traffic port by default to 0.0.0.0, but the 
membership ports are bound to the pod IP.
And with Envoy listening on the pod IP for incoming traffic and proxying 
everything towards localhost, applications binding to pod IPs won't receive any 
traffic.

We have tried using the "bind-address" parameter, but that doesn't work for our 
case. Geode binds the listening ports to the configured address, but it also 
shares that same address with other members in the system as the address to be 
used to reach it. If we configure that address to localhost, it just won't work.
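
For reference, this is essentially what we are doing (a minimal sketch; the 
address is an example pod IP):

import java.util.Properties;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

public class BindAddressSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // One property controls both the address the membership ports bind to
    // and the address advertised to other members -- the coupling described
    // above. Setting it to localhost makes the member unreachable for peers.
    props.setProperty("bind-address", "10.1.2.3");
    Cache cache = new CacheFactory(props).create();
  }
}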

Is there any way to configure a bind address to be used only for membership? I 
have not seen any configuration parameter or property that could be useful to 
solve this problem, maybe I missed it.

Thanks in advance,

BR/

Alberto Bustamante


Reviewers needed for GEODE-8202

2020-10-15 Thread Alberto Bustamante Reyes
Hi all,

Could someone take a look at the PR for GEODE-8202? 
https://github.com/apache/geode/pull/5600

It introduces a new option for serial gw sender threads startup.

More info in the RFC: 
https://cwiki.apache.org/confluence/display/GEODE/New+option+for+serial+gw+sender+dispatcher+threads+start

Thanks!

Alberto B.


RE: Clean C++ client metadata in timeouts

2020-09-21 Thread Alberto Bustamante Reyes
Hi,

Just for clarification: when there is an "IO error in handshake", what is 
deleted is just the information about the failing server, not the whole 
metadata of the client.

BR/

Alberto B.

From: Jacob Barrett 
Sent: Friday, September 18, 2020 21:32
To: dev@geode.apache.org 
Subject: Re: Clean C++ client metadata in timeouts

+1 to what Anthony is asking.

Rather than “fixing” the current behavior, let's just implement a behavior that 
better achieves the goal of single-hop optimization.

From what I recall, in both the Java and C++ code we throw away all metadata 
on a region whenever there is any triggering event. We should keep the old 
metadata until we have new metadata. Even if the data is partially correct, 
it's better than random server selection.
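
Something like the following would do it; sketched in Java for brevity, and 
purely illustrative (not the actual client code):

import java.util.concurrent.atomic.AtomicReference;

// Keep serving the old bucket metadata until a full refresh has arrived,
// instead of clearing it on a triggering event and falling back to random
// server selection.
class MetadataHolder<M> {
  private final AtomicReference<M> current = new AtomicReference<>();

  M get() {
    return current.get(); // possibly stale, but better than nothing
  }

  void onRefreshCompleted(M fresh) {
    current.set(fresh); // atomically replace old with new
  }

  // Deliberately no clear()/invalidate-on-error method.
}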

I don’t recall all the triggering events, but I think anything that caused a 
server to be removed from the pool triggered this. The silly thing is that 
pretty much any exception on the connection caused not only that connection to 
be closed but all other connections to the same server to be closed. This 
premature termination of connections is probably not ideal. And again, 
throwing out all the metadata for what probably only affects a subset of the 
metadata is bad.

Metadata is completely asynchronous and only triggers after “failure” events, 
like those above, or a good response with the metadata update flag set. Is there 
a way to get this metadata to the client more quickly? I suspect not easily; 
maybe sending something in ping messages, but this assumes mostly idle clients.

I suspect what we would find is that just avoiding the complete dismissal of 
metadata should suffice. We could start with that and then optimize from there.

As for the original post about the client trying to connect to a stopped 
server: is this the scenario where the client is going to perform a put, or 
other mutating operation, so the primary server is necessary, but the pool 
doesn’t have any available connections so it tries to create one? On read-only 
ops it should be going to the locator, I think, for a “balanced” connection to a 
server hosting the bucket (though my recollection of the code is old). I think 
it would be perfectly fine in this scenario to assume the metadata could be 
incorrect and to fetch it. I would not throw out the current metadata. We could 
even fetch this metadata synchronously so the current operation doesn’t waste 
time continually trying to connect to the wrong server, though it is possible 
this error is transient.

So, yeah, I think it just makes sense for us to look at this behavior from the 
single-hop optimization perspective and make that feature behave like it should, 
and not worry about options to enable different behavior. The end goal is to 
have optimal single-hop operations; if the current implementation doesn’t do 
that, then we fix it. No configuration options necessary.

-Jake


On Sep 18, 2020, at 8:54 AM, Anthony Baker <bak...@vmware.com> wrote:

I’m not sure I have answers so I’ll just ask more questions :-)

When a server is killed, does that provoke an asynchronous metadata update to 
clients?  I could be wrong about that but if it IS true, then perhaps we should 
focus on optimizing that path. The sooner that a client can get accurate bucket 
location data the faster it can service requests.

I suggest this because I know that wiping out *all* bucket metadata on the 
client means that we’ve now destroyed the ability of the client to do 
single-hop operations until the metadata is refreshed. This has the cost of 
additional latency on each client request and the hidden cost of additional 
sockets and threads within the cluster to service the extra hop by forwarding 
requests to the appropriate server.  This is important because many users 
test and size their geode cluster based on single-hop resource consumption and 
it’s a very steep step up when this is not possible.  If there’s insufficient 
headroom to handle the additional load it can tip a bad situation (single node 
failing) into a much worse cascading condition (multiple nodes failing).

So I guess my questions are:
- What triggers a metadata refresh and how can we make that faster?
- Can we very selectively identify that some metadata is out of date and 
invalidate that information only?


Anthony


On Sep 18, 2020, at 3:50 AM, Alberto Bustamante Reyes 
<alberto.bustamante.re...@est.tech> wrote:

Hi,

Thanks for your messages, here are some answers:

Dave:
Are there cases in which one or two timeouts are followed by a successful
retry? Or does one timeout *always* end with more timeouts and, ultimately,
an IO error?
Not in our use case, which is killing a server. In this case, timeouts will end 
up in an IO error.

If a straight-up change solves a constant headache, as you suggest, Alberto, 
and as Blake concurs, that sounds like the way to go.

RE: Clean C++ client metadata in timeouts

2020-09-18 Thread Alberto Bustamante Reyes
Hi,

Thanks for your messages, here are some answers:

Dave:
Are there cases in which one or two timeouts are followed by a successful
retry? Or does one timeout *always* end with more timeouts and, ultimately,
an IO error?
Not in our use case, which is killing a server. In this case, timeouts will end 
up in an IO error.

If a straight-up change solves a constant headache, as you suggest, Alberto, 
and as Blake concurs, that sounds like the way to go. Why introduce a new 
option or property if the user will always prefer one behavior over the other?

The fix works fine for our use case; I suggested the alternatives to make it 
something optional in case there were concerns about it. In other projects I 
have been involved with in the past, we had to deal with temporary network 
problems. So most of the time, if a timeout had a consequence (so to say), it 
was not applied after just one timeout.

But it's true that in this use case a timeout always ends up in an IO error, as 
I said. So if you don't see any problem with cleaning the metadata just after 
one timeout, then we don't need any control mechanism for it.



Blake:
Given that attempts to retrieve metadata after the C++ cache is closed are a 
constant headache for Geode Native development, I am generally in favor of 
anything that potentially reduces the number of times/places this happens.  If 
we've failed the handshake, it's very unlikely things will correct themselves 
without outside intervention, so this fix is probably goodness. I'd go ahead 
and submit a PR when you think it's solid.

Good to hear that. The code changes in the draft PR are ready, I just need to 
figure out the testing part. I'm not sure how I will add a test, because it 
would be the same test as the one added for GEODE-8231...


BR/

Alberto B.



From: Ernie Burghardt 
Sent: Thursday, September 17, 2020 22:08
To: dev@geode.apache.org 
Subject: Re: Clean C++ client metadata in timeouts

Let's please consider how this would be controlled, and look for ways other 
than YetAnotherProperty.

Thanks,
EB

On 9/17/20, 12:59 PM, "Dave Barnes"  wrote:

If a straight-up change solves a constant headache, as you suggest,
Alberto, and as Blake concurs, that sounds like the way to go.
Why introduce a new option or property if the user will always prefer one
behavior over the other? (And from a docs perspective, who needs another
optional property, anyway?)

On Thu, Sep 17, 2020 at 10:32 AM Blake Bender  wrote:

> Given that attempts to retrieve metadata after the C++ cache is closed are
> a constant headache for Geode Native development, I am generally in favor
> of anything that potentially reduces the number of times/places this
> happens.  If we've failed the handshake, it's very unlikely things will
> correct themselves without outside intervention, so this fix is probably
> goodness.  I'd go ahead and submit a PR when you think it's solid.
>
> Thanks,
>
> Blake
>
>
> On 9/17/20, 9:36 AM, "Dave Barnes"  wrote:
>
> Alberto,
> Are there cases in which one or two timeouts are followed by a
> successful
> retry? Or does one timeout *always* end with more timeouts and,
> ultimately,
> an IO error?
> If timeouts can sometimes be followed by successful retries, and
> re-trying
> is the current default behavior, then I agree that introducing a
> setting
> that effectively eliminates re-tries should be the developer's choice.
> In that case, I suggest that the option should not be a low-level
> choice of
> "handle the metadata in a way that eliminates retries" but should be
> higher
> level, like "when attempting to connect, try only once, instead of
    > re-trying (the default behavior)."
> -Dave
>
> On Thu, Sep 17, 2020 at 7:42 AM Alberto Bustamante Reyes
>  wrote:
>
> > Hi geode-dev,
> >
> > I have a question about the c++ client.
> >
> > Some months ago we merged GEODE-8231 to solve a problem we observed
> > regarding the native client trying to connect to a stopped server.
> > The GEODE-8231 solution consists of removing the client metadata when an
> > "IO error in handshake" exception is received. This fix solved most of
> > our problems, but it has been observed that sometimes when a server is
> > stopped the errors received in the client are not the same and this "IO
> > error in handshake" takes up to a minute to appear. So during that time,
> > the client is still trying to connect to the offline server.

Clean C++ client metadata in timeouts

2020-09-17 Thread Alberto Bustamante Reyes
Hi geode-dev,

I have a question about the c++ client.

Some months ago we merged GEODE-8231 to solve a problem we observed regarding 
the native client trying to connect to a stopped server.
The GEODE-8231 solution consists of removing the client metadata when an "IO 
error in handshake" exception is received. This fix solved most of our 
problems, but it has been observed that sometimes, when a server is stopped, 
the errors received in the client are not the same, and this "IO error in 
handshake" takes up to a minute to appear. So during that time, the client is 
still trying to connect to the offline server.

As the error received during that time is "timeout in handshake", we have 
tested modifying the solution of GEODE-8231 to make the client remove the 
metadata once a timeout error is received (here is a draft with the code: 
https://github.com/apache/geode-native/pull/651). With this change in place, 
the behavior is ok.


But I would like to check your opinion about this check, because with it a 
single timeout will cause the removal of the client metadata, which maybe is 
not the best solution. I thought about different alternatives:

- Wait until a given number of timeouts in a row have been received from the 
same server before removing the metadata (see the sketch after this list)
- Make this "remove-metadata-after-timeout" something optional that could be 
configured if needed
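
As a sketch of the first alternative (written in Java for brevity; the class 
name and the threshold are hypothetical):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Only drop a server's metadata after N consecutive handshake timeouts;
// any successful handshake resets the streak.
class TimeoutStreakTracker {
  private static final int THRESHOLD = 3; // hypothetical default
  private final ConcurrentMap<String, AtomicInteger> streaks =
      new ConcurrentHashMap<>();

  boolean shouldRemoveMetadata(String server) {
    return streaks.computeIfAbsent(server, s -> new AtomicInteger())
        .incrementAndGet() >= THRESHOLD;
  }

  void onSuccessfulHandshake(String server) {
    streaks.remove(server);
  }
}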

As this would misalign the behavior of the Java and C++ clients, making it an 
optional configuration seems more appropriate, to keep the default C++ client 
behavior the same as the Java client's.

BR/

Alberto B.


RE: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create Issue" dialogue and include "Affects Version/s"

2020-08-18 Thread Alberto Bustamante Reyes
+1

From: Mario Kevo 
Sent: Tuesday, August 18, 2020 7:39
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create 
Issue" dialogue and include "Affects Version/s"

+1

From: Dave Barnes 
Sent: August 18, 2020 7:23
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create 
Issue" dialogue and include "Affects Version/s"

+1 esp addition of "Affects Version/s".

On Mon, Aug 17, 2020 at 3:07 PM Kirk Lund  wrote:

> +1 if it's possible
>
> On Mon, Aug 17, 2020 at 12:04 PM Donal Evans  wrote:
>
> > Looking at the dialogue that opens when you attempt to create a new
> ticket
> > in the GEODE Jira[1], there are two fields included that aren't really
> > necessary and may cause confusion. The "Fix Version/s" field should
> > presumably not be filled out until the issue has actually been fixed,
> > rather than at the time of ticket creation. The "Sprint" field seems to
> no
> > longer serve any purpose at all that I can discern, having only been
> filled
> > in 13 tickets, the most recent of which was created in December 2018[2].
> > With the expansion of the community contributing to the Geode project,
> it's
> > important to provide a straightforward experience for people who are new
> to
> > the project and wish to file tickets, so the presence of these fields may
> > cause issues.
> >
> > I propose that these two fields be removed from the "Create Issue"
> > dialogue and that the "Affects Version/s" field be added, since that
> field
> > is far more important at time of ticket creation. There are currently
> 3851
> > bug tickets in the Jira with no "Affects Version/s" value entered at
> > all[3], which I suspect is in part due to that field not being an option
> in
> > the "Create Issue" dialogue, meaning you have to remember to go back
> after
> > creating the ticket and enter it. With Geode moving to a model of having
> > support branches and patch releases, properly capturing the versions
> > affected by a given issue becomes even more important.
> >
> > [1] https://i.imgur.com/oQ8CW87.png
> > [2]
> >
> https://issues.apache.org/jira/projects/GEODE/issues/GEODE-8433?filter=allissues=cf%5B12310921%5D+ASC%2C+created+DESC
> > [3]
> >
> https://issues.apache.org/jira/browse/GEODE-8433?jql=project%20%3D%20GEODE%20AND%20issuetype%20%3D%20Bug%20AND%20affectedVersion%20%3D%20EMPTY%20ORDER%20BY%20created%20DESC%2C%20affectedVersion%20ASC%2C%20cf%5B12310921%5D%20ASC
> >
>


Review needed for c++ client ticket

2020-07-23 Thread Alberto Bustamante Reyes
Hi,

Could someone please take a look at this c++ client PR? 
https://github.com/apache/geode-native/pull/628

It solves a problem reported in the users list: 
https://markmail.org/thread/gajd4ok65w227fhl

Thanks,

Alberto B.



RE: [VOTE] change Default branch for geode-examples to 'develop'

2020-07-10 Thread Alberto Bustamante Reyes
+1

From: Joris Melchior 
Sent: Friday, July 10, 2020 15:54
To: dev@geode.apache.org 
Subject: Re: [VOTE] change Default branch for geode-examples to 'develop'

+1

On 2020-07-10, 12:39 AM, "Owen Nichols"  wrote:

A fresh checkout of geode and all but one of the geode-* repos 
checks out develop as the Default branch.

The lone exception is geode-examples.  Please vote +1 if you are in favor 
of changing its Default branch to develop for consistency with the other repos 
and other reasons as per recent discussion[1].

[1] 
https://lists.apache.org/x/thread.html/rfec15c0a7d5d6d57beed90868dbb53e3bfcaabca67589b28585556ee@%3Cdev.geode.apache.org%3E



RE: [PROPOSAL] backport fix for GEODE-8020 to support/1.13

2020-07-09 Thread Alberto Bustamante Reyes
+1

From: Donal Evans 
Sent: Thursday, July 9, 2020 17:50
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] backport fix for GEODE-8020 to support/1.13

+1

From: Bruce Schuchardt 
Sent: Thursday, July 9, 2020 8:00 AM
To: dev@geode.apache.org 
Subject: [PROPOSAL] backport fix for GEODE-8020 to support/1.13

There are reports that SSL performance is off on the support/1.13 branch with 
respect to the support/1.12 branch, but performance on develop is okay. The only 
communications changes in develop that aren’t in 1.13 are those that fixed this 
long-standing bug, so I’d like to backport it to the 1.13 branch.

https://github.com/apache/geode/pull/5048

The error was in the cluster communications message-streamer class that created 
some extra objects during message transmission.  The fix is small and has been, 
at this point, through many test iterations.


about Liberica JDK

2020-07-07 Thread Alberto Bustamante Reyes
Hi devs,

I have seen in the develop branch this commit that replaces OpenJDK with 
Liberica JDK (https://github.com/apache/geode/pull/5312), although it was 
reverted later, so I suppose there are still issues to be solved.

I didn't know about Liberica and I'm curious about the change. Why is this 
change being implemented?

BR/

Alberto B.



RE: Fate of master branch

2020-06-26 Thread Alberto Bustamante Reyes
+1 for deleting the master branch. And also for updating the wiki page about 
branching that Alberto pointed out.

From: Bruce Schuchardt 
Sent: Friday, June 26, 2020 17:37
To: dev@geode.apache.org 
Subject: Re: Fate of master branch

Let's just delete it.  I need to do that in my own repos as well.

On 6/26/20, 8:05 AM, "Blake Bender"  wrote:

Apologies if this has been addressed already and I missed it.  In keeping 
with other OSS projects, I believe it’s time we did something about removing 
the insensitive term master from Geode repositories.

One choice a lot of projects appear to be going with is a simple rename 
from master to main. In our own case, however, master isn’t really in use for 
anything vital. We track releases with a tag and a branch to backport fixes 
to, and the develop branch is the “source of truth” latest-and-greatest version 
of the code. We could thus simply delete master with no loss I’m aware of.  
Any opinions?

Thanks,

Blake




RE: [INFO] Distributed Test runs in bulk results.

2020-06-11 Thread Alberto Bustamante Reyes
I think a report like this is very useful for having real data about which flaky 
tests fail more often.

It would be great if a report like this were automatically generated and 
updated after each CI execution. In a project I worked on before, a similar 
report was implemented, and it was very useful for developers to check whether 
a test case had been failing in the past, and also to identify which flaky 
test cases we should try to fix first.

From: Mark Hanson 
Sent: Thursday, June 11, 2020 7:59
To: dev@geode.apache.org 
Subject: [INFO] Distributed Test runs in bulk results.

Hello All,

I have been doing bulk test runs of DistributedTestOpenJDK8, in this case over 
200. Here is a simplified report to help you see what I am seeing, and what I 
think everybody sees with random failures as part of the PR process.

It is very easy to cause failures like this by not knowing what is running 
asynchronously (and Geode is a complex system), or by introducing timing 
constraints that may not hold up in the system, e.g. waiting 5 seconds for a 
test result that could take longer unbeknownst to you.
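
For the timing case, the usual remedy is to poll for the condition with a 
generous cap instead of sleeping a fixed time. A minimal sketch using 
Awaitility and AssertJ (to my knowledge both are already used in Geode's 
tests):

import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

public class PollingInsteadOfSleeping {
  void waitForAsyncResult(AtomicBoolean done) {
    // Fragile alternative: Thread.sleep(5000) assumes the async work always
    // finishes within 5 seconds. Polling holds up even when it takes longer:
    await().atMost(Duration.ofMinutes(1))
        .untilAsserted(() -> assertThat(done.get()).isTrue());
  }
}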

All of that said, here are the results. There are tickets already open for most 
if not all of these issues.

Please let me know how often you all would like to see these reports…

Thanks,
Mark


***
Overall build success rate: 84.0%


The following test methods see failures in more than one class.  There may be a 
failing *TestBase class

*.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:  18 failures :
  ParallelWANPersistenceEnabledGatewaySenderDUnitTest:  7 failures (96.889% success rate)
  ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  11 failures (95.111% success rate)

*.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:  4 failures :
  SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  3 failures (98.667% success rate)
  SerialWANPersistenceEnabledGatewaySenderDUnitTest:  1 failures (99.556% success rate)

***


org.apache.geode.management.MemberMXBeanDistributedTest:  3 failures (98.667% success rate)

 testBucketCount   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3247
 testBucketCount   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3241
 testBucketCount   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3199

org.apache.geode.internal.cache.wan.parallel.ParallelWANPersistenceEnabledGatewaySenderDUnitTest:  7 failures (96.889% success rate)

 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3335
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3331
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3294
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3285
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3218
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3180
 testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3156

org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionDistributedTest:  1 failures (99.556% success rate)

 testCacheCloseDuringBucketMoveDoesntCauseDataLoss   https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3267

org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest:  1 failures (99.556% success rate)

 testDistributedRegionClientPutRejection   

Native client - PdxType change in "<" operator impacts performance

2020-06-02 Thread Alberto Bustamante Reyes
Hi all,

I have reported a performance problem in the C++ native client, which appeared 
after the modification of the "<" operator of the PdxType class: 
https://issues.apache.org/jira/browse/GEODE-8212

I just want to highlight it, just in case someone is facing the same issue.

BR/

Alberto B.





RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

2020-06-02 Thread Alberto Bustamante Reyes
Hi,

Kind reminder. The extended deadline for the RFC review is next Thursday 4th 
June.

BR/

Alberto B.

From: Udo Kohlmeyer 
Sent: Saturday, May 30, 2020 2:37
To: dev@geode.apache.org 
Subject: RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads 
start

Hi there Alberto,

There is no explicit requirement to receive any “+1” messages.

I think a good rule of thumb is to:
a) To provide a little more time to review any RFC. One week might be a little 
short, given that we cannot assume that everyone has time to review/work on the 
project in a full-time capacity. I always think 2-3 weeks is safe.
b) If no explicit “+1”s are received after 50% of the allotted review time, 
maybe a nudge in the DEV list to review the RFC.

After those steps have been followed, it would be safe to assume that 
“consensus by lack of objection” is reached if the deadline has been reached.

Thank you for extending.

—Udo
On May 29, 2020, 12:02 PM -0700, Alberto Bustamante Reyes wrote:
Hi Udo,

Thanks for your message, I was not sure if I had to receive explicit +1 
messages or not. Of course I prefer to have some feedback before continuing, so 
I will extend the deadline until the end of next Thursday (4th June). I hope 
that's fine.

BR/

Alberto B.

From: Udo Kohlmeyer 
Sent: Friday, May 29, 2020 19:30
To: dev@geode.apache.org 
Subject: RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads 
start

Hi there Alberto,

Thank you for the RFC.

Tbh, I don’t know if there should be some guidance around the period for which 
we invite comments.

I personally had a really busy week and could not get to the RFC review in the 
1 week that I was given.

I would like to request that this RFC is extended by 1 more week, to invite 
comments.
I understand that without comments it is reasonable to assume that everyone 
agrees, but I would prefer that, in this case, we need to get some amount of 
“+1” comments on this RFC.

I fear that we might fall under a false-positive mentality here, if we assume 
that everyone has read the RFC and had time to think and consider its 
repercussions within the 1 week deadline.

Hope you can accommodate the extra 1 week extension request.

—Udo
On May 29, 2020, 1:56 AM -0700, Alberto Bustamante Reyes wrote:
Hi,

No comments have been received so far. I have moved the RFC to "in development" 
state and I will continue with the code implementation.

BR/

Alberto B.
____
From: Alberto Bustamante Reyes 
Sent: Saturday, May 23, 2020 0:26
To: dev@geode.apache.org 
Subject: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

Hi Geode community,

I have posted on the wiki a new RFC about implementing a new option for serial 
gateway sender creation, related to how the dispatcher threads are started. 
This option will be used only when gateway receivers are configured to share 
the same host and port. This configuration was already discussed in a previous RFC.

Please send your comments by Thursday 28th May.

https://cwiki.apache.org/confluence/display/GEODE/New+option+for+serial+gw+sender+dispatcher+threads+start

Thanks,

Alberto B.


RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

2020-05-29 Thread Alberto Bustamante Reyes
Hi Udo,

Thanks for your message, I was not sure if I had to receive explicit +1 
messages or not. Of course I prefer to have some feedback before continuing, so 
I will extend the deadline until the end of next Thursday (4th June). I hope 
that's fine.

BR/

Alberto B.

From: Udo Kohlmeyer 
Sent: Friday, May 29, 2020 19:30
To: dev@geode.apache.org 
Subject: RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads 
start

Hi there Alberto,

Thank you for the RFC.

Tbh, I don’t know if there should be some guidance around the period for which 
we invite comments.

I personally had a really busy week and could not get to the RFC review in the 
1 week that I was given.

I would like to request that this RFC be extended by 1 more week, to invite 
comments.
I understand that without comments it is reasonable to assume that everyone 
agrees, but I would prefer that, in this case, we get some number of 
“+1” comments on this RFC.

I fear that we might fall into a false-positive mentality here if we assume 
that everyone has read the RFC and had time to think about and consider its 
repercussions within the 1-week deadline.

Hope you can accommodate the one-week extension request.

—Udo
On May 29, 2020, 1:56 AM -0700, Alberto Bustamante Reyes 
, wrote:
Hi,

No comments have been received so far. I have moved the RFC to "in development" 
state and I will continue with the code implementation.

BR/

Alberto B.
____
De: Alberto Bustamante Reyes 
Enviado: sábado, 23 de mayo de 2020 0:26
Para: dev@geode.apache.org 
Asunto: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

Hi Geode community,

I have posted on the wiki a new RFC about implementing a new option for serial 
gateway sender creation, related to how the dispatcher threads are started. 
This option will be used only when gateway receivers are configured to share the 
same host and port. That configuration was already discussed in a previous RFC.

Please send your comments by Thursday 28th May.

https://cwiki.apache.org/confluence/display/GEODE/New+option+for+serial+gw+sender+dispatcher+threads+start

Thanks,

Alberto B.


RE: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

2020-05-29 Thread Alberto Bustamante Reyes
Hi,

No comments have been received so far. I have moved the RFC to "in development" 
state and I will continue with the code implementation.

BR/

Alberto B.
____
De: Alberto Bustamante Reyes 
Enviado: sábado, 23 de mayo de 2020 0:26
Para: dev@geode.apache.org 
Asunto: [DISCUSS] RFC: New option for serial gw sender dispatcher threads start

Hi Geode community,

I have posted on the wiki a new RFC about implementing a new option for serial 
gateway sender creation, related to how the dispatcher threads are started. 
This option will be used only when gateway receivers are configured to share the 
same host and port. That configuration was already discussed in a previous RFC.

Please send your comments by Thursday 28th May.

https://cwiki.apache.org/confluence/display/GEODE/New+option+for+serial+gw+sender+dispatcher+threads+start

Thanks,

Alberto B.


[DISCUSS] RFC: New option for serial gw sender dispatcher threads start

2020-05-22 Thread Alberto Bustamante Reyes
Hi Geode community,

I have posted on the wiki a new RFC about implementing a new option for serial 
gateway sender creation, related to how the dispatcher threads are started. 
This option will be used only when gateway receivers are configured to share the 
same host and port. That configuration was already discussed in a previous RFC.

Please send your comments by Thursday 28th May.

https://cwiki.apache.org/confluence/display/GEODE/New+option+for+serial+gw+sender+dispatcher+threads+start

Thanks,

Alberto B.


RE: Question about version checks inside fromData method in GatewaySenderEventImpl

2020-05-19 Thread Alberto Bustamante Reyes
Hi Juan Jose,

I think Alberto is asking about how the check is done, not about why it's done. 
The method he is asking about mixes the two ways we know of for handling 
backward compatibility.
One is creating the "toDataPre_GEODE_X_X_X" and "fromDataPre_GEODE_X_X_X" 
methods, and the other is using ifs to check the versions. This method 
is mixing both of them.
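
To make the contrast concrete, here is a minimal self-contained sketch of the 
two styles (the class, fields and the placeholder version ordinal are 
hypothetical, not the real GatewaySenderEventImpl):

  import java.io.DataInput;
  import java.io.IOException;

  // Hypothetical class illustrating the two back-compat styles.
  class VersionedEvent {
    static final short GEODE_1_9_0 = 100; // placeholder ordinal, an assumption
    long timestamp;
    boolean isConcurrencyConflict;

    // Style 1: a *Pre_GEODE_X_X_X method freezes the wire format of the older
    // release; the serialization framework invokes it directly for old peers.
    void fromDataPre_GEODE_1_9_0_0(DataInput in) throws IOException {
      timestamp = in.readLong();
    }

    // Style 2: an explicit version check inside fromData; sentVersion stands
    // in for the sending member's version as the framework would supply it.
    void fromData(DataInput in, short sentVersion) throws IOException {
      fromDataPre_GEODE_1_9_0_0(in); // read the pre-1.9.0 portion first
      if (sentVersion >= GEODE_1_9_0) {
        isConcurrencyConflict = in.readBoolean(); // field added in 1.9.0
      }
    }
  }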

BR/

Alberto B.

De: Ju@N 
Enviado: martes, 19 de mayo de 2020 14:54
Para: dev@geode.apache.org 
Asunto: Re: Question about version checks inside fromData method in 
GatewaySenderEventImpl

Hello Alberto,

It looks like the property *isConcurrencyConflict* was added as part of
*GEODE-3967* [1] and released as part of Geode 1.9.0; that seems
to be the reason why the check is in place: if we get an instance of
*GatewaySenderEventImpl* from a member running version 1.9.0 or higher,
then we are 100% sure that the serialized form will contain the new field,
so we can parse it; if the serialized *GatewaySenderEventImpl* comes from
an older member, the field won't be there, so we don't even try to parse it.
Hope I didn't miss anything.
Cheers.

[1]: https://issues.apache.org/jira/browse/GEODE-3967

On Tue, 19 May 2020 at 13:14, Alberto Gomez  wrote:

> Hi,
>
> Looking at the fromData method of GatewaySenderEventImpl I see that it
> contains a conditional reading of the isConcurrencyConflict when version is
> greater than Geode 1.9.0 one. See below:
>
>   @Override
>   public void fromData(DataInput in,
>   DeserializationContext context) throws IOException,
> ClassNotFoundException {
> fromDataPre_GEODE_1_9_0_0(in, context);
> if (version >= Version.GEODE_1_9_0.ordinal()) {
>   this.isConcurrencyConflict = DataSerializer.readBoolean(in);
> }
>   }
>
> I have looked at the implementation of this method in other classes and
> have not seen this checking of version pattern. I have also observed that
> if the "if" is removed some backward compatibility tests fail.
>
> Could anybody tell me why this check (the if) is necessary given that
> there is already a fromDataPre_GEODE_1_9_0 method in the class?
>
> Thanks in advance,
>
> -Alberto G.
>


--
Ju@N


RE: About "change loglevel" command

2020-04-20 Thread Alberto Bustamante Reyes
Hi Kirk,

Thanks for the info, we were not aware of it. We are using filters in our 
configuration, so we are not taking advantage of the optimization you mentioned. 
I think this is something that should be included in the documentation.
I have created a PR: https://github.com/apache/geode/pull/4975 (as it's a small 
change I have not created a ticket, but let me know if it is necessary to do so).

Anyway, to me it's strange to see the difference between how it is decided 
whether FastLogger works that way and how it is decided whether the "change 
loglevel" command can be executed (both decisions are made in 
Log4jLoggingProvider).

On one hand, FastLogger delegates the "is-enabled" method execution if the 
log level is debug or above, or if there are filters in the log configuration.
On the other hand, the "change loglevel" command does nothing when "the 
logging configuration is not the default", which is decided based on the 
value of the geode-default property.

I think this could be improved by replacing the geode-default property 
requirement with other more specific checks, as is done with FastLogger. The 
thing is that the "change loglevel" command works fine for us if we force it, so I 
think that at least the way to force it should be in the documentation.

If you have more info about what should be checked in the configuration instead 
of the geode-default property, I could implement it.

BR/

Alberto




De: Kirk Lund 
Enviado: miércoles, 8 de abril de 2020 18:27
Para: geode 
Asunto: Re: About "change loglevel" command

This behavior has always worked like this since the internal implementation
of logging changed from GemFire LogWriters to using Log4j. The reason is
for performance. Geode is optimized for log level INFO with no filters --
it does this by wrapping all Log4j Loggers in a class called FastLogger
which prevents Log4j from performing a bunch of filter checks as well as
checking the timestamp of the log4j2.xml file. For some reason, the Log4j
devs decided to have the check for changes to the configuration file occur
within the thread performing a logging statement -- when this is allowed to
happen, it kills the performance of Geode. So FastLogger has a volatile
boolean that short circuits all of this extra checking, but it can only do
this if the code can be sure that there are no filters and that the log
level is INFO (or WARN or ERROR). Geode knows that it can bypass all of
that extra Log4j behavior only if it's the default log4j2.xml that is
bundled inside of the Geode jar (geode-core in older releases and now moved
to geode-log4j).
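
The shape of the idea is roughly this (a sketch only, not Geode's actual 
FastLogger source):

  import org.apache.logging.log4j.Logger;

  // Sketch: wraps a Log4j Logger and skips its filter/config checks unless
  // something (debug level, filters, a non-default config) forces delegation.
  class FastLoggerSketch {
    private static volatile boolean delegating = false;
    private final Logger delegate;

    FastLoggerSketch(Logger delegate) {
      this.delegate = delegate;
    }

    static void setDelegating(boolean newValue) {
      delegating = newValue; // flipped when the logging configuration changes
    }

    boolean isDebugEnabled() {
      // Fast path: one volatile read. When not delegating, DEBUG is known to
      // be disabled at the default INFO level, so Log4j is never consulted.
      return delegating && delegate.isDebugEnabled();
    }
  }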

On Wed, Apr 8, 2020 at 8:40 AM Alberto Bustamante Reyes
 wrote:

> Thanks Kirk. Could I ask what was the reason behind this change? In older
> Geode versions (1.10 I think) we were using our own log4j files, and the
> command was working fine.
>
> Digging into the code I saw that it's possible to start servers with a
> system property (--J=-Dgeode.LOG_LEVEL_UPDATE_OCCURS=ALWAYS) that also
> allows the "change loglevel" command to work. I think that could be a
> better alternative to document, as it is more command specific, instead of
> documenting the "geode-default" property.
>
> For example (in italics the part that could be added to the documentation
> of the command):
>
> "Changes the logging level on specified members. This command only will
> take effect if the default Geode logging configuration is used.
>
> In case of using custom log4j configuration files, this command will not
> work unless the member whose logging level you want to change was started
> using the '--J=-Dgeode.LOG_LEVEL_UPDATE_OCCURS=ALWAYS' system property."
>
> BR/
>
> Alberto B.
>
>
>
>
> 
> De: Kirk Lund 
> Enviado: martes, 7 de abril de 2020 0:17
> Para: geode 
> Asunto: Re: About "change loglevel" command
>
> Yes, this behavior is correct. If the User provides their own logging
> configuration (or a different logging impl such as logback) then none of
> the log-* configuration properties in Geode have any effect.
>
> On Mon, Apr 6, 2020 at 9:26 AM Alberto Bustamante Reyes
>  wrote:
>
> > Hi all,
> >
> > I have observed that the "change loglevel" command doesn't work if the
> > "log4j2.xml" file used doesn't contain the "geode-default" property set
> to
> > true. This requirement is not documented [1], so I would like to confirm
> if
> > this is the correct behavior.
> >
> > If we add "geode-default=true" in our log4j2 files, the "change loglevel"
> > works fine, but I'm not sure if it's OK to use that property on a custom
> log
> > config file.
> >
> > Thanks,
> >
> >
> > Alberto B.
> >
> > [1]
> >
> https://geode.apache.org/docs/guide/112/tools_modules/gfsh/command-pages/change.html
> >
> >
>


Re: Website refresh (was Re: [DISCUSS] Adding Google Analytics to our website)

2020-04-09 Thread Alberto Bustamante Reyes
It would be great if the new webpage could include a search button for the user 
guide.

BR/

Alberto B.


RE: About "change loglevel" command

2020-04-08 Thread Alberto Bustamante Reyes
Thanks Kirk. Could I ask what was the reason behind this change? In older Geode 
versions (1.10 I think) we were using our own log4j files, and the command was 
working fine.

Digging into the code I saw that it's possible to start servers with a system 
property (--J=-Dgeode.LOG_LEVEL_UPDATE_OCCURS=ALWAYS) that also allows the 
"change loglevel" command to work. I think that could be a better alternative 
to document, as it is more command specific, instead of documenting the 
"geode-default" property.

For example (in italics the part that could be added to the documentation of 
the command):

"Changes the logging level on specified members. This command only will take 
effect if the default Geode logging configuration is used.

In case of using custom log4j configuration files, this command will not work 
unless the member whose logging level you want to change was started using the 
'--J=-Dgeode.LOG_LEVEL_UPDATE_OCCURS=ALWAYS' system property."
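
For illustration, a session using that property could look like this (the 
member name and file path are hypothetical):

  gfsh> start server --name=server1 --J=-Dgeode.LOG_LEVEL_UPDATE_OCCURS=ALWAYS --J=-Dlog4j.configurationFile=/etc/geode/log4j2.xml
  gfsh> change loglevel --log-level=DEBUG --members=server1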

BR/

Alberto B.





De: Kirk Lund 
Enviado: martes, 7 de abril de 2020 0:17
Para: geode 
Asunto: Re: About "change loglevel" command

Yes, this behavior is correct. If the User provides their own logging
configuration (or a different logging impl such as logback) then none of
the log-* configuration properties in Geode have any effect.

On Mon, Apr 6, 2020 at 9:26 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> I have observed that the "change loglevel" command doesn't work if the
> "log4j2.xml" file used doesn't contain the "geode-default" property set to
> true. This requirement is not documented [1], so I would like to confirm if
> this is the correct behavior.
>
> If we add "geode-default=true" in our log4j2 files, the "change loglevel"
> works fine, but I'm not sure if it's OK to use that property on a custom log
> config file.
>
> Thanks,
>
>
> Alberto B.
>
> [1]
> https://geode.apache.org/docs/guide/112/tools_modules/gfsh/command-pages/change.html
>
>


About "change loglevel" command

2020-04-06 Thread Alberto Bustamante Reyes
Hi all,

I have observed that the "change loglevel" command doesn't work if the "log4j2.xml" 
file used doesn't contain the "geode-default" property set to true. This 
requirement is not documented [1], so I would like to confirm whether this is the 
correct behavior.

If we add "geode-default=true" in our log4j2 files, the "change loglevel" command works 
fine, but I'm not sure if it's OK to use that property in a custom log config 
file.
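
For reference, a minimal custom log4j2.xml carrying that property might look 
like this (the appender and logger details are illustrative only):

  <Configuration>
    <Properties>
      <Property name="geode-default">true</Property>
    </Properties>
    <Appenders>
      <Console name="STDOUT">
        <PatternLayout pattern="%d %-5p %c: %m%n"/>
      </Console>
    </Appenders>
    <Loggers>
      <Root level="info">
        <AppenderRef ref="STDOUT"/>
      </Root>
    </Loggers>
  </Configuration>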

Thanks,


Alberto B.

[1] 
https://geode.apache.org/docs/guide/112/tools_modules/gfsh/command-pages/change.html



RE: WAN replication issue in cloud native environments

2020-04-01 Thread Alberto Bustamante Reyes
Hi Dan,

I have realized that after this change, if you want to do a quick test on your 
laptop, it will not be possible to run two servers properly. There are 
different scenarios you could not test; for example, you could not test what 
happens when a server is restarted, as both servers would be considered down.

So I think it would be better to use ip+port+id; it will have less impact.

BR/

Alberto B.

De: Dan Smith 
Enviado: viernes, 27 de marzo de 2020 19:14
Para: dev@geode.apache.org 
Asunto: Re: WAN replication issue in cloud native environments

With this PR, it would be possible to identify servers running with the
> same ip and port, because now they will be identified by member id. But
> Bruce realized that it could be a problem if two servers are running in the
> same JVM, as they will share the same member id. It seems its very unlikely
> that people are doing it, but its not explicitly prohibited.
>

What is going to happen if a user does set things up this way? The things I
can think of are:

1. When a connection to one of the cache servers fails, the client will
close all of the connections to both. But this doesn't seem like a bad
outcome, since it's likely the whole server crashed anyway.
2. Pings might not reach the correct server - but it looks like we have a
single ClientHealthMonitor for the server process anyway? So I think the
pings are getting to the right place.

If there aren't any other negative outcomes, I think it's ok to proceed
with the current solution. But I'd also be ok going to ip+port+id.

I also agree that this use case of a single pool connecting to multiple
cache servers in the same process doesn't make much sense.

-Dan


RE: WAN replication issue in cloud native environments

2020-03-27 Thread Alberto Bustamante Reyes
Hi,

We need some advice from the dev list. We have faced a problem with the PR of 
this RFC (https://github.com/apache/geode/pull/4824), and we would like to 
hear your opinion about it.

With this PR, it would be possible to identify servers running with the same ip 
and port, because now they will be identified by member id. But Bruce realized 
that it could be a problem if two servers are running in the same JVM, as they 
will share the same member id. It seems very unlikely that people are doing 
this, but it's not explicitly prohibited.

Should we treat this setup as something to be allowed, so the code has to be 
adapted? (1) Or should we take for granted that no one is using it, so we can 
keep this solution?

Thanks!


(1) We already have a version of the code in which the servers were identified 
by "ip+port+id", which would cover this case (this was the original solution but 
it was changed after comments on a previous PR).
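
As a sketch of the identity (1) keys on (all names hypothetical), adding the 
member id to the location is enough to tell two same-host-same-port servers 
apart:

  // Hypothetical identity for the "ip+port+id" variant: equality now includes
  // the member id, so two receivers behind one advertised host:port differ.
  record ServerIdentity(String host, int port, String memberId) {}

  // new ServerIdentity("gw.example.com", 5000, "member-a") does not equal
  // new ServerIdentity("gw.example.com", 5000, "member-b")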
____
De: Alberto Bustamante Reyes 
Enviado: jueves, 26 de marzo de 2020 20:17
Para: dev@geode.apache.org 
Asunto: RE: WAN replication issue in cloud native environments

Ok, I have moved the RFC then. Thanks again for your time & help!

De: Dan Smith 
Enviado: jueves, 26 de marzo de 2020 18:54
Para: dev@geode.apache.org 
Asunto: Re: WAN replication issue in cloud native environments

+1

After talking through this with Bruce a bit, I think the changes you are
proposing to LocatorLoadSnapshot and EndPointManager manager make sense.
For the ping issue, I like the proposed solution to forward the ping to the
correct server. Sounds good!

-Dan

On Thu, Mar 26, 2020 at 10:47 AM Bruce Schuchardt 
wrote:

> +1
>
> I think this could move to the "In Development" state
>
>
>
> From: Alberto Bustamante Reyes 
> Date: Wednesday, March 25, 2020 at 4:13 PM
> To: Bruce Schuchardt , Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" 
> Cc: Jacob Barrett , Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black 
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Hi,
>
>
>
> I have modified the RFC to include the alternative suggested by Bruce. I'm
> also extending the deadline for sending comments to next Friday 27th March
> EOB.
>
>
>
> Thanks!
>
>
>
> BR/
>
>
>
> Alberto B.
>
> De: Bruce Schuchardt 
> Enviado: lunes, 23 de marzo de 2020 22:38
> Para: Alberto Bustamante Reyes ; Dan
> Smith ; dev@geode.apache.org 
> Cc: Jacob Barrett ; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black 
> Asunto: Re: WAN replication issue in cloud native environments
>
>
>
> I think what Dan did was pass in a socket factory that would connect to
> his gateway instead of the requested server.  Doing it like that would
> require a lot less code change than what you’re currently doing and would
> get past the unit test problem.
>
>
>
> I can point you to where you’d need to make changes for the Ping
> operation: PingOpImpl would need to send the ServerLocation it’s trying to
> reach.  PingOp.execute() gets that as a parameter and
> PingOpImpl.sendMessage() writes it to the server.  The Ping command class’s
> cmdExecute would need to read that data if
> serverConnection.getClientVersion() is Version.GEODE_1_13_0 or later.  Then
> it would have to compare the server location it read to that server’s
> coordinates and, if not equal, find the server with those coordinates and
> send a new DistributionMessage to it with the client’s identity.  There are
> plenty of DistributionMessage classes around to look at as precedents.  You
> send the message with
> serverConnection.getCache().getDistributionManager().putOutgoing(message).
>
>
>
> You can PM me any time.  Dan could answer questions about his gateway work.
>
>
>
>
>
> From: Alberto Bustamante Reyes 
> Date: Monday, March 23, 2020 at 2:18 PM
> To: Bruce Schuchardt , Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" 
> Cc: Jacob Barrett , Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black 
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Thanks for your answer and your comment in the wiki, Bruce. I will take a
> closer look at what you mentioned; it is not yet clear to me how to
> implement it.
>
>
>
> BTW, I forgot to set a deadline for the wiki review, I hope that Thursday
> 26th March is enough to receive comments.
>
> De: Bruce Schuchardt 
> Enviado: jueves, 19 de marzo de 2020 16:30
> Para: Alberto Bustamante Reyes ; Dan
> Smith ; dev@geode.apache.org 
> Cc: Jacob Barrett ; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black 
> Asunto: Re: WAN replication issue in cloud native environments

RE: WAN replication issue in cloud native environments

2020-03-26 Thread Alberto Bustamante Reyes
Ok, I have moved the RFC then. Thanks again for your time & help!

De: Dan Smith 
Enviado: jueves, 26 de marzo de 2020 18:54
Para: dev@geode.apache.org 
Asunto: Re: WAN replication issue in cloud native environments

+1

After talking through this with Bruce a bit, I think the changes you are
proposing to LocatorLoadSnapshot and EndPointManager manager make sense.
For the ping issue, I like the proposed solution to forward the ping to the
correct server. Sounds good!

-Dan

On Thu, Mar 26, 2020 at 10:47 AM Bruce Schuchardt 
wrote:

> +1
>
> I think this could move to the "In Development" state
>
>
>
> From: Alberto Bustamante Reyes 
> Date: Wednesday, March 25, 2020 at 4:13 PM
> To: Bruce Schuchardt , Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" 
> Cc: Jacob Barrett , Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black 
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Hi,
>
>
>
> I have modified the RFC to include the alternative suggested by Bruce. I'm
> also extending the deadline for sending comments to next Friday 27th March
> EOB.
>
>
>
> Thanks!
>
>
>
> BR/
>
>
>
> Alberto B.
>
> De: Bruce Schuchardt 
> Enviado: lunes, 23 de marzo de 2020 22:38
> Para: Alberto Bustamante Reyes ; Dan
> Smith ; dev@geode.apache.org 
> Cc: Jacob Barrett ; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black 
> Asunto: Re: WAN replication issue in cloud native environments
>
>
>
> I think what Dan did was pass in a socket factory that would connect to
> his gateway instead of the requested server.  Doing it like that would
> require a lot less code change than what you’re currently doing and would
> get past the unit test problem.
>
>
>
> I can point you to where you’d need to make changes for the Ping
> operation: PingOpImpl would need to send the ServerLocation it’s trying to
> reach.  PingOp.execute() gets that as a parameter and
> PingOpImpl.sendMessage() writes it to the server.  The Ping command class’s
> cmdExecute would need to read that data if
> serverConnection.getClientVersion() is Version.GEODE_1_13_0 or later.  Then
> it would have to compare the server location it read to that server’s
> coordinates and, if not equal, find the server with those coordinates and
> send a new DistributionMessage to it with the client’s identity.  There are
> plenty of DistributionMessage classes around to look at as precedents.  You
> send the message with
> serverConnection.getCache().getDistributionManager().putOutgoing(message).
>
>
>
> You can PM me any time.  Dan could answer questions about his gateway work.
>
>
>
>
>
> From: Alberto Bustamante Reyes 
> Date: Monday, March 23, 2020 at 2:18 PM
> To: Bruce Schuchardt , Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" 
> Cc: Jacob Barrett , Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black 
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Thanks for your answer and your comment in the wiki, Bruce. I will take a
> closer look at what you mentioned; it is not yet clear to me how to
> implement it.
>
>
>
> BTW, I forgot to set a deadline for the wiki review, I hope that Thursday
> 26th March is enough to receive comments.
>
> De: Bruce Schuchardt 
> Enviado: jueves, 19 de marzo de 2020 16:30
> Para: Alberto Bustamante Reyes ; Dan
> Smith ; dev@geode.apache.org 
> Cc: Jacob Barrett ; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black 
> Asunto: Re: WAN replication issue in cloud native environments
>
>
>
> I wonder if an approach similar to the SNI hostname PoolFactory changes
> would work for this non-TLS gateway.  The client needs to differentiate
> between the different servers so that it doesn’t declare all of them dead
> should one of them fail.  If the pool knew about the gateway it could
> direct all traffic there and the servers wouldn’t need to set a
> hostname-for-clients.
>
>
>
> It’s not an ideal solution since the gateway wouldn’t know which server
> the client wanted to contact and there are sure to be other problems like
> creating a backup queue for subscriptions.  But that’s the case with the
> hostname-for-clients approach, too.
>
>
>
>
>
> From: Alberto Bustamante Reyes 
> Date: Wednesday, March 18, 2020 at 8:35 AM
> To: Dan Smith , "dev@geode.apache.org" <
> dev@geode.apache.org>
> Cc: Bruce Schuchardt , Jacob Barrett <
> jbarr...@pivotal.io>, Anilkumar Gingade , Charlie
> Black 
> Subject: RE: WAN replication issue in cloud native environments
>
>

RE: WAN replication issue in cloud native environments

2020-03-25 Thread Alberto Bustamante Reyes
Hi,

I have modified the RFC to include the alternative suggested by Bruce. I'm also 
extending the deadline for sending comments to next Friday, 27th March, EOB.

Thanks!

BR/

Alberto B.

De: Bruce Schuchardt 
Enviado: lunes, 23 de marzo de 2020 22:38
Para: Alberto Bustamante Reyes ; Dan Smith 
; dev@geode.apache.org 
Cc: Jacob Barrett ; Anilkumar Gingade 
; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments


I think what Dan did was pass in a socket factory that would connect to his 
gateway instead of the requested server.  Doing it like that would require a 
lot less code change than what you’re currently doing and would get past the 
unit test problem.



I can point you to where you’d need to make changes for the Ping operation: 
PingOpImpl would need to send the ServerLocation it’s trying to reach.  
PingOp.execute() gets that as a parameter and PingOpImpl.sendMessage() writes 
it to the server.  The Ping command class’s cmdExecute would need to read that 
data if serverConnection.getClientVersion() is Version.GEODE_1_13_0 or later.  
Then it would have to compare the server location it read to that server’s 
coordinates and, if not equal, find the server with those coordinates and send 
a new DistributionMessage to it with the client’s identity.  There are plenty 
of DistributionMessage classes around to look at as precedents.  You send the 
message with 
serverConnection.getCache().getDistributionManager().putOutgoing(message).
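
To make the flow concrete, here is a self-contained toy model of that 
forwarding step (every name below is hypothetical; it only models the idea, it 
is not the eventual implementation):

  import java.util.Map;
  import java.util.function.Consumer;

  class PingRouterSketch {
    record Location(String host, int port) {}

    private final Location self;
    private final Map<Location, Consumer<String>> peers; // transport to each peer

    PingRouterSketch(Location self, Map<Location, Consumer<String>> peers) {
      this.self = self;
      this.peers = peers;
    }

    // A ping now carries the location the client meant to reach; a server
    // that is not that target passes the ping on instead of counting it.
    void onPing(Location target, String clientId) {
      if (target.equals(self)) {
        System.out.println(self + " records heartbeat for " + clientId);
      } else {
        peers.get(target).accept(clientId); // forward to the intended server
      }
    }
  }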



You can PM me any time.  Dan could answer questions about his gateway work.





From: Alberto Bustamante Reyes 
Date: Monday, March 23, 2020 at 2:18 PM
To: Bruce Schuchardt , Dan Smith , 
"dev@geode.apache.org" 
Cc: Jacob Barrett , Anilkumar Gingade 
, Charlie Black 
Subject: RE: WAN replication issue in cloud native environments



Thanks for your answer and your comment in the wiki, Bruce. I will take a closer 
look at what you mentioned; it is not yet clear to me how to implement it.



BTW, I forgot to set a deadline for the wiki review, I hope that Thursday 26th 
March is enough to receive comments.



De: Bruce Schuchardt 
Enviado: jueves, 19 de marzo de 2020 16:30
Para: Alberto Bustamante Reyes ; Dan Smith 
; dev@geode.apache.org 
Cc: Jacob Barrett ; Anilkumar Gingade 
; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments



I wonder if an approach similar to the SNI hostname PoolFactory changes would 
work for this non-TLS gateway.  The client needs to differentiate between the 
different servers so that it doesn’t declare all of them dead should one of 
them fail.  If the pool knew about the gateway it could direct all traffic 
there and the servers wouldn’t need to set a hostname-for-clients.



It’s not an ideal solution since the gateway wouldn’t know which server the 
client wanted to contact and there are sure to be other problems like creating 
a backup queue for subscriptions.  But that’s the case with the 
hostname-for-clients approach, too.





From: Alberto Bustamante Reyes 
Date: Wednesday, March 18, 2020 at 8:35 AM
To: Dan Smith , "dev@geode.apache.org" 
Cc: Bruce Schuchardt , Jacob Barrett 
, Anilkumar Gingade , Charlie Black 

Subject: RE: WAN replication issue in cloud native environments



Hi all,



As Bruce suggested to me, I have created a wiki page describing the problem we are 
trying to solve: 
https://cwiki.apache.org/confluence/display/GEODE/Allow+same+host+and+port+for+all+gateway+receivers



Please let me know if further clarifications are needed.



Also, I have closed the PR I have been using until now, and created a new one 
with the current status of the solution, with one commit per issue described in 
the wiki: https://github.com/apache/geode/pull/4824



Thanks in advance!

________

De: Alberto Bustamante Reyes 
Enviado: lunes, 9 de marzo de 2020 11:24
Para: Dan Smith 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: RE: WAN replication issue in cloud native environments



Thanks for pointing that out, Dan. Sorry for the misunderstanding; as I only found 
that "affinity" (the setServerAffinityLocation method) in the client code, I thought 
you were talking about it.
Anyway, I did some more tests and it does not solve our problem...

I tried configuring the session affinity on k8s, but it breaks the first part 
of the solution (the changes implemented on LocatorLoadSnapshot that solve the 
problem of the replication), and senders do not connect to other receivers 
when the one they were connected to is down.

RE: Spinning Geode locators using docker compose

2020-03-24 Thread Alberto Bustamante Reyes
Hi,

Some months ago I was using docker compose to start a cluster with one 
locator and two servers for testing purposes.
You can check it here; I hope it helps: 
https://github.com/alb3rtobr/geode-docker

BR/

Alberto B.

De: vas aj 
Enviado: martes, 24 de marzo de 2020 0:31
Para: u...@geode.apache.org ; dev@geode.apache.org 

Asunto: Spinning Geode locators using docker compose

Hi team,

I am trying to set up a Geode cluster with one locator for WRITE and another 
locator for READ using docker-compose.

The following docker command is the only working model I have found so far:

docker run --rm -it --network my-docker-network --hostname my.hostname.net \
  -p 10550:10334 -p 40404:40404 apachegeode/geode:1.11.0

gfsh> start locator --name=locator1 --hostname-for-clients=my.hostname.net
gfsh> start server --name=server1 --locators=my.hostname.net[10334] --server-port=40404
gfsh> create region --name=my-region --type=PARTITION_PERSISTENT

Below is the cache.xml used to connect to the Geode locator at my.hostname.net:

  <locator host="my.hostname.net" port="10550"/>

The problem I face is that if I don't expose server port 40404, or I expose the 
server port as 50505, I fail to connect to the Geode locator at my.hostname.net.

In docker-compose.yml, I cannot expose 2 containers on the same host port 40404.
How can I spin up 2 locators using docker-compose so that I can connect to the WRITE 
locator on 10550 and the READ locator on 10551?

Kindly help.

Thanks,
Aj
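
A sketch of the compose layout in question (service names are illustrative, it 
assumes the image's gfsh entrypoint, and whether the remapped locator ports 
work end-to-end depends on what the locators advertise to clients):

  version: "3"
  services:
    locator-write:
      image: apachegeode/geode:1.11.0
      # the start command may need a keep-alive wrapper depending on the image
      command: start locator --name=locator1 --port=10334
      ports:
        - "10550:10334"   # WRITE locator reachable on host port 10550
    locator-read:
      image: apachegeode/geode:1.11.0
      command: start locator --name=locator2 --port=10334
      ports:
        - "10551:10334"   # READ locator reachable on host port 10551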


RE: WAN replication issue in cloud native environments

2020-03-23 Thread Alberto Bustamante Reyes
Thanks for your answer and your comment in the wiki, Bruce. I will take a closer 
look at what you mentioned; it is not yet clear to me how to implement it.

BTW, I forgot to set a deadline for the wiki review, I hope that Thursday 26th 
March is enough to receive comments.

De: Bruce Schuchardt 
Enviado: jueves, 19 de marzo de 2020 16:30
Para: Alberto Bustamante Reyes ; Dan Smith 
; dev@geode.apache.org 
Cc: Jacob Barrett ; Anilkumar Gingade 
; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments


I wonder if an approach similar to the SNI hostname PoolFactory changes would 
work for this non-TLS gateway.  The client needs to differentiate between the 
different servers so that it doesn’t declare all of them dead should one of 
them fail.  If the pool knew about the gateway it could direct all traffic 
there and the servers wouldn’t need to set a hostname-for-clients.



It’s not an ideal solution since the gateway wouldn’t know which server the 
client wanted to contact and there are sure to be other problems like creating 
a backup queue for subscriptions.  But that’s the case with the 
hostname-for-clients approach, too.





From: Alberto Bustamante Reyes 
Date: Wednesday, March 18, 2020 at 8:35 AM
To: Dan Smith , "dev@geode.apache.org" 
Cc: Bruce Schuchardt , Jacob Barrett 
, Anilkumar Gingade , Charlie Black 

Subject: RE: WAN replication issue in cloud native environments



Hi all,



As Bruce suggested to me, I have created a wiki page describing the problem we are 
trying to solve: 
https://cwiki.apache.org/confluence/display/GEODE/Allow+same+host+and+port+for+all+gateway+receivers



Please let me know if further clarifications are needed.



Also, I have closed the PR I have been using until now, and created a new one 
with the current status of the solution, with one commit per issue described in 
the wiki: https://github.com/apache/geode/pull/4824



Thanks in advance!

________

De: Alberto Bustamante Reyes 
Enviado: lunes, 9 de marzo de 2020 11:24
Para: Dan Smith 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: RE: WAN replication issue in cloud native environments



Thanks for pointing that out, Dan. Sorry for the misunderstanding; as I only found 
that "affinity" (the setServerAffinityLocation method) in the client code, I thought 
you were talking about it.
Anyway, I did some more tests and it does not solve our problem...

I tried configuring the session affinity on k8s, but it breaks the first part 
of the solution (the changes implemented on LocatorLoadSnapshot that solve the 
problem of the replication), and senders do not connect to other receivers when 
the one they were connected to is down.

The only alternative we have in mind to try to solve the ping problem is to 
keep on investigating if changing the ping task creation could be a solution 
(the changes implemented are clearly breaking something, so the solution is not 
complete yet).







De: Dan Smith 
Enviado: jueves, 5 de marzo de 2020 21:03
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

I think there is some confusion here.

The client side class ExecutablePool has a method called 
setServerAffinityLocation. It looks like that is used for some internal 
transaction code to make sure transactions go to the same server. I don't think 
it makes any sense for the gateway to be messing with this setting.

What I was talking about was session affinity in your proxy server. For 
example, if you are using k8s, session affinity as defined in this page - 
https://kubernetes.io/docs/concepts/services-networking/service/<https://urldefense.proofpoint.com/v2/url?u=https-3A__kubernetes.io_docs_concepts_services-2Dnetworking_service_=DwMGaQ=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw=JEKigqAv3f2lWHmA02pq9MDT5naXLkEStB4d4n0NQmk=BsmEMvbnhm5KC1W0HFuniJEJ4fc3l7UIrD_-77Kf46I=iF8SOe47Z1OSmk-Ol6B8uSOj9pU33u4cWiH-RfJciXA=>

"If you want to make sure that connections from a particular client are passed 
to the same Pod each time, you can select the session affinity based on the 
client’s IP addresses by setting service.spec.sessionAffinity to “ClientIP” 
(the default is “None”)"

I think setting session affinity might help your use case, because it sounds 
like you are having issues with the proxy directing pings to a different server 
than the data.

RE: [VOTE] Using Github issues and wiki for geode-kafka-connector project

2020-03-23 Thread Alberto Bustamante Reyes
+1
It will be easier to contribute to Geode if you just need a github account.

De: Ju@N 
Enviado: lunes, 23 de marzo de 2020 11:13
Para: dev@geode.apache.org 
Asunto: Re: [VOTE] Using Github issues and wiki for geode-kafka-connector 
project

+1

On Sun, 22 Mar 2020 at 16:26, Jacob Barrett  wrote:

>
>
> > On Mar 22, 2020, at 9:23 AM, Anthony Baker  wrote:
> >
> > Check out [1] for a list of projects that are moving to GitHub issues.
> As long as the PMC approves, INFRA will support the switch.
>
>
> Awesome!!
>
> > Once we see how that goes, I’m in favor of having a larger conversation
> about migrating entirely from JIRA to Github.  I’ve been thinking about
> doing this for awhile so thanks for taking the first step Naba!  I think it
> will be a better experience for all and really help all contributors not
> have to deal with multiple systems to get stuff done.
>
> +1
>
>
> -Jake
>
>

--
Ju@N


RE: [PROPOSAL]: Include GEODE-7832, GEODE-7853 & GEODE-7863 in Geode 1.12.0

2020-03-19 Thread Alberto Bustamante Reyes
+1

De: Donal Evans 
Enviado: jueves, 19 de marzo de 2020 2:14
Para: dev@geode.apache.org 
Asunto: Re: [PROPOSAL]: Include GEODE-7832, GEODE-7853 & GEODE-7863 in Geode 
1.12.0

+1

On Wed, Mar 18, 2020 at 4:53 PM Owen Nichols  wrote:

> +3
>
> > On Mar 18, 2020, at 4:52 PM, Ju@N  wrote:
> >
> > Hello devs,
> >
> > I'd like to propose including the fixes for *GEODE-7832 [1]*, *GEODE-7853
> > [2]* and *GEODE-7863 [3]* in release 1.12.0.
> > All the changes are related to the work we have been doing in order to
> > bring the performance closer to the baseline (*Geode 1.10*), we are not
> > quite there yet but it would be good to include these fixes into the
> > release anyways.
> > Best regards.
> >
> > [1]: https://issues.apache.org/jira/browse/GEODE-7832
> > [2]: https://issues.apache.org/jira/browse/GEODE-7853
> > [3]: https://issues.apache.org/jira/browse/GEODE-7863
> >
> > --
> > Ju@N
> > --
> > Ju@N
>
>


RE: WAN replication issue in cloud native environments

2020-03-18 Thread Alberto Bustamante Reyes
Hi all,

As Bruce suggested to me, I have created a wiki page describing the problem we are 
trying to solve: 
https://cwiki.apache.org/confluence/display/GEODE/Allow+same+host+and+port+for+all+gateway+receivers

Please let me know if further clarifications are needed.

Also, I have closed the PR I have been using until now, and created a new one 
with the current status of the solution, with one commit per issue described in 
the wiki: https://github.com/apache/geode/pull/4824

Thanks in advance!

De: Alberto Bustamante Reyes 
Enviado: lunes, 9 de marzo de 2020 11:24
Para: Dan Smith 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: RE: WAN replication issue in cloud native environments

Thanks for pointing that out, Dan. Sorry for the misunderstanding; as I only found 
that "affinity" (the setServerAffinityLocation method) in the client code, I thought 
you were talking about it.
Anyway, I did some more tests and it does not solve our problem...

I tried configuring the session affinity on k8s, but it breaks the first part 
of the solution (the changes implemented on LocatorLoadSnapshot that solve the 
problem of the replication), and senders do not connect to other receivers when 
the one they were connected to is down.

The only alternative we have in mind to try to solve the ping problem is to 
keep on investigating if changing the ping task creation could be a solution 
(the changes implemented are clearly breaking something, so the solution is not 
complete yet).







De: Dan Smith 
Enviado: jueves, 5 de marzo de 2020 21:03
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

I think there is some confusion here.

The client side class ExecutablePool has a method called 
setServerAffinityLocation. It looks like that is used for some internal 
transaction code to make sure transactions go to the same server. I don't think 
it makes any sense for the gateway to be messing with this setting.

What I was talking about was session affinity in your proxy server. For 
example, if you are using k8s, session affinity as defined in this page - 
https://kubernetes.io/docs/concepts/services-networking/service/

"If you want to make sure that connections from a particular client are passed 
to the same Pod each time, you can select the session affinity based on the 
client’s IP addresses by setting service.spec.sessionAffinity to “ClientIP” 
(the default is “None”)"
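
In manifest form, that knob looks like this (names are illustrative):

  apiVersion: v1
  kind: Service
  metadata:
    name: gateway-receiver
  spec:
    selector:
      app: geode-server
    ports:
      - port: 5000
        targetPort: 5000
    sessionAffinity: ClientIP   # pin each client IP to one Pod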

I think setting session affinity might help your use case, because it sounds 
like you are having issues with the proxy directing pings to a different server 
than the data.

-Dan

On Thu, Mar 5, 2020 at 4:20 AM Alberto Bustamante Reyes 
 wrote:
I think that was what I did when I tried, but I realized I had a failure in the 
code. Now that I have tried again, reverting the change of executing ping by 
endpoint, and applying the server affinity, the connections are much more 
stable! Looks promising 

I suppose that if I want to introduce this change, setting the server affinity 
in the gateway sender should be introduced as a new option in the sender 
configuration, right?

De: Dan Smith mailto:dsm...@pivotal.io>>
Enviado: jueves, 5 de marzo de 2020 4:41
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org<mailto:dev@geode.apache.org> 
mailto:dev@geode.apache.org>>; Bruce Schuchardt 
mailto:bschucha...@pivotal.io>>; Jacob Barrett 
mailto:jbarr...@pivotal.io>>; Anilkumar Gingade 
mailto:aging...@pivotal.io>>; Charlie Black 
mailto:cbl...@pivotal.io>>
Asunto: Re: WAN replication issue in cloud native environments

Oh, sorry, I meant server affinity with the proxy itself. So that it will 
always route traffic from the same gateway sender to the same gateway receiver. 
Hopefully that would ensure that pings go to the same receiver data is sent to.

-Dan

On Wed, Mar 4, 2020, 1:31 AM Alberto Bustamante Reyes 
 wrote:
I have tried setting the server affinity on the gateway sender's pool in the 
AbstractGatewaySender class, when the server location is set, but I don't see 
any difference in the behavior of the connections.

I did not mention that the connections are reset every 5 seconds due to 
"java.io.EOFException: The connection has been reset while reading the header". 
But I don't know yet what is causing it.


De: Dan Smith mailto:dsm...@pivotal.io>>
Enviado: martes, 3 de marzo de 2020 18:07
Para: dev@geode.apache.org<mailto:dev@geode.apache.org> 
mailto:dev@geode.apache.org>>
Cc: Bruce Schuchardt mailto:bschucha...@pivotal.io>>; 
Jacob Barrett mailto:jbarr...@pivotal.io>>; Anilkumar 
Gingade mailto:aging...@pivotal.io>>; Charlie Black 

RE: WAN replication issue in cloud native environments

2020-03-09 Thread Alberto Bustamante Reyes
Thanks for pointing that out, Dan. Sorry for the misunderstanding; as I only found 
that "affinity" (the setServerAffinityLocation method) in the client code, I thought 
you were talking about it.
Anyway, I did some more tests and it does not solve our problem...

I tried configuring the session affinity on k8s, but it breaks the first part 
of the solution (the changes implemented on LocatorLoadSnapshot that solve the 
problem of the replication), and senders do not connect to other receivers when 
the one they were connected to is down.

The only alternative we have in mind to try to solve the ping problem is to 
keep on investigating if changing the ping task creation could be a solution 
(the changes implemented are clearly breaking something, so the solution is not 
complete yet).







De: Dan Smith 
Enviado: jueves, 5 de marzo de 2020 21:03
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

I think there is some confusion here.

The client side class ExecutablePool has a method called 
setServerAffinityLocation. It looks like that is used for some internal 
transaction code to make sure transactions go to the same server. I don't think 
it makes any sense for the gateway to be messing with this setting.

What I was talking about was session affinity in your proxy server. For 
example, if you are using k8s, session affinity as defined in this page - 
https://kubernetes.io/docs/concepts/services-networking/service/

"If you want to make sure that connections from a particular client are passed 
to the same Pod each time, you can select the session affinity based on the 
client’s IP addresses by setting service.spec.sessionAffinity to “ClientIP” 
(the default is “None”)"

I think setting session affinity might help your use case, because it sounds 
like you are having issues with the proxy directing pings to a different server 
than the data.

-Dan

On Thu, Mar 5, 2020 at 4:20 AM Alberto Bustamante Reyes 
 wrote:
I think that was what I did when I tried, but I realized I had a failure in the 
code. Now that I have tried again, reverting the change of executing ping by 
endpoint, and applying the server affinity, the connections are much more 
stable! Looks promising 

I suppose that if I want to introduce this change, setting the server affinity 
in the gateway sender should be introduced as a new option in the sender 
configuration, right?

De: Dan Smith mailto:dsm...@pivotal.io>>
Enviado: jueves, 5 de marzo de 2020 4:41
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org<mailto:dev@geode.apache.org> 
mailto:dev@geode.apache.org>>; Bruce Schuchardt 
mailto:bschucha...@pivotal.io>>; Jacob Barrett 
mailto:jbarr...@pivotal.io>>; Anilkumar Gingade 
mailto:aging...@pivotal.io>>; Charlie Black 
mailto:cbl...@pivotal.io>>
Asunto: Re: WAN replication issue in cloud native environments

Oh, sorry, I meant server affinity with the proxy itself. So that it will 
always route traffic from the same gateway sender to the same gateway receiver. 
Hopefully that would ensure that pings go to the same receiver data is sent to.

-Dan

On Wed, Mar 4, 2020, 1:31 AM Alberto Bustamante Reyes 
 wrote:
I have tried setting the server affinity on the gateway sender's pool in the 
AbstractGatewaySender class, when the server location is set, but I don't see 
any difference in the behavior of the connections.

I did not mention that the connections are reset every 5 seconds due to 
"java.io.EOFException: The connection has been reset while reading the header". 
But I don't know yet what is causing it.


De: Dan Smith mailto:dsm...@pivotal.io>>
Enviado: martes, 3 de marzo de 2020 18:07
Para: dev@geode.apache.org<mailto:dev@geode.apache.org> 
mailto:dev@geode.apache.org>>
Cc: Bruce Schuchardt mailto:bschucha...@pivotal.io>>; 
Jacob Barrett mailto:jbarr...@pivotal.io>>; Anilkumar 
Gingade mailto:aging...@pivotal.io>>; Charlie Black 
mailto:cbl...@pivotal.io>>
Asunto: Re: WAN replication issue in cloud native environments

> We are currently working on another issue related to this change: gw
sender pings are not reaching the gw receivers, so ClientHealthMonitor
closes the connections. I saw that the ping tasks are created by
ServerLocation, so I have tried to solve the issue by changing it to be
done by Endpoint. This change is not finished yet, as in its current status
it causes the closing of connections from gw servers to gw receivers every
5 seconds.

Are you using session affinity? I think you probably will need to since
pings can go over different connections than the data connection.

-Dan

On Tue, Mar 3, 2020 at 3:44 AM Alberto Bustamante Reyes
 wrote:

> Hi Bruce,

RE: WAN replication issue in cloud native environments

2020-03-05 Thread Alberto Bustamante Reyes
I think that was what I did when I tried, but I realized I had a failure in the 
code. Now that I have tried again, reverting the change of executing ping by 
endpoint, and applying the server affinity, the connections are much more 
stable! Looks promising 

I suppose that if I want to introduce this change, setting the server affinity 
in the gateway sender should be introduced as a new option in the sender 
configuration, right?

De: Dan Smith 
Enviado: jueves, 5 de marzo de 2020 4:41
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Bruce Schuchardt 
; Jacob Barrett ; Anilkumar 
Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

Oh, sorry, I meant server affinity with the proxy itself. So that it will 
always route traffic from the same gateway sender to the same gateway receiver. 
Hopefully that would ensure that pings go to the same receiver data is sent to.

-Dan

On Wed, Mar 4, 2020, 1:31 AM Alberto Bustamante Reyes 
 wrote:
I have tried setting the server affinity on the gateway sender's pool in the 
AbstractGatewaySender class, when the server location is set, but I don't see 
any difference in the behavior of the connections.

I did not mention that the connections are reset every 5 seconds due to 
"java.io.EOFException: The connection has been reset while reading the header". 
But I don't know yet what is causing it.


De: Dan Smith mailto:dsm...@pivotal.io>>
Enviado: martes, 3 de marzo de 2020 18:07
Para: dev@geode.apache.org<mailto:dev@geode.apache.org> 
mailto:dev@geode.apache.org>>
Cc: Bruce Schuchardt mailto:bschucha...@pivotal.io>>; 
Jacob Barrett mailto:jbarr...@pivotal.io>>; Anilkumar 
Gingade mailto:aging...@pivotal.io>>; Charlie Black 
mailto:cbl...@pivotal.io>>
Asunto: Re: WAN replication issue in cloud native environments

> We are currently working on another issue related to this change: gw
sender pings are not reaching the gw receivers, so ClientHealthMonitor
closes the connections. I saw that the ping tasks are created by
ServerLocation, so I have tried to solve the issue by changing it to be
done by Endpoint. This change is not finished yet, as in its current status
it causes the closing of connections from gw servers to gw receivers every
5 seconds.

Are you using session affinity? I think you probably will need to since
pings can go over different connections than the data connection.

-Dan

On Tue, Mar 3, 2020 at 3:44 AM Alberto Bustamante Reyes
 wrote:

> Hi Bruce,
>
> Thanks for your comments, but we are not planning to use TLS, so I'm afraid
> the PR you are working on will not solve this problem.
>
> The origin of this issue is that we would like to be able to configure all
> gw receivers with the same "hostname-for-senders" value. The reason is that
> we will run a multisite Geode cluster, having each site on a different
> cloud environment, so using just one hostname makes configuration much
> easier.
>
> When we tried to configure the cluster in this way, we experienced an
> issue with the replication. Using the same hostname-for-senders parameter
> causes different servers to have equal ServerLocation objects, so if one
> receiver is down, the others are considered down too. With the change
> suggested by Jacob this problem is solved, and replication works fine.
>
> We are currently working on another issue related to this change: gw sender
> pings are not reaching the gw receivers, so ClientHealthMonitor closes the
> connections. I saw that the ping tasks are created by ServerLocation, so I
> have tried to solve the issue by changing it to be done by Endpoint. This
> change is not finished yet, as in its current status it causes the closing
> of connections from gw servers to gw receivers every 5 seconds.
>
> Why don't you like the idea of using the InternalDistributedMember to
> distinguish server locations? Are you thinking about another alternative? In
> this use case, two different gw receivers will have the same
> ServerLocation, so we need to distinguish them.
>
> BR/
>
> Alberto B.
>
> 
> De: Bruce Schuchardt mailto:bschucha...@pivotal.io>>
> Enviado: lunes, 2 de marzo de 2020 20:20
> Para: dev@geode.apache.org<mailto:dev@geode.apache.org> 
> mailto:dev@geode.apache.org>>; Jacob Barrett <
> jbarr...@pivotal.io<mailto:jbarr...@pivotal.io>>
> Cc: Anilkumar Gingade mailto:aging...@pivotal.io>>; 
> Charlie Black <
> cbl...@pivotal.io<mailto:cbl...@pivotal.io>>
> Asunto: Re: WAN replication issue in cloud native environments
>
> I'm coming to this conversation late and probably am missing a lot of
> context.  Is the point of this to be to direct senders to some common
> gateway that all of the gateway receivers are configured to advertise?

RE: WAN replication issue in cloud native environments

2020-03-04 Thread Alberto Bustamante Reyes
I have tried setting the server affinity on the gateway sender's pool in 
AbstractGatewaySender class, when the server location is set, but I dont see 
any difference on the behavior of the connections.

I did not mention that the connections are reset every 5 seconds due to 
"java.io.EOFException: The connection has been reset while reading the header". 
But I dont know yet what is causing it.


De: Dan Smith 
Enviado: martes, 3 de marzo de 2020 18:07
Para: dev@geode.apache.org 
Cc: Bruce Schuchardt ; Jacob Barrett 
; Anilkumar Gingade ; Charlie Black 

Asunto: Re: WAN replication issue in cloud native environments

> We are currently working on another issue related to this change: gw
sender pings are not reaching the gw receivers, so ClientHealthMonitor
closes the connections. I saw that the ping tasks are created by
ServerLocation, so I have tried to solve the issue by changing it to be
done by Endpoint. This change is not finished yet, as in its current status
it causes the closing of connections from gw servers to gw receivers every
5 seconds.

Are you using session affinity? I think you probably will need to since
pings can go over different connections than the data connection.

-Dan

On Tue, Mar 3, 2020 at 3:44 AM Alberto Bustamante Reyes
 wrote:

> Hi Bruce,
>
> Thanks for your comments, but we are not planning to use TLS, so I'm afraid
> the PR you are working on will not solve this problem.
>
> The origin of this issue is that we would like to be able to configure all
> gw receivers with the same "hostname-for-senders" value. The reason is that
> we will run a multisite Geode cluster, having each site on a different
> cloud environment, so using just one hostname makes configuration much
> easier.
>
> When we tried to configure the cluster in this way, we experienced an
> issue with the replication. Using the same hostname-for-senders parameter
> causes different servers to have equal ServerLocation objects, so if one
> receiver is down, the others are considered down too. With the change
> suggested by Jacob this problem is solved, and replication works fine.
>
> We are currently working on another issue related to this change: gw sender
> pings are not reaching the gw receivers, so ClientHealthMonitor closes the
> connections. I saw that the ping tasks are created by ServerLocation, so I
> have tried to solve the issue by changing it to be done by Endpoint. This
> change is not finished yet, as in its current status it causes the closing
> of connections from gw servers to gw receivers every 5 seconds.
>
> Why don't you like the idea of using the InternalDistributedMember to
> distinguish server locations? Are you thinking about another alternative? In
> this use case, two different gw receivers will have the same
> ServerLocation, so we need to distinguish them.
>
> BR/
>
> Alberto B.
>
> 
> De: Bruce Schuchardt 
> Enviado: lunes, 2 de marzo de 2020 20:20
> Para: dev@geode.apache.org ; Jacob Barrett <
> jbarr...@pivotal.io>
> Cc: Anilkumar Gingade ; Charlie Black <
> cbl...@pivotal.io>
> Asunto: Re: WAN replication issue in cloud native environments
>
> I'm coming to this conversation late and probably am missing a lot of
> context.  Is the point of this to be to direct senders to some common
> gateway that all of the gateway receivers are configured to advertise?
> I've been working on a PR to support redirection of connections for
> client/server and gateway communications to a common address and put the
> destination host name in the SNIHostName TLS parameter.  Then you won't
> have to tell servers about the common host name - just tell clients what
> the gateway is and they'll connect to it & tell it what the target host
> name is via the SNIHostName.  However, that only works if SSL is enabled.
>
> PR 4743 is a step toward this approach and changes TcpClient and
> SocketCreator to take an unresolved host address.  After this is merged
> another change will allow folks to set a gateway host/port that will be
> used to form connections and insert the destination hostname into the
> SNIHostName SSLParameter.
>
> I would really like us to avoid including InternalDistributedMembers in
> equality checks for server-locations.  To-date we've only held these
> identifiers in Endpoints and other places for debugging purposes and have
> used ServerLocation to identify servers.
>
> On 1/27/20, 8:56 AM, "Alberto Bustamante Reyes"
>  wrote:
>
> Hi again,
>
> Status update: the simplification of the maps suggested by Jacob made
> the new proposed class containing the ServerLocation and the member id
> unnecessary. With this refactoring, replication is working in the scenario
> we have been discussing in this conversation.

RE: WAN replication issue in cloud native environments

2020-03-03 Thread Alberto Bustamante Reyes
Hi Bruce,

Thanks for your comments, but we are not planning to use TLS, so I'm afraid the
PR you are working on will not solve this problem.

The origin of this issue is that we would like to be able to configure all gw
receivers with the same "hostname-for-senders" value. The reason is that we
will run a multisite Geode cluster, having each site on a different cloud
environment, so using just one hostname makes configuration much easier.

When we tried to configure the cluster in this way, we experienced an issue
with the replication. Using the same hostname-for-senders parameter causes
different servers to have equal ServerLocation objects, so if one receiver is
down, the others are considered down too. With the change suggested by Jacob
this problem is solved, and replication works fine.

We are currently working on another issue related to this change: gw senders'
pings are not reaching the gw receivers, so ClientHealthMonitor closes the
connections. I saw that the ping tasks are created per ServerLocation, so I
have tried to solve the issue by creating them per Endpoint instead. This
change is not finished yet, as in its current state it causes connections from
gw senders to gw receivers to be closed every 5 seconds.

Why don't you like the idea of using the InternalDistributedMember to
distinguish server locations? Are you thinking about another alternative? In
this use case, two different gw receivers will have the same ServerLocation,
so we need to distinguish them.

BR/

Alberto B.
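
For reference, the kind of composite identity being discussed can be sketched
as below; the class and field names are hypothetical, not the actual Geode
types:

import java.util.Objects;

// Sketch only: a key that tells two receivers apart even when they share
// the same hostname-for-senders and port, by also comparing the member id.
public final class ServerLocationAndMemberId {
  private final String host;     // hostname-for-senders value
  private final int port;        // advertised receiver port
  private final String memberId; // id of the member hosting the receiver

  public ServerLocationAndMemberId(String host, int port, String memberId) {
    this.host = host;
    this.port = port;
    this.memberId = memberId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ServerLocationAndMemberId)) {
      return false;
    }
    ServerLocationAndMemberId other = (ServerLocationAndMemberId) o;
    return port == other.port
        && host.equals(other.host)
        && memberId.equals(other.memberId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port, memberId);
  }
}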


De: Bruce Schuchardt 
Enviado: lunes, 2 de marzo de 2020 20:20
Para: dev@geode.apache.org ; Jacob Barrett 

Cc: Anilkumar Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

I'm coming to this conversation late and probably am missing a lot of context.
Is the point of this to direct senders to some common gateway that all of
the gateway receivers are configured to advertise?  I've been working on a PR
to support redirection of connections for client/server and gateway 
communications to a common address and put the destination host name in the 
SNIHostName TLS parameter.  Then you won't have to tell servers about the 
common host name - just tell clients what the gateway is and they'll connect to 
it & tell it what the target host name is via the SNIHostName.  However, that 
only works if SSL is enabled.

PR 4743 is a step toward this approach and changes TcpClient and SocketCreator 
to take an unresolved host address.  After this is merged another change will 
allow folks to set a gateway host/port that will be used to form connections 
and insert the destination hostname into the SNIHostName SSLParameter.

I would really like us to avoid including InternalDistributedMembers in 
equality checks for server-locations.  To-date we've only held these 
identifiers in Endpoints and other places for debugging purposes and have used 
ServerLocation to identify servers.

On 1/27/20, 8:56 AM, "Alberto Bustamante Reyes" 
 wrote:

Hi again,

Status update: the simplification of the maps suggested by Jacob made the new
proposed class containing the ServerLocation and the member id unnecessary.
With this refactoring, replication is working in the scenario we have been
discussing in this conversation. That's great, and I think the code can be
merged into develop if there are no extra comments in the PR.

But this does not mean we can say that Geode is able to work properly when
using gw receivers with the same ip + port. We have seen that when working with
this configuration, there is a problem with the pings sent from the gw senders
(which act as clients) to the gw receivers (servers). The pings are reaching
just one of the receivers, so the sender-receiver connection is finally closed
by the ClientHealthMonitor.

Do you have any suggestion about how to handle this issue? My first idea
was to identify where the connection is created, to check if the sender could
be aware in some way that there is more than one server to which the ping
should be sent, but I'm not sure if that is possible. An alternative could be
to change the ClientHealthMonitor to be "clever" enough to not close
connections in this case. Any comment is welcome.

Thanks,

Alberto B.


De: Jacob Barrett 
Enviado: miércoles, 22 de enero de 2020 19:01
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Anilkumar Gingade 
; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments



On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes
<alberto.bustamante.re...@est.tech> wrote:

Thanks Naba & Jacob for your comments!



@Naba: I have been implementing a solution as you suggested, and I think it 
would be convenient if the client knows the memberId of the server it is 
connected to.

 

RE: Odg: acceptance test task failing?

2020-02-17 Thread Alberto Bustamante Reyes
Last commits on develop:

9f8a2ff2b43c183b4824dd5ab764ecd2243cb2e1 GEODE-7800: Add Redis PSUBSCRIBE and 
PUNSUBSCRIBE commands (#4705)
1a0d9769e482f49e0c725c0d6adc75d324f88958 GEODE-7727: modify sender thread to 
detect release of connection (#4629)
5c6529a76dfad174be6a29438bf196013952b05b Geode 4263 (#4691)
8c40d5e66d1d8743f4b547de8cf429de8a187801 GEODE-7210: Fix 
RedundancyLevelPart1DUnitTest
5d2b1d1003982de248953230a8429f7ec19de692 GEODE-7791: fix 
MergeLogsDistributedTest (#4697)

The first one that had a failed execution of acceptanceTest was GEODE-7800. In
the PR it can be seen that the acceptanceTest task started failing between
commits ba04b6a and 9db6c17.
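
For reference, a range like that can be narrowed down automatically with git
bisect; a sketch, assuming 9db6c17 is the first known-bad commit, ba04b6a the
last known-good one, and the Gradle task name mentioned above:

git bisect start 9db6c17 ba04b6a          # bad commit, then good commit
git bisect run ./gradlew geode-connectors:acceptanceTest
git bisect reset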



De: Robert Houghton 
Enviado: lunes, 17 de febrero de 2020 16:30
Para: Mario Ivanac 
Cc: dev@geode.apache.org 
Asunto: Re: Odg: acceptance test task failing?

I had only looked at develop, not the PR pipeline. More investigation is
required.

On Mon, Feb 17, 2020, 07:05 Mario Ivanac  wrote:

> Hi,
>
> but this commit was merged to develop on Saturday morning, and tests have
> been failing since Friday.
>
> BR,
> Mario
> --
> *Šalje:* Robert Houghton 
> *Poslano:* 17. veljače 2020. 16:02
> *Prima:* dev@geode.apache.org 
> *Predmet:* Re: acceptance test task failing?
>
> The test has been erroring consistently on develop since
>
> https://github.com/apache/geode/commit/1a0d9769e482f49e0c725c0d6adc75d324f88958
>
> On Mon, Feb 17, 2020, 06:39 Alberto Bustamante Reyes
>  wrote:
>
> > Hi,
> >
> > After just changing some typos on a PR, I got errors on the
> > geode-connectors:acceptanceTest task, but the tests were working fine
> > before that change. I have seen that the acceptance test task in
> > Concourse has failed since last 14th February (15 executions since that
> > day), and it seems all of them got the same error.
> >
> > Is there any problem with the task or the develop branch?
> >
> > Thanks,
> >
> > Alberto B.
> >
> >
>


acceptance test task failing?

2020-02-17 Thread Alberto Bustamante Reyes
Hi,

After just changing some typos on a PR, I got errors on the
geode-connectors:acceptanceTest task, but the tests were working fine before
that change. I have seen that the acceptance test task in Concourse has failed
since last 14th February (15 executions since that day), and it seems all of
them got the same error.

Is there any problem with the task or the develop branch?

Thanks,

Alberto B.



RE: WAN replication issue in cloud native environments

2020-01-27 Thread Alberto Bustamante Reyes
Hi again,

Status update: the simplification of the maps suggested by Jacob made the new
proposed class containing the ServerLocation and the member id unnecessary.
With this refactoring, replication is working in the scenario we have been
discussing in this conversation. That's great, and I think the code can be
merged into develop if there are no extra comments in the PR.

But this does not mean we can say that Geode is able to work properly when
using gw receivers with the same ip + port. We have seen that when working with
this configuration, there is a problem with the pings sent from the gw senders
(which act as clients) to the gw receivers (servers). The pings are reaching
just one of the receivers, so the sender-receiver connection is finally closed
by the ClientHealthMonitor.

Do you have any suggestion about how to handle this issue? My first idea was to
identify where the connection is created, to check if the sender could be aware
in some way that there is more than one server to which the ping should be
sent, but I'm not sure if that is possible. An alternative could be to
change the ClientHealthMonitor to be "clever" enough to not close connections
in this case. Any comment is welcome.

Thanks,

Alberto B.


De: Jacob Barrett 
Enviado: miércoles, 22 de enero de 2020 19:01
Para: Alberto Bustamante Reyes 
Cc: dev@geode.apache.org ; Anilkumar Gingade 
; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments



On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes
<alberto.bustamante.re...@est.tech> wrote:

Thanks Naba & Jacob for your comments!



@Naba: I have been implementing a solution as you suggested, and I think it 
would be convenient if the client knows the memberId of the server it is 
connected to.

(current code is here: https://github.com/apache/geode/pull/4616 )

For example, in:

LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation 
currentServer, String group, Set excludedServers)

In this method, the client has sent the ServerLocation, but if that object does
not contain the memberId, I don't see how to guarantee that the replacement
that will be returned is not the same server the client is currently connected
to.
Inside that method, this other method is called:


Given that your setup is masquerading multiple members behind the same host and 
port (ServerLocation) it doesn’t matter. When the pool opens a new socket to 
the replacement server it will be to the shared hostname and port and the 
Kubernetes service at that host and port will just pick a backend host. In the
solution we suggested we preserved that behavior since the k8s service can’t 
determine which backend member to route the connection to based on the member 
id.


LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer, groupServers)

where groupServers is a "Map" object. If
the keys of that map have the same host and port, they differ only in
the memberId. But as you don't know it (you just have currentServer, which
contains host and port), you cannot get the correct LoadHolder value, so you
cannot know if your server is the most loaded.

Again, given your use case the behavior of this method is lost when a new
connection is established by the pool through the shared hostname anyway.

@Jacob: I think the solution ultimately implies that the client has to know the
memberId; I think we could simplify the maps.

The client isn't keeping these load maps, the locator is, and the locator knows
all the member ids. The client end only needs to know the host/port
combination. In your example, the wan replication (a client to the remote
cluster) connects to the shared host/port service and gets randomly routed to
one of the backend servers in that service.

All of this locator balancing code is unnecessary in this model, where
something else is choosing the final destination. The goal of our proposed
changes was to recognize that all we need is to make sure the locator keeps the
shared ServerLocation alive in its responses to clients, by tracking the
associated members and reducing that set to the set of unique ServerLocations.
In your case that will always reduce to 1 ServerLocation for N members, as
long as 1 member is still up.

-Jake




RE: cpu quota issue in CI

2020-01-25 Thread Alberto Bustamante Reyes
I wrote my mail after the third try; finally it worked on the fourth one.

De: Jacob Barrett 
Enviado: sábado, 25 de enero de 2020 16:31
Para: dev@geode.apache.org 
Asunto: Re: cpu quota issue in CI

Yeah, a bunch of CI jobs failed last night. Push an empty commit to your PR
branch and that should fire a new round of checks.
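
For reference, that amounts to the following (the branch name is a
placeholder):

git commit --allow-empty -m "Empty commit to re-trigger CI"
git push origin my-pr-branch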

> On Jan 25, 2020, at 2:40 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi,
>
> I'm facing problems on a PR: some CI tasks fail at the “create_instance-OpenJDK11”
> step, due to a “cpu quota exceed” problem.
> For example: https://concourse.apachegeode-ci.info/builds/128470
>
> Could someone take a look? Thanks!
>
> Alberto B.


cpu quota issue in CI

2020-01-25 Thread Alberto Bustamante Reyes
Hi,

I'm facing problems on a PR: some CI tasks fail at the “create_instance-OpenJDK11”
step, due to a “cpu quota exceed” problem.
For example: https://concourse.apachegeode-ci.info/builds/128470

Could someone take a look? Thanks!

Alberto B.


RE: Old geode-benchmark PRs

2020-01-23 Thread Alberto Bustamante Reyes
What about closing the PRs and creating a Jira ticket (or tickets) for the 
review and update of the code?
If someone finds time to spend on benchmarks, at least they will find the
tickets in Jira.



De: Donal Evans 
Enviado: jueves, 23 de enero de 2020 16:45
Para: dev@geode.apache.org 
Asunto: Re: Old geode-benchmark PRs

@Alexander, I haven't looked at them in months and they never received any
formal review on GitHub, so it's hard to know for sure if they're ready to
merge or not, but as Jake said, they probably need some massaging to get
the resource usage just right and minimize variance. If at this point
there's no-one who knows enough about tuning benchmarks with the time to
look at them, then it seems unlikely that they'll get merged any time soon.

On Thu, Jan 23, 2020 at 6:42 AM Alexander Murmann 
wrote:

> Donal, are you still looking at these? If they aren't ready to merge and
> not being worked on, should they be closed?
>
> On Wed, Jan 22, 2020 at 3:32 PM Donal Evans  wrote:
>
> > Two of those PRs are mine, so perhaps I can give a bit of context for
> > people who might look at them. The oldest of the two, "Feature/Add
> PdxType
> > benchmark and additional framework flexibility" was an attempt to
> quantify
> > and maintain the improvement in performance for PdxType creation when
> large
> > numbers of PdxTypes already exist, and to allow the passing of additional
> > system properties to the VMs hosting the servers in order to change the
> log
> > level and prevent the benchmark measuring how long it takes to log
> PdxType
> > creation rather than actual time taken to create new PdxTypes. This PR
> has
> > been open for a very long time, so it's possible that the changes
> regarding
> > passing additional system properties to the VMs are now outdated or
> > unnecessary, but the actual benchmarks themselves still have some value.
> >
> > The second PR, "Added benchmarks for aggregate functions" contains 16 new
> > benchmarks related to aggregate OQL queries, (8 each for Partitioned and
> > Replicated regions), which were added following work in that area by the
> > Commons team. The build is currently marked as failing, but this is due
> to
> > a timeout rather than an actual build failure, as the number of
> benchmarks
> > added increased the total time to build beyond the currently configured
> > timeout. Adding such a large number of additional benchmarks will
> probably
> > also noticeably increase the time it takes benchmarks to run, which bears
> > consideration.
> >
> > I hope this helps shed some light for people who may look over those PRs.
> >
> > On Wed, Jan 22, 2020 at 11:36 AM Dan Smith  wrote:
> >
> > > Hi,
> > >
> > > I noticed we have some old outstanding PRs for the geode-benchmarks
> > > project. Are any of these things we want to merge or should we close
> them
> > > out?
> > >
> > > https://github.com/apache/geode-benchmarks/pulls
> > >
> > > -Dan
> > >
> >
>


RE: WAN replication issue in cloud native environments

2020-01-22 Thread Alberto Bustamante Reyes
Thanks Naba & Jacob for your comments!



@Naba: I have been implementing a solution as you suggested, and I think it 
would be convenient if the client knows the memberId of the server it is 
connected to.

(current code is here: https://github.com/apache/geode/pull/4616 )

For example, in:

LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation 
currentServer, String group, Set excludedServers)

In this method, the client has sent the ServerLocation, but if that object does
not contain the memberId, I don't see how to guarantee that the replacement
that will be returned is not the same server the client is currently connected
to.
Inside that method, this other method is called:

LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer, groupServers)

where groupServers is a "Map" object. If
the keys of that map have the same host and port, they differ only in the
memberId. But as you don't know it (you just have currentServer, which
contains host and port), you cannot get the correct LoadHolder value, so you
cannot know if your server is the most loaded.

So I think the client needs to know the memberId of the server.

@Jacob: I think the solution ultimately implies that the client has to know the
memberId; I think we could simplify the maps.


BR/

Alberto B.



De: Jacob Barrett 
Enviado: miércoles, 22 de enero de 2020 7:29
Para: dev@geode.apache.org 
Cc: Anilkumar Gingade ; Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments


> On Jan 21, 2020, at 1:24 PM, Nabarun Nag  wrote:
>
> Suggestion:
> - Instead, can we create a new class that contains the memberID and
> ServerLocation and that new class object is added as a key in the
> connectionMap.

I poked around a bit in this code and the ServerLocation is also in the 
LoadHolder class so we can simplify this even more by just using the member ID 
as the key in all these maps. When we need the ServerLocation we can get that 
from the LoadHolder.

The addServer call comes from a caller that has the CacheServerProfile, which 
has the member ID. The updateLoad caller is a DistributedMessage which has a 
sender member that is the member ID. Lastly, the removeServer caller has a
CacheServerProfile as well, so we can again get the member ID.

-Jake
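
For reference, the simplification described above can be sketched as below;
all class, field and method names here are illustrative, not the actual Geode
code:

import java.util.HashMap;
import java.util.Map;

class LoadSnapshotSketch {
  static final class ServerLocation {
    final String host;
    final int port;
    ServerLocation(String host, int port) {
      this.host = host;
      this.port = port;
    }
  }

  static final class LoadHolder {
    final ServerLocation location;
    float load;
    LoadHolder(ServerLocation location, float load) {
      this.location = location;
      this.load = load;
    }
  }

  // Keyed by member id alone: two members sharing the same
  // hostname-for-senders and port remain distinguishable.
  private final Map<String, LoadHolder> loadPerMember = new HashMap<>();

  void addServer(String memberId, ServerLocation location, float load) {
    loadPerMember.put(memberId, new LoadHolder(location, load));
  }

  void updateLoad(String memberId, float load) {
    LoadHolder holder = loadPerMember.get(memberId);
    if (holder != null) {
      holder.load = load;
    }
  }

  void removeServer(String memberId) {
    loadPerMember.remove(memberId);
  }

  // The ServerLocation is recovered from the LoadHolder when needed.
  ServerLocation locationOf(String memberId) {
    LoadHolder holder = loadPerMember.get(memberId);
    return holder == null ? null : holder.location;
  }
}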




RE: WAN replication issue in cloud native environments

2020-01-21 Thread Alberto Bustamante Reyes
Hi,

I have been implementing a possible solution for this issue, and although I 
have not finished yet, I would like to kindly ask for comments.

I created some Helm charts to explain and reproduce the problem; if you are
interested, they are here:
https://github.com/alb3rtobr/geode-cloudnative-wan-replication

The solution consists of adding to ServerLocation the id of the member hosting
the server, to allow differentiating two or more gateway receivers with the
same ip but in different locations. I verified that this change fixes the
problem.

After that, I have been working on fixing issues with the existing tests. In
the meantime, it would be useful to get some feedback about the solution,
especially if there are impacts I have not considered yet (maybe they are the
reason for the failing tests I'm currently working on).

The code can be found on this PR: https://github.com/apache/geode/pull/4489

Thanks in advance!

Alberto B.



De: Anilkumar Gingade 
Enviado: viernes, 6 de diciembre de 2019 18:56
Para: geode 
Cc: Charlie Black 
Asunto: Re: WAN replication issue in cloud native environments

Alberto,

Can you please file a JIRA ticket for this. This could come up often as
more and more deployments move to K8s.

-Anil.


On Fri, Dec 6, 2019 at 8:33 AM Sai Boorlagadda 
wrote:

> > if one gw receiver stops, the locator will publish to any remote locator
> that there are no receivers up.
>
> I am not sure if locators proactively update remote locators about changes
> in the receivers list; rather, I think the senders figure this out on
> connection issues.
> But I see the problem that local-site locators have only one member in the
> list of receivers that they maintain, as all receivers register with a
> single address.
>
> One idea I had earlier is to statically set the receivers list on locators
> (just like the remote-locators property), which is exchanged with gw senders.
> This way we can introduce a boolean flag to turn off wan discovery and use
> the statically configured addresses. This can also be useful for
> remote-locators if they are behind a service.
>
> Sai
>
> On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
>  wrote:
>
> > Thanks Charlie, but the issue is not about connectivity. Summarizing the
> > issue: if you have two or more gw receivers that are started with the same
> > values of the "hostname-for-senders", "start-port" and "end-port"
> > parameters (with "start-port" and "end-port" equal), then if one gw
> > receiver stops, the locator will publish to any remote locator that there
> > are no receivers up.
> >
> > And this use case is likely to happen on cloud-native environments, as
> > described.
> >
> > BR/
> >
> > Alberto B.
> > 
> > De: Charlie Black 
> > Enviado: miércoles, 4 de diciembre de 2019 18:11
> > Para: dev@geode.apache.org 
> > Asunto: Re: WAN replication issue in cloud native environments
> >
> > Alberto,
> >
> > Something else to think about: SNI-based routing.   I believe Mario
> > might be working on adding SNI to Geode - he at least had a proposal that
> > he e-mailed out.
> >
> > The basics are: the destination host is in the SNI field, and the proxy
> > can inspect it and route the request to the right service instance. Plus
> > we have the option to not terminate the SSL at the proxy.
> >
> > Full disclosure - I haven't tried out SNI based routing myself and it is
> > something that I thought could work as I was reading about it.   From the
> > whiteboard I have done I think this will do ingress and egress just fine.
> > Potentially easier than port mapping and `hostname for clients` playing
> > around.
> >
> > Just something to think about.
> >
> > Charlie
> >
> >
> > On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> >  wrote:
> >
> > > Hi Jacob,
> > >
> > > Yes, we are using the LoadBalancer service type. But note the problem is
> > > not in the transport layer but in Geode, as GW senders are complaining
> > > “sender-2-parallel : Could not connect due to: There are no active
> > > servers.” when one of the servers in the receiving cluster is killed.
> > >
> > > So, there is still one server alive in the receiving cluster, but the GW
> > > sender does not know it and the locator is not able to inform of its
> > > existence.
> > > Looking at the code, it seems the internal data structures (maps)
> > > holding the profiles use objects whose equality check relies only on
> > > hostname and
> >

RE: GW sender dispatcher threads & order policy

2020-01-15 Thread Alberto Bustamante Reyes
Hi Mario,

My code contains that fix; it's not the same issue. GEODE-7561 solves the issue
with the value "1" for dispatcher threads, but an explicit value for 
order-policy is still required if you specify a value for dispatcher threads.

BR/

Alberto B.

De: Mario Kevo 
Enviado: miércoles, 15 de enero de 2020 18:22
Para: Alberto Bustamante Reyes ; 
dev@geode.apache.org 
Asunto: Odg: GW sender dispatcher threads & order policy

Hi Alberto,

This is already solved in Geode 1.12.0.

https://issues.apache.org/jira/browse/GEODE-7561

BR,
Mario
____
Šalje: Alberto Bustamante Reyes 
Poslano: 15. siječnja 2020. 18:14
Prima: dev@geode.apache.org 
Predmet: GW sender dispatcher threads & order policy

Hi,

I have seen that if I change the default number of dispatcher threads (5)
when creating a gateway sender, I get an error saying I must specify an order
policy:

"Must specify --order-policy when --dispatcher-threads is larger than 1."

I find this odd, taking into account that the default value is already larger
than 1 and order-policy has a default value. Actually, the error is shown even
if you specify "--dispatcher-threads=5". I was going to create a ticket to
report this, but I have a question: what is the use case for having a sender
with less than 1 dispatcher thread?

Thanks!

BR/

Alberto B.


GW sender dispatcher threads & order policy

2020-01-15 Thread Alberto Bustamante Reyes
Hi,

I have seen that if I change the default number of dispatcher threads (5)
when creating a gateway sender, I get an error saying I must specify an order
policy:

"Must specify --order-policy when --dispatcher-threads is larger than 1."

I find this odd, taking into account that the default value is already larger
than 1 and order-policy has a default value. Actually, the error is shown even
if you specify "--dispatcher-threads=5". I was going to create a ticket to
report this, but I have a question: what is the use case for having a sender
with less than 1 dispatcher thread?

Thanks!

BR/

Alberto B.
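
For reference, the behavior described above looks like this in gfsh (the id
and remote system id are illustrative; the first command fails with the error
quoted above, the second passes an explicit order policy):

gfsh> create gateway-sender --id=sender1 --remote-distributed-system-id=2 --parallel --dispatcher-threads=5
Must specify --order-policy when --dispatcher-threads is larger than 1.

gfsh> create gateway-sender --id=sender1 --remote-distributed-system-id=2 --parallel --dispatcher-threads=5 --order-policy=KEY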


RE: Concourse tests logs

2020-01-09 Thread Alberto Bustamante Reyes
Great, I found the logs, thanks!

De: Jens Deppe 
Enviado: jueves, 9 de enero de 2020 18:43
Para: dev@geode.apache.org 
Asunto: Re: Concourse tests logs

Hi Alberto,

You should be able to look at the 'archive_results' job and see links for
test artifacts at the bottom of the log. Something like this:


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.12.0-SNAPSHOT.0180/test-results/test/1578534134/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.12.0-SNAPSHOT.0180/test-artifacts/1578534134/windows-unittestfiles-OpenJDK8-1.12.0-SNAPSHOT.0180.tgz

--Jens

On Thu, Jan 9, 2020 at 9:32 AM Alberto Bustamante Reyes
 wrote:

> Hi,
>
> How can I check the test reports from Concourse executions? I see some
> tests failing on a PR, but they work fine on my laptop, so I would like to
> have more info to fix them.
>
> Thanks!
>
> Alberto B.
>


Concourse tests logs

2020-01-09 Thread Alberto Bustamante Reyes
Hi,

How can I check the test reports from Concourse executions? I see some tests
failing on a PR, but they work fine on my laptop, so I would like to have more
info to fix them.

Thanks!

Alberto B.


RE: WAN replication issue in cloud native environments

2019-12-05 Thread Alberto Bustamante Reyes
Thanks Charlie, but the issue is not about connectivity. Summarizing the issue:
if you have two or more gw receivers that are started with the same values of
the "hostname-for-senders", "start-port" and "end-port" parameters (with
"start-port" and "end-port" equal), then if one gw receiver stops, the locator
will publish to any remote locator that there are no receivers up.

And this use case is likely to happen on cloud-native environments, as 
described.

BR/

Alberto B.
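
For reference, the configuration described above looks roughly like this on
every server (the hostname is a placeholder); with start-port equal to
end-port, each receiver advertises the same fixed port:

gfsh> create gateway-receiver --hostname-for-senders=geode-receiver.example.com --start-port=5000 --end-port=5000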

De: Charlie Black 
Enviado: miércoles, 4 de diciembre de 2019 18:11
Para: dev@geode.apache.org 
Asunto: Re: WAN replication issue in cloud native environments

Alberto,

Something else to think about: SNI-based routing.   I believe Mario might be
working on adding SNI to Geode - he at least had a proposal that he
e-mailed out.

The basics are: the destination host is in the SNI field, and the proxy can
inspect it and route the request to the right service instance. Plus we
have the option to not terminate the SSL at the proxy.

Full disclosure - I haven't tried out SNI based routing myself and it is
something that I thought could work as I was reading about it.   From the
whiteboard I have done I think this will do ingress and egress just fine.
Potentially easier than port mapping and `hostname for clients` playing
around.

Just something to think about.

Charlie


On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
 wrote:

> Hi Jacob,
>
> Yes, we are using the LoadBalancer service type. But note the problem is not
> in the transport layer but in Geode, as GW senders are complaining
> “sender-2-parallel : Could not connect due to: There are no active
> servers.” when one of the servers in the receiving cluster is killed.
>
> So, there is still one server alive in the receiving cluster, but the GW
> sender does not know it and the locator is not able to inform of its
> existence. Looking at the code, it seems the internal data structures (maps)
> holding the profiles use objects whose equality check relies only on
> hostname and port. This makes it impossible to differentiate servers when
> the same “hostname-for-senders” and port are used. When the killed server
> comes back up, the locator profiles are updated (the internal map is back to
> size()=1 although 2+ servers are there) and GW senders happily reconnect.
>
> The solution with Geode as-is would be to expose each GW receiver on a
> different port outside of the k8s cluster; this includes creating N
> Kubernetes services for N GW receivers, in addition to updating the service
> mesh configuration (if it is used, firewalls etc…). The declarative nature
> of Kubernetes means we must know the ports in advance, hence start-port and
> end-port when creating each GW receiver must be equal, and we should have
> some well-known algorithm when creating GW receivers across servers. For
> example: server-0 port 5000, server-1 port 5001, server-2 port 5002 etc….
> So, all GW receivers must be wired individually and we must turn off Geode’s
> random port allocation.
>
> But we are exploring the possibility for Geode to handle this cloud-native
> configuration a bit better. Locators should be capable of holding GW
> receiver information although the receivers are hidden behind the same
> hostname and port. This is a code change in Geode, and we would like to have
> the community's opinion on it.
>
> Some obvious impacts with respect to the legacy behavior: when the locator
> picks a server on behalf of the client (a GW sender in this case), it does
> so based on the server load. When the sender connects, and considering all
> servers are using the same VIP:PORT, it is the load balancer that will
> decide where the connection will end up, but likely not on the one selected
> by the locator. So here we ignore the locator's instructions. Since GW
> senders normally do not create a huge number of connections, this probably
> will not unbalance the cluster too much. But this is an impact worth
> considering. Custom load metrics would also be ignored by GW senders. Opinions?
>
> An additional impact that comes to mind is the GW sender load-balance
> command and how its execution would be affected.
>
> Thanks!
>
> Alberto B.
>
> 
> De: Jacob Barrett 
> Enviado: viernes, 29 de noviembre de 2019 13:06
> Para: dev@geode.apache.org 
> Asunto: Re: WAN replication issue in cloud native environments
>
>
>
> > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes
>  wrote:
> >
> > The reason for such a setup is deploying Geode cluster on a Kubernetes
> cluster where all GW receivers are reachable from the outside world on the
> same VIP and port.
>
> Are you using LoadBalancer Service type?
>
> > Other kinds of configuration (different hostname and/

RE: WAN replication issue in cloud native environments

2019-12-04 Thread Alberto Bustamante Reyes
Hi Jacob,

Yes, we are using the LoadBalancer service type. But note the problem is not in
the transport layer but in Geode, as GW senders are complaining “sender-2-parallel :
Could not connect due to: There are no active servers.” when one of the servers 
in the receiving cluster is killed.

So, there is still one server alive in the receiving cluster, but the GW sender
does not know it and the locator is not able to inform of its existence.
Looking at the code, it seems the internal data structures (maps) holding the
profiles use objects whose equality check relies only on hostname and port.
This makes it impossible to differentiate servers when the same
“hostname-for-senders” and port are used. When the killed server comes back up,
the locator profiles are updated (the internal map is back to size()=1 although
2+ servers are there) and GW senders happily reconnect.

The solution with Geode as-is would be to expose each GW receiver on a
different port outside of the k8s cluster; this includes creating N Kubernetes
services for N GW receivers, in addition to updating the service mesh
configuration (if it is used, firewalls etc…). The declarative nature of
Kubernetes means we must know the ports in advance, hence start-port and
end-port when creating each GW receiver must be equal, and we should have some
well-known algorithm when creating GW receivers across servers. For example:
server-0 port 5000, server-1 port 5001, server-2 port 5002 etc…. So, all GW
receivers must be wired individually and we must turn off Geode’s random port
allocation.

But we are exploring the possibility for Geode to handle this cloud-native
configuration a bit better. Locators should be capable of holding GW receiver
information although the receivers are hidden behind the same hostname and
port. This is a code change in Geode, and we would like to have the
community's opinion on it.

Some obvious impacts with respect to the legacy behavior: when the locator
picks a server on behalf of the client (a GW sender in this case), it does so
based on the server load. When the sender connects, and considering all servers
are using the same VIP:PORT, it is the load balancer that will decide where the
connection will end up, but likely not on the one selected by the locator. So
here we ignore the locator's instructions. Since GW senders normally do not
create a huge number of connections, this probably will not unbalance the
cluster too much. But this is an impact worth considering. Custom load metrics
would also be ignored by GW senders. Opinions?

An additional impact that comes to mind is the GW sender load-balance command
and how its execution would be affected.

Thanks!

Alberto B.


De: Jacob Barrett 
Enviado: viernes, 29 de noviembre de 2019 13:06
Para: dev@geode.apache.org 
Asunto: Re: WAN replication issue in cloud native environments



> On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes 
>  wrote:
>
> The reason for such a setup is deploying Geode cluster on a Kubernetes 
> cluster where all GW receivers are reachable from the outside world on the 
> same VIP and port.

Are you using LoadBalancer Service type?

> Other kinds of configuration (different hostname and/or different port for
> each GW receiver) are not cheap from an OAM and resources perspective in
> cloud native environments, and also limit some important use-cases (like
> scaling).

If you could somehow configure the host and port for the sender (code
modification required), would exposing each port through the LoadBalancer be
too expensive as well?

> The problem experienced is that shutting down one server is stopping 
> replication to this cluster until the server is up again. We suspect this is 
> because Geode incorrectly assumes there are no more alive servers when just 
> one of them is down (since they share hostname-for-senders and port).

Seems like in the worst case, when it tries to reconnect, the LB should give it
a live server and it will think the single server is back up.

-Jake



WAN replication issue in cloud native environments

2019-11-29 Thread Alberto Bustamante Reyes
Hi all,

We have a problem with Geode WAN replication when GW receivers are configured 
with the same hostname-for-senders and port on all servers.

The reason for such a setup is deploying a Geode cluster on a Kubernetes cluster
where all GW receivers are reachable from the outside world on the same VIP and 
port.

Other kinds of configuration (different hostname and/or different port for each
GW receiver) are not cheap from an OAM and resources perspective in cloud
native environments, and also limit some important use-cases (like scaling).

The problem experienced is that shutting down one server is stopping 
replication to this cluster until the server is up again. We suspect this is 
because Geode incorrectly assumes there are no more alive servers when just one 
of them is down (since they share hostname-for-senders and port).

Has anyone experienced a similar problem configuring Geode WAN replication in 
cloud native environments?

Thinking about possible solutions in the Geode code, our proposal would be to
expand the internal data in locators with enough information to distinguish
servers in the aforementioned use case. The same intervention is likely needed
in the client pools and possibly elsewhere in the source code. Any comments
about this proposal are welcome.

Thanks in advance!

Alberto B.


RE: Cache.close is not synchronous?

2019-11-26 Thread Alberto Bustamante Reyes
+1 for fixing it.

De: Anilkumar Gingade 
Enviado: martes, 26 de noviembre de 2019 0:24
Para: geode 
Asunto: Re: Cache.close is not synchronous?

Looking at the code, cache.close() and InternalCacheBuilder.create()
are synchronized on GemFireCacheImpl.class; it's the
InternalCacheBuilder create that seems to be using a reference to the old
distributed-system.
GemFireCacheImpl.getInstance() and getExisting() both perform
the "isClosing" check and do an early return. The InternalCacheBuilder is new;
not sure if it's missing early checks.

-Anil.

On Mon, Nov 25, 2019 at 2:47 PM Mark Hanson  wrote:

> +1 to fix.
>
> > On Nov 25, 2019, at 2:02 PM, John Blum  wrote:
> >
> > +1 ^ 64!
> >
> > I found this out the hard way some time ago and is why STDG exists in the
> > first place (i.e. usability issues, particularly with testing).
> >
> > On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund  wrote:
> >
> >> I found a test that closes the cache and then recreates the cache
> >> multiple times with a 2-second sleep between each. I tried to remove the
> >> Thread.sleep and found that recreating the cache
> >> throws DistributedSystemDisconnectedException (see below).
> >>
> >> This seems like a usability nightmare. Anyone have any ideas WHY it's
> >> this way?
> >>
> >> Personally, I want Cache.close() to block until both Cache and
> >> DistributedSystem are closed and the API is ready to create a new Cache.
> >>
> >> org.apache.geode.distributed.DistributedSystemDisconnectedException:
> This
> >> connection to a distributed system has been disconnected.
> >>at
> >>
> >>
> org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
> >>at
> >>
> >>
> org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.GemFireCacheImpl.(GemFireCacheImpl.java:791)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
> >>at
> >> org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
> >>
> >
> >
> > --
> > -John
> > john.blum10101 (skype)
>
>
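
For reference, a workaround along the lines discussed above is to poll the
distributed system after close() instead of sleeping for a fixed time. The
Geode calls used here (Cache.close, getDistributedSystem, isConnected,
CacheFactory.create) are public API; the polling loop itself is only a sketch:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.distributed.DistributedSystem;

class CacheRecycle {
  static Cache closeAndRecreate(Cache cache) throws InterruptedException {
    DistributedSystem system = cache.getDistributedSystem();
    cache.close();
    // close() may return before the distributed system has fully
    // disconnected, so wait until it reports disconnected.
    while (system.isConnected()) {
      Thread.sleep(100);
    }
    return new CacheFactory().create();
  }
}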


RE: Help to reproduce GEODE-7060

2019-11-25 Thread Alberto Bustamante Reyes
OK, then I'm going to close the ticket. Thanks for the info, Blake!

De: Blake Bender 
Enviado: lunes, 25 de noviembre de 2019 15:11
Para: dev@geode.apache.org 
Asunto: Re: Help to reproduce GEODE-7060

Hi Alberto,

I apologize for the mention of gemfire-node-client in OSS Geode JIRA; this
is a proprietary product that you don't have access to.  For the time
being, I believe, we have fixed all of the ACE leaks we've found, so
GEODE-7060 can be safely closed.

Just for context, we found that, on MacOS, if nativeclient items were
leaking at app shutdown and any of them contained an ACE object, certain of
these would leak an OS resource that couldn't be recovered.  Eventually
(typically after a number of test runs) this resource would run out, and
the only way to recover was to restart the OS.  We've since fixed a number
of "resource leak at shutdown" bugs in the native client, and haven't seen
GEODE-7060 in a couple of months or more.

Thanks,

Blake


On Mon, Nov 25, 2019 at 5:49 AM Alberto Bustamante Reyes
 wrote:

> Hi,
>
> I need some info to reproduce the issue described in GEODE-7060, about an
> ACE resource leak.
> The ticket says the problem was seen using a test called
> "gemfire-node-client putall.js", but I suppose that test is not part of
> Geode.
>
> Could someone with access to that test describe what it is doing? Any idea
> to reproduce the issue is welcome.
>
> Thanks in advance.
>
>
> Alberto B.
>


Help to reproduce GEODE-7060

2019-11-25 Thread Alberto Bustamante Reyes
Hi,

I need some info to reproduce the issue described in GEODE-7060, about an ACE 
resource leak.
The ticket says the problem was seen using a test called "gemfire-node-client 
putall.js", but I suppose that test is not part of Geode.

Could someone with access to that test describe what it is doing? Any idea to 
reproduce the issue is welcome.

Thanks in advance.


Alberto B.


Re: Website Banner Graphic

2019-11-14 Thread Alberto Bustamante Reyes
+1


De: Helena Bales 
Enviado: jueves, noviembre 14, 2019 7:52 p. m.
Para: dev@geode.apache.org
Asunto: Re: Website Banner Graphic

+1

On Thu, Nov 14, 2019 at 10:47 AM Jacob Barrett  wrote:

> Does anyone know who is responsible for our website’s graphics? The
> recently added “TM” to the Geode logo needs to be redone. My design brain
> can’t handle that the text is in the wrong font, the wrong color and isn’t
> anti-aliased like the rest of the text and graphic components.
>
> https://geode.apache.org 
> https://geode.apache.org/img/apache_geode_logo.png <
> https://geode.apache.org/img/apache_geode_logo.png>
>
> The “TM" should be added to the original vector graphic and a new raster
> should be generated from that.
>
> -Jake
>
>


RE: Update geode-native-build image as part of release process

2019-11-14 Thread Alberto Bustamante Reyes
Owen created a PR to automatically update the geode-native-build image (
https://github.com/apache/geode/pull/4315 ). I think it could be merged into
the 1.11 release branch too, so the script is used in the next release.
Then I would remove the manual steps from the wiki.
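
For reference, the manual steps being automated amount to roughly the
following (the build context path and tag are illustrative):

docker build -t apachegeode/geode-native-build:1.11.0 docker/
docker push apachegeode/geode-native-build:1.11.0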



De: Alberto Bustamante Reyes 
Enviado: martes, 12 de noviembre de 2019 13:35
Para: Anthony Baker ; dev@geode.apache.org 

Cc: Dick Cavender 
Asunto: RE: Update geode-native-build image as part of release process

Update: it is not possible to create the geode-native-build image based on the
geode image. The geode image is based on Alpine, and geode-native-build needs
Clang 6, which is not yet available for Alpine. We will have to wait until it
is released.

De: Alberto Bustamante Reyes 
Enviado: martes, 12 de noviembre de 2019 10:30
Para: Anthony Baker ; dev@geode.apache.org 

Cc: Dick Cavender 
Asunto: RE: Update geode-native-build image as part of release process

Good idea Anthony, I will investigate that.

BTW, I have observed that geode/docker/Dockerfile is outdated. It is using "ENV
GEODE_VERSION 1.9.0". I suspect that after the 1.10 release the changes done
in the Dockerfile were not merged back to develop; maybe the release branch
itself was not merged back. Anyway, the image tagged as 1.10.0 is using that
version, so that means the promote_rc script works fine.

De: Anthony Baker 
Enviado: lunes, 11 de noviembre de 2019 19:07
Para: dev@geode.apache.org 
Cc: Dick Cavender 
Asunto: Re: Update geode-native-build image as part of release process

Thanks Alberto!  Maybe as a future enhancement we should consider extending the 
geode-native-build image from the geode image to make this simpler.

Anthony


> On Nov 11, 2019, at 2:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Done! I have updated the wiki adding the steps to update the Docker image.
>
> https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode#ReleasingApacheGeode-Updategeode-native-buildDockerimage
>
>
> 
> De: Dick Cavender 
> Enviado: jueves, 7 de noviembre de 2019 17:51
> Para: dev@geode.apache.org 
> Asunto: Re: Update geode-native-build image as part of release process
>
> +1
>
> On Thu, Nov 7, 2019 at 8:38 AM Owen Nichols  wrote:
>
>> +1
>>
>> On Thu, Nov 7, 2019 at 6:46 AM Alberto Bustamante Reyes
>>  wrote:
>>
>>> Hi all,
>>>
>>> Some time ago I opened GEODE-7056 to update the Dockerfile of the
>>> "geode-native-build", because I saw that it was using 1.6.0 although
>> 1.9.0
>>> was available. After this change, another ticket was needed to build and
>>> update the image in Dockerhub.
>>>
>>> I think this task (update file, build image and update image) should be
>>> part of the release process. Otherwise, the image will be outdated once
>>> there is a new release. And now that 1.11.0 is closer, I think it's a good
>>> moment to do it.
>>>
>>> What do you think?
>>>
>>> BR/
>>>
>>> Alberto B.
>>>
>>



RE: Update geode-native-build image as part of release process

2019-11-12 Thread Alberto Bustamante Reyes
Update: it is not possible to create the geode-native-build image based on the
geode image. The geode image is based on Alpine, and geode-native-build needs
Clang 6, which is not yet available for Alpine. We will have to wait until it
is released.

De: Alberto Bustamante Reyes 
Enviado: martes, 12 de noviembre de 2019 10:30
Para: Anthony Baker ; dev@geode.apache.org 

Cc: Dick Cavender 
Asunto: RE: Update geode-native-build image as part of release process

Good idea Anthony, I will investigate that.

BTW, I have observed that geode/docker/Dockerfile is outdated. It is using "ENV
GEODE_VERSION 1.9.0". I suspect that after the 1.10 release the changes done
in the Dockerfile were not merged back to develop; maybe the release branch
itself was not merged back. Anyway, the image tagged as 1.10.0 is using that
version, so that means the promote_rc script works fine.

De: Anthony Baker 
Enviado: lunes, 11 de noviembre de 2019 19:07
Para: dev@geode.apache.org 
Cc: Dick Cavender 
Asunto: Re: Update geode-native-build image as part of release process

Thanks Alberto!  Maybe as a future enhancement we should consider extending the 
geode-native-build image from the geode image to make this simpler.

Anthony


> On Nov 11, 2019, at 2:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Done! I have updated the wiki adding the steps to update the Docker image.
>
> https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode#ReleasingApacheGeode-Updategeode-native-buildDockerimage
>
>
> 
> De: Dick Cavender 
> Enviado: jueves, 7 de noviembre de 2019 17:51
> Para: dev@geode.apache.org 
> Asunto: Re: Update geode-native-build image as part of release process
>
> +1
>
> On Thu, Nov 7, 2019 at 8:38 AM Owen Nichols  wrote:
>
>> +1
>>
>> On Thu, Nov 7, 2019 at 6:46 AM Alberto Bustamante Reyes
>>  wrote:
>>
>>> Hi all,
>>>
>>> Some time ago I opened GEODE-7056 to update the Dockerfile of the
>>> "geode-native-build", because I saw that it was using 1.6.0 although
>> 1.9.0
>>> was available. After this change, another ticket was needed to build and
>>> update the image in Dockerhub.
>>>
>>> I think this task (update file, build image and update image) should be
>>> part of the release process. Otherwise, the image will be outdated once
>>> there is a new release. And now that 1.11.0 is closer, I think it's a good
>>> moment to do it.
>>>
>>> What do you think?
>>>
>>> BR/
>>>
>>> Alberto B.
>>>
>>



RE: Update geode-native-build image as part of release process

2019-11-12 Thread Alberto Bustamante Reyes
Good idea Anthony, I will investigate that.

BTW, I have observed that geode/docker/Dockerfile is outdated. It is using "ENV
GEODE_VERSION 1.9.0". I suspect that after the 1.10 release the changes done
in the Dockerfile were not merged back to develop; maybe the release branch
itself was not merged back. Anyway, the image tagged as 1.10.0 is using that
version, so that means the promote_rc script works fine.

De: Anthony Baker 
Enviado: lunes, 11 de noviembre de 2019 19:07
Para: dev@geode.apache.org 
Cc: Dick Cavender 
Asunto: Re: Update geode-native-build image as part of release process

Thanks Alberto!  Maybe as a future enhancement we should consider extending the 
geode-native-build image from the geode image to make this simpler.

Anthony


> On Nov 11, 2019, at 2:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Done! I have updated the wiki adding the steps to update the Docker image.
>
> https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode#ReleasingApacheGeode-Updategeode-native-buildDockerimage
>
>
> 
> De: Dick Cavender 
> Enviado: jueves, 7 de noviembre de 2019 17:51
> Para: dev@geode.apache.org 
> Asunto: Re: Update geode-native-build image as part of release process
>
> +1
>
> On Thu, Nov 7, 2019 at 8:38 AM Owen Nichols  wrote:
>
>> +1
>>
>> On Thu, Nov 7, 2019 at 6:46 AM Alberto Bustamante Reyes
>>  wrote:
>>
>>> Hi all,
>>>
>>> Some time ago I opened GEODE-7056 to update the Dockerfile of the
>>> "geode-native-build", because I saw that it was using 1.6.0 although
>> 1.9.0
>>> was available. After this change, another ticket was needed to build and
>>> update the image in Dockerhub.
>>>
>>> I think this task (update file, build image and update image) should be
>>> part of the release process. Otherwise, the image will be outdated once
>>> there is a new release. And now that 1.11.0 is closer, I think it's a good
>>> moment to do it.
>>>
>>> What do you think?
>>>
>>> BR/
>>>
>>> Alberto B.
>>>
>>



RE: Update geode-native-build image as part of release process

2019-11-11 Thread Alberto Bustamante Reyes
Done! I have updated the wiki, adding the steps to update the Docker image.

https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode#ReleasingApacheGeode-Updategeode-native-buildDockerimage



De: Dick Cavender 
Enviado: jueves, 7 de noviembre de 2019 17:51
Para: dev@geode.apache.org 
Asunto: Re: Update geode-native-build image as part of release process

+1

On Thu, Nov 7, 2019 at 8:38 AM Owen Nichols  wrote:

> +1
>
> On Thu, Nov 7, 2019 at 6:46 AM Alberto Bustamante Reyes
>  wrote:
>
> > Hi all,
> >
> > Some time ago I opened GEODE-7056 to update the Dockerfile of the
> > "geode-native-build", because I saw that it was using 1.6.0 although
> 1.9.0
> > was available. After this change, another ticket was needed to build and
> > update the image in Dockerhub.
> >
> > I think this task (update file, build image and update image) should be
> > part of the release process. Otherwise, the image will be outdated once
> > there is a new release. And now that 1.11.0 is closer, I think it's a good
> > moment to do it.
> >
> > What do you think?
> >
> > BR/
> >
> > Alberto B.
> >
>


RE: geode-native integration tests

2019-11-07 Thread Alberto Bustamante Reyes
Thanks for the answer, Jake; I have sent a PR with the changes. I already know
about the plans to use Concourse instead of Travis in the Geode client; I have
it in mind, and it's a good opportunity to learn about Concourse.

De: Jacob Barrett 
Enviado: jueves, 7 de noviembre de 2019 17:50
Para: dev@geode.apache.org 
Asunto: Re: geode-native integration tests



> On Nov 7, 2019, at 7:41 AM, Alberto Bustamante Reyes 
>  wrote:
>
> I'm curious about the geode-native integration tests. I have seen there are
> two kinds of tests, depending on the framework they are based on. But none
> of them are included in the CI executed by Travis. Are they executed only as
> part of the Geode release process?

The long-term goal is to have geode-native built continuously along with the
main Java bits. If you are interested in helping with this effort, that might
speed up the process.

> What is the way of working regarding new tests? I assume that new tests 
> should be written using the new framework. But what about the old ones? Is 
> there any plan to port these tests to the new framework? (I suppose that PRs
> for that are welcome)

Yes, all new tests should be written in the new framework. The new framework is
a work in progress itself. If it is missing something, add it. If it needs
refactoring, refactor it. There is no current plan to bulk convert the old
tests; see the next response.

> What if a PR requires the modification of an old integration test? I suppose 
> that as a general rule, instead of modifying that test case, a new test case 
> in the new framework should be written and then the old one should be removed.

The old framework should be considered deprecated and obsolete. Make minimal 
changes as necessary to it. If you make a PR that touches several tests it is 
reasonable to just update the old integration tests. If your PR would have you 
changing one or a few of the old tests then please rewrite those tests in the 
new framework and delete the old. Hopefully over time we won’t have any old 
tests left.

> All this way-of-working info is missing from the CONTRIBUTING.md file; I can
> include it once this is cleared up.

Yes please contribute to the CONTRIBUTING.md!

Thanks,
Jake



geode-native integration tests

2019-11-07 Thread Alberto Bustamante Reyes
Hi Geode community,

I'm curious about the geode-native integration tests. I have seen there are two
kinds of tests, depending on the framework they are based on. But none of them
are included in the CI executed by Travis. Are they executed only as part of
the Geode release process?

What is the way of working regarding new tests? I assume that new tests should 
be written using the new framework. But what about the old ones? Is there any 
plan to port these tests to the new framework? (I suppose that PRs for that are
welcome)

What if a PR requires the modification of an old integration test? I suppose 
that as a general rule, instead of modifying that test case, a new test case in 
the new framework should be written and then the old one should be removed.

All this way-of-working info is missing from the CONTRIBUTING.md file; I can
include it once this is cleared up.

Thanks!


Alberto B.


Update geode-native-build image as part of release process

2019-11-07 Thread Alberto Bustamante Reyes
Hi all,

Some time ago I opened GEODE-7056 to update the Dockerfile of the 
"geode-native-build", because I saw that it was using 1.6.0 although 1.9.0 was 
available. After this change, another ticket was needed to build and update the
image in Dockerhub.

I think this task (update file, build image and update image) should be part of 
the release process. Otherwise, the image will be outdated once there is a new 
release. And now that 1.11.0 is closer, I think it's a good moment to do it.

What do you think?

BR/

Alberto B.


RE: [DISCUSS]: Commit Message Format too Short?

2019-10-08 Thread Alberto Bustamante Reyes
I think it's a good idea to have an automatic mechanism to reject commits that
exceed a given limit.
In the previous project I was assigned to, we used Gerrit instead of GitHub,
and we had an automatic check that voted -1 if your commit message exceeded the
limit.

Anyway, while this is being decided, a quick action could be to add a new line
to the PR template, at least as a reminder (see the hook sketch below):

- [ ] Is your commit message length below the limit of 50 characters?
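
For reference, an automatic check like the Gerrit vote mentioned above can be
sketched as a client-side commit-msg hook; the install path and message text
are illustrative:

#!/bin/sh
# Save as .git/hooks/commit-msg and make it executable.
subject=$(head -n 1 "$1")
if [ "${#subject}" -gt 50 ]; then
  echo "Commit subject has ${#subject} characters; the limit is 50." >&2
  exit 1
fi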






De: Juan José Ramos 
Enviado: martes, 8 de octubre de 2019 11:32
Para: dev@geode.apache.org 
Asunto: Re: [DISCUSS]: Commit Message Format too Short?

Hello Owen,

Yes, I fully agree with you. And just to be clear, I wasn't trying to
discourage descriptive commit messages; on the contrary, we certainly must
encourage them at all costs! It was decided that we should, however, try
to keep consistency across all commits and make the subject brief, adding
the full details within the body of the text, as described in *How to write
a Git commit message [1]*, referenced in our *Commit Message Format
[2]* article.
Right now we're not enforcing this rule; there are even some commits
without the ticket number at the beginning of the commit subject :-/.
I guess the goal of this thread is to gather some feedback and opinions
from the community to better decide how to proceed: remove the rule,
increase the maximum number of characters from 50 to something else in the
commit message subject, automatically enforce the rule altogether and
prevent commits that don't follow it, etc.
Best regards.

[1]: https://chris.beams.io/posts/git-commit/
[2]: https://cwiki.apache.org/confluence/display/GEODE/Commit+Message+Format

On Tue, Oct 8, 2019 at 10:07 AM Owen Nichols  wrote:

> I don’t care how long it is, but knowing that many tools show only the
> first bit, it’s helpful if the message is phrased with the most important
> words near the beginning.
>
> I’d much prefer to encourage rather than discourage descriptive commit
> messages. Even better if all commit messages mentioned more about _why_ the
> change is being made, not just describe the diff.
>
> But most important of all, NEVER forget the colon between the ticket number
> and the rest.  I learned that the hard way :(
>
> -Owen
>
> On Tue, Oct 8, 2019 at 1:52 AM Ju@N  wrote:
>
> > Hello devs,
> >
> > I've noticed that, lately, not everybody is following the guidelines we
> > have highlighted in our Wiki under *Commit Message Format [1]*, especially
> > the first requirement: *GEODE-nn: Capitalized, 50 chars or less summary.*
> > As an example, out of the last 33 commits in develop, only 11 follow the
> > 50 chars max rule.
> > Even though I've always followed this "rule", I often find it hard to
> > provide a summary of the commit in less than 50 chars; that's probably
> > the reason why other people are just ignoring this part of the guidelines.
> > Should we increase the maximum amount of characters from 50 to something
> > else? Should we add a hard check in order to automatically enforce the
> > rule? Should we delete the rule altogether? Thoughts?
> > Best regards.
> >
> > [1]:
> > https://cwiki.apache.org/confluence/display/GEODE/Commit+Message+Format
> >
> > --
> > Ju@N
> >
>


--
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io


RE: Off-heap support deactivation

2019-10-02 Thread Alberto Bustamante Reyes
Thanks for the detailed explanation, Darrel.

There is something I did not get: "alter disk-store" is used to align the region 
info that the disk store has with a region attribute that was changed. But off-heap 
support is an attribute that cannot be changed on a region after it is created. 
So then, what is the utility of being able to run "alter disk-store" with the 
"--off-heap" parameter, if that is something that will not change in the region?


From: Darrel Schneider 
Sent: Monday, September 30, 2019 17:51
To: dev@geode.apache.org 
Subject: Re: Off-heap support deactivation

You can specify this setting at the time you create the region. Geode does
not have support for changing it on a region that already exists. Only a
few region attributes can be changed on a region that currently exists (see
the AttributesMutator API). So how is your region getting created? I think
it is probably from cluster configuration. So what you would need to do is
get the definition stored in cluster configuration. I don't think the gfsh
alter region command will let you change this attribute (alter region uses
AttributesMutator). So you either need to delete the current definition and
then create it again or you need to edit the current definition manually.
Using gfsh to destroy and create is the cleanest solution, but that will
also blow away the data you currently have persisted.
To change it manually you can use gfsh export to get your cluster config as
xml, edit the xml to change the offheap boolean property on the region, and
then use gfsh import to load the xml you edited. This requires that the
servers are restarted.
If you are not using cluster config (I think you should be) then this is
actually easier. You either just edit your cache.xml file and restart the
server that is using it or you just change your code's use of RegionFactory
to create the region differently.
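
For illustration, a minimal sketch of the RegionFactory approach (the region
name and off-heap memory size below are hypothetical, and it assumes the member
reserves off-heap memory via the off-heap-memory-size property):

  import org.apache.geode.cache.Cache;
  import org.apache.geode.cache.CacheFactory;
  import org.apache.geode.cache.Region;
  import org.apache.geode.cache.RegionShortcut;

  public class OffHeapRegionExample {
    public static void main(String[] args) {
      // Reserve off-heap memory for this member (illustrative size).
      Cache cache = new CacheFactory()
          .set("off-heap-memory-size", "512m")
          .create();
      // Off-heap is fixed at region creation; AttributesMutator cannot change it.
      Region<String, String> region = cache
          .<String, String>createRegionFactory(RegionShortcut.PARTITION_PERSISTENT)
          .setOffHeap(true)
          .create("testRegion");
      region.put("key", "value");
      cache.close();
    }
  }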

The whole alter disk-store thing is just an optimization. The region
attributes stored in the disk-store for a persistent region do not
determine how the region is configured. The cluster-config/xml/apis that
create the region do that. When a disk-store is initially loaded it does
not yet know how the regions are configured. But it creates some temporary
maps that are used later once the region is created. If the attributes
stored in the disk-store match those on the region configuration then the
region initialization will be faster and use less memory. So basically if
you do have a persistent region and then change how it is configured, and
you also then alter how it is configured on the disk-store, your next restart
will recover faster. If you don't do the alter disk-store, the first
recovery will be slower, but the actual region config will be stored again
in the disk-store and subsequent recoveries will be fast.

On Mon, Sep 30, 2019 at 8:28 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> Is it possible to change the off-heap support of a region once it is
> created? The idea I got from documentation is that it is possible to do it
> if the region is persistent, as the off-heap flag of the region can be
> changed using "alter disk-store".
>
> I have run the following example to check it: with two servers, I created
> a partition persistent region, with off-heap set to true. Then I
> deactivated the off-heap support by using alter disk-store, as described in
> documentation. But I have observed that if I run "describe region", the
> off-heap flag is still set to true. And if I populate entries, the values
> are stored in the off-heap memory.
>
> Did I misunderstand the documentation or did I do something wrong?
>
> Thanks in advance,
>
> Alberto B.
>
>
> PD: I wrote down the steps I followed in the following gist:
> https://gist.github.com/alb3rtobr/e1fcf4148fe46f2e7b9e02a2e458624c
>


Off-heap support deactivation

2019-09-30 Thread Alberto Bustamante Reyes
Hi all,

Is it possible to change the off-heap support of a region once it is created? 
The idea I got from documentation is that it is possible to do it if the region 
is persistent, as the off-heap flag of the region can be changed using "alter 
disk-store".

I have run the following example to check it: with two servers, I created a 
partition persistent region, with off-heap set to true. Then I deactivated the 
off-heap support by using alter disk-store, as described in documentation. But 
I have observed that if I run "describe region", the off-heap flag is still set 
to true. And if I populate entries, the values are stored in the off-heap 
memory.

Did I misunderstand the documentation or did I do something wrong?

Thanks in advance,

Alberto B.


PD: I wrote down the steps I followed in the following gist: 
https://gist.github.com/alb3rtobr/e1fcf4148fe46f2e7b9e02a2e458624c


Backward compatibility issue in 1.10

2019-09-19 Thread Alberto Bustamante Reyes
Hi,

During PR review of GEODE-6871 it was found that GEODE-5222 introduced a 
backward compatibility issue by adding a new method to a public interface 
without providing a default implementation.
According to comments in the PR, although the impacted interface 
(DiskStoreMXBean) is public, it should not be implemented by applications, so 
the risk of breaking backward compatibility is low, but it exists.
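
For illustration, a minimal sketch of the kind of change involved (the interface
and method names below are hypothetical, not the actual DiskStoreMXBean API):

  // Adding a method to an already-released public interface breaks every
  // existing implementor, unless a default implementation is provided.
  public interface StoreMetrics {
    long getUsedBytes();

    // Backward-compatible addition: existing implementors still compile.
    default float getUsagePercentage() {
      return 0.0f;
    }
  }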

Do you think this issue should be fixed in 1.10?

BR/

Alberto


RE: resource manager requirements & recommendations

2019-09-16 Thread Alberto Bustamante Reyes
Thanks for the answer and the links Anthony, the discussion is interesting.

I think the differences between using CMS and G1 should be documented; I will 
try to contribute to this topic. For example, we have found these comments in a 
GemFire support ticket 
(https://community.pivotal.io/s/question/0D50e5q9JT0CAM/please-refer-to-the-pivotal-ticket-210727):

"First, we are not completely compatible with G1GC yet in GemFire, meaning that 
some features, percentages, etc., in GemFire need to be rethought out if 
changing from CMS to G1. For example, if using eviction or critical thresholds, 
with CMS, these percentages would be a % of "Tenured" heap size. For G1GC, they 
would be a % of "Total" heap size, because as you may realize, G1GC doesn't 
have a max Eden space or max Tenured space."



From: Anthony Baker 
Sent: Wednesday, September 11, 2019 18:58
To: dev@geode.apache.org 
Subject: Re: resource manager requirements & recommendations

The challenge with designing a good approach for managing heap use in Java is 
that we *can’t* know how much of the current heap use is really garbage.  That 
means that it can be really easy to evict too much or too little data.

With the CMS engine there are tuning parameters like occupancy fraction that 
you can set to match the eviction threshold.  This leads to a fairly 
predictable approach to managing heap memory.  With G1GC, the challenge is
harder since the entire heap might fill up with garbage before any collections 
occur.

Despite CMS being deprecated, I think it’s currently the best choice to control 
heap use in Geode.  As noted in JEP 291 [1] and subsequent discussion [2]:  
"For some applications CMS is a very good fit and might always outperform G1”.  
I also think we need to do more work in this area to make G1 perform as well as 
CMS.

Anthony

[1] http://openjdk.java.net/jeps/291
[2] http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-April/thread.html#start

> On Sep 11, 2019, at 9:14 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi all,
>
> I'm interested in using the resource manager with G1 garbage collector. To 
> check if it is possible, I have been reading documentation about heap memory 
> management and I came up with some questions because there are some points in 
> the documentation where it is not clear for me if they are describing 
> requirements or recommendations.
>
> As far as I understood, the requirements for using the Resource Manager are 
> only two:
>
>  *   set the critical heap percentage
>  *   configure your GC properly in order to work before the eviction 
> procedure starts.
>
> Am I right? There are three points in the documentation that make me 
> question if I'm correct:
>
>
>  1.  The first chapter in 
> https://geode.apache.org/docs/guide/19/managing/heap_use/heap_management.html 
> states how to configure your GC for improving performance, but it only talks 
> about CMS, there is no info about other GCs.
>  2.  In the steps of how to configure ResourceManager, when talking about 
> tuning GC parameters, it talks again only about CMS.
>  3.  In the documentation of ResourceManager class, setCriticalHeapPercentage 
> method, it is stated the following:
>
> Many virtual machine implementations have additional VM switches to control 
> the behavior of the garbage collector. We suggest that you investigate tuning 
> the garbage collector when using this type of eviction controller. A 
> collector that frequently collects is needed to keep our heap usage up to 
> date. In particular, on the Sun HotSpot VM, the -XX:+UseConcMarkSweepGC flag 
> needs to be set, [...]
>
> So it seems that CMS is a requirement, but I have not found in the code any 
> limitation about using only CMS.
>
> If my previous statement about the requirements is fine, then I suppose the 
> documentation needs a review to distinguish between generic requirements and 
> the CMS specific use case.
>
> Another question that comes to my mind is about the lack of info about G1. As 
> CMS is deprecated since Java 9, are there any plans to test and document G1 
> configuration?
>
> Thanks in advance for your comments!
>
> Alberto B.
>
>
>
>
>
>



resource manager requirements & recommendations

2019-09-11 Thread Alberto Bustamante Reyes
Hi all,

I'm interested in using the resource manager with G1 garbage collector. To check 
if it is possible, I have been reading documentation about heap memory 
management and I came up with some questions because there are some points in 
the documentation where it is not clear for me if they are describing 
requirements or recommendations.

As far as I understood, the requirements for using the Resource Manager are 
only two:

  *   set the critical heap percentage
  *   configure your GC properly in order to work before the eviction procedure 
starts.

Am I right? There are three points in the documentation that make me question 
if I'm correct:


  1.  The first chapter in 
https://geode.apache.org/docs/guide/19/managing/heap_use/heap_management.html 
states how to configure your GC for improving performance, but it only talks 
about CMS, there is no info about other GCs.
  2.  In the steps of how to configure ResourceManager, when talking about 
tuning GC parameters, it talks again only about CMS.
  3.  In the documentation of ResourceManager class, setCriticalHeapPercentage 
method, it is stated the following:

Many virtual machine implementations have additional VM switches to control the 
behavior of the garbage collector. We suggest that you investigate tuning the 
garbage collector when using this type of eviction controller. A collector that 
frequently collects is needed to keep our heap usage up to date. In particular, 
on the Sun HotSpot VM, the -XX:+UseConcMarkSweepGC flag needs to be set, [...]

So it seems that CMS is a requirement, but I have not found in the code any 
limitation about using only CMS.
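
For reference, a minimal sketch of the two requirements above as set through the
Java API (the percentages are illustrative, not recommendations):

  import org.apache.geode.cache.Cache;
  import org.apache.geode.cache.CacheFactory;
  import org.apache.geode.cache.control.ResourceManager;

  public class ResourceManagerExample {
    public static void main(String[] args) {
      Cache cache = new CacheFactory().create();
      ResourceManager manager = cache.getResourceManager();
      // Eviction begins when heap use crosses this threshold...
      manager.setEvictionHeapPercentage(75.0f);
      // ...and operations are rejected above the critical threshold.
      manager.setCriticalHeapPercentage(90.0f);
      // The GC still has to be tuned so collections keep these figures current.
      cache.close();
    }
  }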

If my previous statement about the requirements is fine, then I suppose the 
documentation needs a review to distinguish between generic requirements and 
the CMS specific use case.

Another question that comes to my mind is about the lack of info about G1. As CMS 
is deprecated since Java 9, are there any plans to test and document G1 
configuration?

Thanks in advance for your comments!

Alberto B.








RE: Passed: Nordix/geode-native#15 (test-ci - e40e206)

2019-08-27 Thread Alberto Bustamante Reyes
Sorry for this message, we did not change the CI notification email in our 
geode-native fork so it was sent to the list.
I'm going to modify it.

From: Travis CI 
Sent: Tuesday, August 27, 2019 12:11
To: dev@geode.apache.org 
Subject: Passed: Nordix/geode-native#15 (test-ci - e40e206)

Build Update for Nordix/geode-native
-

Build: #15
Status: Passed

Duration: 1 hr, 36 mins, and 36 secs
Commit: e40e206 (test-ci)
Author: Alberto Bustamante Reyes
Message: Test image

View the changeset: 
https://github.com/Nordix/geode-native/compare/520c4877d5a3...e40e20602546

View the full build log and details: 
https://travis-ci.com/Nordix/geode-native/builds/124703330?utm_medium=notification_source=email

--

You can unsubscribe from build emails from the Nordix/geode-native repository 
going to 
https://travis-ci.com/account/preferences/unsubscribe?repository=10207138_medium=notification_source=email.
Or unsubscribe from *all* email updating your settings at 
https://travis-ci.com/account/preferences/unsubscribe?utm_medium=notification_source=email.
Or configure specific recipients for build notifications in your .travis.yml 
file. See https://docs.travis-ci.com/user/notifications.



RE: Updating geode-native-build docker image

2019-08-27 Thread Alberto Bustamante Reyes
The geode-native-build image has not been updated yet. Could someone build and 
push it? I could do it, but I'm not a committer. If that is not a problem, my 
Dockerhub user is alb3rtobr.

From: Anthony Baker 
Sent: Thursday, August 8, 2019 0:54
To: dev@geode.apache.org 
Subject: Re: Updating geode-native-build docker image

Committers can request access to the geode docker account to push new images.  
Note that any geode source or binaries in these images should *only* include 
releases that have been voted on and approved by the PMC (e.g. v1.9.0, v1.8.0, 
…).

Can you send me your docker username?

Anthony


> On Aug 7, 2019, at 3:47 PM, Michael Oleske  wrote:
>
> Hi Geode Devs!
>
> Geode Native merged https://github.com/apache/geode-native/pull/509 this
> morning since our docker image was using an old Geode version.  What is the
> proper way to update docker hub (
> https://hub.docker.com/r/apachegeode/geode-native-build) with the new
> image?  Is that something committers should be able to do?  Or is there an
> automated build that updates docker hub?
>
> Thanks!
> -michael



RE: Travis-ci & geode-native repo

2019-08-26 Thread Alberto Bustamante Reyes
That was fast: I contacted Travis support and they increased the timeout for us.


From: Alberto Bustamante Reyes 
Sent: Monday, August 26, 2019 13:03
To: dev@geode.apache.org 
Subject: RE: Travis-ci & geode-native repo

Thanks for the answer Jake (sorry for the late reply, I was on holidays).

Do you know if Travis is offering that support for free?

The task of moving from Travis to Concourse sounds interesting; is there an 
existing ticket about it?

From: Jacob Barrett 
Sent: Wednesday, August 7, 2019 17:11
To: dev@geode.apache.org 
Subject: Re: Travis-ci & geode-native repo

We worked with Travis support to increase our timeout on the backend. As you 
can see from the Travis report it takes a long time to build. It doesn’t run 
any of the integration tests either. We use it as a litmus test on PRs mostly.

There is a future goal to roll the build into the same pipeline as the java 
sources and do binary releases. If you would like to pick up that task it would 
allow you to run a full CI of your own too.

-jake



> On Aug 7, 2019, at 7:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi,
>
> I have a question about the CI of the Geode C++ client. I would like to set 
> up Travis on a fork of the geode-native repo. I thought that the only 
> requirement to do so was to grant permissions to the repo, but our Travis 
> tasks are failing because the execution timeout is 50 minutes.
>
> I don't see in the travis.yaml any configuration about timeouts and I cannot 
> find any option to change that in the Travis page. Could it be that the 
> geode-native repo is not using the free version of Travis, so it has a longer 
> timeout? I have seen that tasks take approx. 1 hour and a half.
>
> Thanks!
>
> Alberto B.


RE: Travis-ci & geode-native repo

2019-08-26 Thread Alberto Bustamante Reyes
Thanks for the answer Jake (sorry for the late reply, I was on holidays).

Do you know if Travis is offering that support for free?

The task of moving from Travis to Concourse sounds interesting; is there an 
existing ticket about it?

From: Jacob Barrett 
Sent: Wednesday, August 7, 2019 17:11
To: dev@geode.apache.org 
Subject: Re: Travis-ci & geode-native repo

We worked with Travis support to increase our timeout on the backend. As you 
can see from the Travis report it takes a long time to build. It doesn’t run 
any of the integration tests either. We use it as a litmus test on PRs mostly.

There is a future goal to roll the build into the same pipeline as the java 
sources and do binary releases. If you would like to pick up that task it would 
allow you to run a full CI of your own too.

-jake



> On Aug 7, 2019, at 7:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi,
>
> I have a question about the CI of the Geode C++ client. I would like to set 
> up Travis on a fork of the geode-native repo. I thought that the only 
> requirement to do so was to grant permissions to the repo, but our Travis 
> tasks are failing because the execution timeout is 50 minutes.
>
> I don't see in the travis.yaml any configuration about timeouts and I cannot 
> find any option to change that in the Travis page. Could it be that the 
> geode-native repo is not using the free version of Travis, so it has a longer 
> timeout? I have seen that tasks take approx. 1 hour and a half.
>
> Thanks!
>
> Alberto B.


Travis-ci & geode-native repo

2019-08-07 Thread Alberto Bustamante Reyes
Hi,

I have a question about the CI of the Geode C++ client. I would like to set up 
Travis on a fork of the geode-native repo. I thought that the only requirement 
to do so was to grant permissions to the repo, but our Travis tasks are failing 
because the execution timeout is 50 minutes.

I don't see in the travis.yaml any configuration about timeouts and I cannot 
find any option to change that in the Travis page. Could it be that the 
geode-native repo is not using the free version of Travis, so it has a longer 
timeout? I have seen that tasks take approx. 1 hour and a half.

Thanks!

Alberto B.


RE: Requirements for running distributed tests

2019-07-18 Thread Alberto Bustamante Reyes
Thanks for the info Jens, we have a better picture now.

Then, Gradle is in charge of spinning up the containers, isn't it? We see that the 
command used to execute the distributed tests is the following:

gradlew gradlewStrict &&   sed -e 's/JAVA_HOME/GRADLE_JVM/g' -i.bak 
gradlewStrict &&   GRADLE_JVM=/usr/lib/jvm/java-8-openjdk-amd64 ./gradlewStrict 
 -PcompileJVM=/usr/lib/jvm/java-8-openjdk-amd64 -PcompileJVMVer=8 
-PtestJVM=/usr/lib/jvm/java-11-openjdk-amd64 -PtestJVMVer=11 
-PparallelDunit -PdunitDockerUser=geode -PdunitDockerImage=$(docker images 
--format '{{.Repository}}:{{.Tag}}') -PdunitParallelForks=4 --parallel 
--console=plain --no-daemon -x javadoc -x spotlessCheck -x rat 
distributedTest

As we were not using any docker image, our tests were running locally, and this 
is why we were getting errors about "ports in use". Is that right? We did not 
get any error, so I suppose for gradle it's ok not to get any docker image for 
running the tests.
Where is this container management in the code?

How could we limit the number of containers? With the "-PdunitParallelForks" 
parameter? We don't have as many resources as the community to run our CI, so we 
have to adapt the execution.

Regarding the docker image, is it possible to have permissions to download the 
docker image from http://gcr.io/apachegeode-ci/apache-develop-test-container ?  
I suppose the dockerhub image is not being used, as it was updated two years 
ago and the dockerfile was modified three months ago.

thanks again!

From: Jens Deppe 
Sent: Wednesday, July 17, 2019 17:41
To: dev@geode.apache.org
Subject: Re: Requirements for running distributed tests

Hi Alberto,

The images used to run tests on in CI are built here:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-images
(you may see these referred to as 'heavy lifters'). The packer scripts for these
can be found here:
https://github.com/apache/geode/tree/develop/ci/images

The build model is not pure Concourse as, for every test category (ex.
distributedTest) we programmatically launch one of these 'heavy-lifters',
copy the code over, and then run the tests.

As you've noted, distributed tests are run inside docker containers. This
gives isolation so that there are no port/filesystem/network conflicts when
running tests in parallel. The container used for distributed tests is
built from this Dockerfile:
https://github.com/apache/geode/tree/develop/ci/images/test-container.

Windows tests are not run in parallel as we didn't have success in getting
the parallel dockerization to work consistently; so we only have a subset
of tests which run in 'normal', serial mode.

Hope this helps.
--Jens

On Wed, Jul 17, 2019 at 3:47 AM Alberto Bustamante Reyes
 wrote:

> Hi Geode community,
>
> We are trying to set up a CI loop in our Geode fork. We have started with
> the tests that are run for every pull request, but we are having problems
> replicating what is done in the Apache Geode repository, so any help will
> be appreciated.
> We are not using Concourse, but we are trying to run the same commands that
> are run at the end.
>
> In the case of distributedTests, if we run them we get errors saying
> that there are ports in use, which are not present if we run the tests
> independently ( I mean running for example "geode-core:distributedTest",
> "geode-cq:distributedTest" instead of just "distributedTest"). So we think
> we are missing something regarding the configuration of the VM where the
> tests are executed. We have seen there is a custom image used in google
> cloud (
> --image-family="${IMAGE_FAMILY_PREFIX}${WINDOWS_PREFIX}geode-builder" ), is
> it documented somewhere which are the requirements or configuration of that
> image?
>
> We have seen in the CI configuration (
> https://github.com/apache/geode/blob/develop/ci/pipelines/shared/jinja.variables.yml)
> that the requirement for distributedTests are 96 cpus & 180GB RAM. We can
> use only 24 cpus and 128GB RAM, but we have seen the tests are executed
> using the "dunitParallelForks" parameter to control how many docker
> containers are run in parallel, so we suppose we should modify this
> parameter.
>
> Where can we check how are these containers created and controlled?
>
> Thanks in advance!
>
> Alberto B.
>
>
>


Requirements for running distributed tests

2019-07-17 Thread Alberto Bustamante Reyes
Hi Geode community,

We are trying to set up a CI loop in our Geode fork. We have started with the 
tests that are run for every pull request, but we are having problems 
replicating what is done in the Apache Geode repository, so any help will be 
appreciated.
We are not using Concourse, but we are trying to run the same commands that are 
run at the end.

In the case of distributedTests, if we run them we get errors saying that 
there are ports in use, which are not present if we run the tests independently 
( I mean running for example "geode-core:distributedTest", 
"geode-cq:distributedTest" instead of just "distributedTest"). So we think we 
are missing something regarding the configuration of the VM where the tests are 
executed. We have seen there is a custom image used in google cloud ( 
--image-family="${IMAGE_FAMILY_PREFIX}${WINDOWS_PREFIX}geode-builder" ), is it 
documented somewhere which are the requirements or configuration of that image?

We have seen in the CI configuration 
(https://github.com/apache/geode/blob/develop/ci/pipelines/shared/jinja.variables.yml)
 that the requirement for distributedTests are 96 cpus & 180GB RAM. We can use 
only 24 cpus and 128GB RAM, but we have seen the tests are executed using the 
"dunitParallelForks" parameter to control how many docker containers are run in 
parallel, so we suppose we should modify this parameter.

Where can we check how are these containers created and controlled?

Thanks in advance!

Alberto B.




Re: What triggers a maintenance release?

2019-07-10 Thread Alberto Bustamante Reyes
We are suffering the problem of the multiple class loaders described in 
GEODE-6716, which was solved in GEODE-6822 
(https://issues.apache.org/jira/browse/GEODE-6822).


From: Anthony Baker 
Sent: Wednesday, July 10, 2019 17:06:59
To: dev@geode.apache.org
Subject: Re: What triggers a maintenance release?

Great question, Alberto.  In the past we’ve done patch releases (the 3rd digit 
in X.Y.Z) due to security issues but it hasn’t been a very common occurrence.

What issue are you running into?  Perhaps we can help with an alternative 
approach or workaround.  If you would like the project to do a patch release, I 
encourage you to make a case :-)


Anthony


> On Jul 10, 2019, at 2:02 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi all,
>
>
> I have not found information about this topic in the wiki, and taking into 
> account that I cannot find any "hotfix" branch in github, maybe this has not 
> happened yet: I would like to ask about the criteria to create a maintenance 
> release.
>
> I suppose that a high priority bug should exist, but how is that priority 
> evaluated?
>
>
> If I'm interested in a fix for a specific issue that will be included in the 
> next release, but I cannot wait that long, could I ask for an official 
> maintenance version? Or is creating my own Geode version the only alternative?
>
>
> Thanks in advance,
>
>
> Alberto B.
>
>
>
>



What triggers a maintenance release?

2019-07-10 Thread Alberto Bustamante Reyes
Hi all,


I have not found information about this topic in the wiki, and taking into 
account that I cannot find any "hotfix" branch in github, maybe this has not 
happened yet: I would like to ask about the criteria to create a maintenance 
release.

I suppose that a high priority bug should exist, but how is that priority 
evaluated?


If I'm interested in a fix for a specific issue that will be included in the 
next release, but I cannot wait that long, could I ask for an official 
maintenance version? Or is creating my own Geode version the only alternative?


Thanks in advance,


Alberto B.






Disk dir size units in javadoc comments

2019-06-18 Thread Alberto Bustamante Reyes
Hi all,


I have observed that in the javadoc comments of the setDiskDirsAndSizes method 
in the DiskStoreFactory class, it is stated that dir sizes are expected to be in 
megabytes:

  /**
   * Sets the directories to which this disk store's data is written and also 
set the sizes in
   * megabytes of each directory.
   *
   * @param diskDirs directories to put the oplog files.
   * @param diskDirSizes sizes of disk directories in megabytes
   * @return a reference to this
   *
   * @throws IllegalArgumentException if length of the size array does not 
match to the length of
   * the dir array
   */
  DiskStoreFactory setDiskDirsAndSizes(File[] diskDirs, int[] diskDirSizes);


But I was taking a look at DiskRegionJUnitTest, and I have seen that the values 
introduced to create the dirs in the tests are treated as bytes. Check 
"testDiskFullExcep()", for example.

Also, the comment on the default disk dir size in the DiskStoreFactory class 
talks about megabytes as well:

  /**
   * The default disk directory size in megabytes.
   * 
   * Current value: 2,147,483,647 which is two petabytes.
   */
  int DEFAULT_DISK_DIR_SIZE = Integer.MAX_VALUE; // unlimited for bug 41863

I suppose these comments about "megabytes" are wrong and they should be changed 
to "bytes". Could someone confirm that? In that case, I can change it.

Thanks in advance!

Alberto B.




PR review request

2019-06-12 Thread Alberto Bustamante Reyes
Hi,


Two weeks ago I sent a PR to the geode-examples repo; could someone take a look?

https://github.com/apache/geode-examples/pull/77


Thanks!


Alberto B.


Re: Issue with full disk store directories

2019-06-07 Thread Alberto Bustamante Reyes
Hi,


An update about this issue. I think the problem is in the PersistentOplogSet 
class, in the following method:


/**
   * Returns the next available DirectoryHolder which has space. If no dir has 
space then it will
   * return one anyway if compaction is enabled.
   *
   * @param minAvailableSpace the minimum amount of space we need in this 
directory.
   */
  DirectoryHolder getNextDir(int minAvailableSpace, boolean checkForWarning)




In order to select a directory, this check is performed:


if (dirHolder.getAvailableSpace() >= minAvailableSpace)


But I think it should compare the available space with the size of the oplog 
files, as they are created with the maximum size.

After changing the check to:


if (dirHolder.getAvailableSpace() >= parent.getMaxOplogSizeInBytes())


then the full folder is skipped, and the next one is selected.



I'll try to write a test to illustrate this.


BR/


Alberto B.



____
From: Alberto Bustamante Reyes 
Sent: Monday, May 6, 2019 10:04:34
To: dev@geode.apache.org
Subject: RE: Issue with full disk store directories

Not sure if I understand your question: in the test I did, I used one disk 
store composed of three directories, each one with a different size. These 
directories were in the same disk partition. The issue I saw is that when the 
oplog files are initialized, it is not checked whether they fit in the directory, 
so if the maximum directory size is reached, the server crashes.

From: Anthony Baker 
Sent: Monday, April 29, 2019 17:24
To: dev@geode.apache.org
Subject: Re: Issue with full disk store directories

Question:  are you using similarly sized disk partitions for all your disk 
stores?

> On Apr 24, 2019, at 3:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi all,
>
> I reported an issue in Jira, related to full disk store directories: 
> https://issues.apache.org/jira/browse/GEODE-6652
> As I describe there, the issue is that when using a disk store with 
> directories of different sizes, when oplog files rotate, the available space 
> of the next disk store directory to be used seems not to be checked correctly.
>
> BR/
>
> Alberto
>
>



Re: [DISCUSS] require reviews before merging a PR

2019-06-05 Thread Alberto Bustamante Reyes
My two cents (although I'm a newcomer here):


I agree with Kirk on the point that if a PR is sent by a committer and, after a 
"grace period" (one/two weeks?), no one has reviewed it, the author could merge 
it. But of course he/she is free to wait for a review.


I was thinking about PRs made by contributors, as is my case. A possible 
approach could be that if the author is a contributor, then at least one review 
is needed. Once that review is done, the reviewer should be free to decide 
between merging the change or, if the PR is still under the "grace period", 
keeping it open waiting for other reviews. If the review is done after the 
"grace period", then the reviewer should merge the change.

And if a contributor PR has at least one review and the "grace period" ends, 
the author is allowed to ask someone to merge it, as it already has the green 
light from a committer.


Finally, I also agree on the point of increasing test coverage and following 
clean code principles.


From: Kirk Lund 
Sent: Wednesday, June 5, 2019 1:49:06
To: geode
Subject: Re: [DISCUSS] require reviews before merging a PR

I'm -1 for requiring N reviews before merging a commit.

Overall, I support Lazy Consensus. If I post a PR that fixes the flakiness
in a test, the precheckin jobs prove it, and it sits there for 2 weeks
without reviews, then I favor merging it in at that point without any
reviews. I'm not going to chase people around or spam the dev list over and
over asking for reviews. Nothing in the Apache Way says you have to do
reviews before committing -- some projects prefer "commit then review"
instead of "review then commit". You can always look at the code someone
changed and you can always change it further or revert it.

I think if we don't trust our committers then we have a bigger systemic
problem that becoming more strict about PR reviews is not going to fix.

Overall, I also favor pairing/mobbing over reviews. Without being there
during the work, a reviewer lacks the context to understand why it was done
the way it was done.

If we cannot establish or maintain trust in committers, then I think we
should remove committer status from everyone and start over as a project,
proposing and accepting one committer at a time.

Instead of constraints on reviews, I would prefer to establish new criteria
for coding such as:
1) all classes touched in a PR must have a unit test created if none exists
2) all code touched in a PR must have unit test coverage (and possibly
integration test coverage) specific to the changes
3) all new classes must have full unit test coverage
4) all code touched in a PR must follow clean code principles (which would
obviously need defining on the wiki)

Then it becomes the responsibility of the author(s) and committer(s) of
that PR to ensure that the code and the PR follows the project's criteria
for code quality and test coverage. It also becomes easier to measure the
PRs of a non-committer to determine if we think they would make a good
committer (for example, do they adhere to clean code quality and unit
testing with mocks? -- along with any other criteria).

On Thu, May 30, 2019 at 3:51 PM Owen Nichols  wrote:

> It seems common for Geode PRs to get merged with only a single green
> checkmark in GitHub.
>
> According to https://www.apache.org/foundation/voting.html we should not
> be merging PRs with fewer than 3 green checkmarks.
>
> Consensus is a fundamental value in doing things The Apache Way.  A single
> +1 is not consensus.  Since we’re currently discussing what it takes to
> become a committer and what standards a committer is expected to uphold, it
> seems like a good time to review this policy.
>
> GitHub can be configured to require N reviews before a commit can be
> merged.  Should we enable this feature?
>
> -Owen
> VOTES ON CODE MODIFICATION <
> https://www.apache.org/foundation/voting.html#votes-on-code-modification>
> For code-modification votes, +1 votes are in favour of the proposal, but
> -1 votes are vetos 
> and kill the proposal dead until all vetoers withdraw their -1 votes.
>
> Unless a vote has been declared as using lazy consensus <
> https://www.apache.org/foundation/voting.html#LazyConsensus> , three +1
> votes are required for a code-modification proposal to pass.
>
> Whole numbers are recommended for this type of vote, as the opinion being
> expressed is Boolean: 'I approve/do not approve of this change.'
>
>
> CONSENSUS GAUGING THROUGH SILENCE <
> https://www.apache.org/foundation/voting.html#LazyConsensus>
> An alternative to voting that is sometimes used to measure the
> acceptability of something is the concept of lazy consensus <
> https://www.apache.org/foundation/glossary.html#LazyConsensus>.
>
> Lazy consensus is simply an announcement of 'silence gives assent.’ When
> someone wants to determine the sense of the community this way, it might do
> so with a mail message 

Re: About operating system statistics

2019-05-14 Thread Alberto Bustamante Reyes
pull request available: https://github.com/apache/geode/pull/3574




____
From: Alberto Bustamante Reyes 
Sent: Tuesday, April 16, 2019 15:21:34
To: dev@geode.apache.org
Subject: RE: About operating system statistics

Thanks for your answer Darrel. I have created a Jira ticket to delete these classes.

BR/

Alberto

From: Darrel Schneider 
Sent: Monday, April 15, 2019 17:12
To: dev@geode.apache.org
Subject: Re: About operating system statistics

The code you have found predates the open source geode project. That code
requires a native, platform dependent, jar to be distributed. We decided
that geode would be pure java. The code that collects Linux stats is pure
java. So I think the code you have found for Solaris, Windows, and OSX has
been "abandoned" and can be deleted.


On Mon, Apr 15, 2019 at 6:54 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> I have read in the documentation that the operating system statistics are
> only available for Linux systems, but I have found in the code the
> corresponding classes for OSX, Solaris & Windows stats (package
> org.apache.geode.internal.statistics.platform).
>
> Are these classes being used? Should they be included in the documentation
> or deleted? (the OSX ones are almost empty, so I'm not sure if someone is
> working on it or it's an 'abandoned feature')
>
> Thanks in advance,
>
> BR/
>
> Alberto
>


RE: Issue with full disk store directories

2019-05-06 Thread Alberto Bustamante Reyes
Not sure if I understand your question: in the test I did, I used one disk 
store composed of three directories, each one with a different size. These 
directories were in the same disk partition. The issue I saw is that when the 
oplog files are initialized, it is not checked whether they fit in the directory, 
so if the maximum directory size is reached, the server crashes.

From: Anthony Baker 
Sent: Monday, April 29, 2019 17:24
To: dev@geode.apache.org
Subject: Re: Issue with full disk store directories

Question:  are you using similarly sized disk partitions for all your disk 
stores?

> On Apr 24, 2019, at 3:42 AM, Alberto Bustamante Reyes 
>  wrote:
>
> Hi all,
>
> I reported an issue in Jira, related to full disk store directories: 
> https://issues.apache.org/jira/browse/GEODE-6652
> As I describe there, the issue is that when using a disk store with 
> directories of different sizes, when oplog files rotate, the available space 
> of the next disk store directory to be used seems not to be checked correctly.
>
> BR/
>
> Alberto
>
>



Issue with full disk store directories

2019-04-24 Thread Alberto Bustamante Reyes
Hi all,

I reported an issue in Jira, related to full disk store directories: 
https://issues.apache.org/jira/browse/GEODE-6652
As I describe there, the issue is that when using a disk store with directories 
of different sizes, when oplog files rotate, the available space of the next 
disk store directory to be used seems not to be checked correctly.

BR/

Alberto




How to publish client stats on server

2019-04-16 Thread Alberto Bustamante Reyes
Hi Geode community,

I'm trying to run a simple test to check how the client stats are published on 
the server, but I have not been able to do it.

The server is started with the statistic sampling enabled, and in the client I 
set the sending interval with setPoolStatisticInterval, but when I open the 
stats file with VSD, I cannot see any "ClientStat" there. I was expecting to 
see the stats "Client-to-Server Messaging Performance (ClientStats)".

I have checked the code and these two actions (setting time interval and stats 
sampling) are the two conditions that enable the publishing of the client 
stats, if I'm not wrong.
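
For reference, this is roughly the client-side setup I am using (a minimal 
sketch; the locator address and the 1000 ms interval are from my test, not a 
recommendation):

  import org.apache.geode.cache.client.ClientCache;
  import org.apache.geode.cache.client.ClientCacheFactory;

  public class ClientStatsExample {
    public static void main(String[] args) {
      // Enable sampling and set how often pool stats are sent to the server.
      ClientCache cache = new ClientCacheFactory()
          .set("statistic-sampling-enabled", "true")
          .addPoolLocator("localhost", 10334)
          .setPoolStatisticInterval(1000)
          .create();
      // ... region operations would go here ...
      cache.close();
    }
  }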

What am I missing? Thanks in advance.

BR/

Alberto


RE: About operating system statistics

2019-04-16 Thread Alberto Bustamante Reyes
Thanks for your answer Darrel. I have created a Jira ticket to delete these classes.

BR/

Alberto

From: Darrel Schneider 
Sent: Monday, April 15, 2019 17:12
To: dev@geode.apache.org
Subject: Re: About operating system statistics

The code you have found predates the open source geode project. That code
requires a native, platform dependent, jar to be distributed. We decided
that geode would be pure java. The code that collects Linux stats is pure
java. So I think the code you have found for Solaris, Windows, and OSX has
been "abandoned" and can be deleted.


On Mon, Apr 15, 2019 at 6:54 AM Alberto Bustamante Reyes
 wrote:

> Hi all,
>
> I have read in the documentation that the operating system statistics are
> only available for Linux systems, but I have found in the code the
> correspondent classes for OSX, Solaris & Windows stats (package
> org.apache.geode.internal.statistics.platform).
>
> Are these classes being used? Should they be included in the documentation
> or deleted? (the OSX ones are almost empty, so I'm not sure if someone is
> working on it or it's an 'abandoned feature')
>
> Thanks in advance,
>
> BR/
>
> Alberto
>


About operating system statistics

2019-04-15 Thread Alberto Bustamante Reyes
Hi all,

I have read in the documentation that the operating system statistics are only 
available for Linux systems, but I have found in the code the corresponding 
classes for OSX, Solaris & Windows stats (package 
org.apache.geode.internal.statistics.platform).

Are these classes being used? Should they be included in the documentation or 
deleted? (the OSX ones are almost empty, so I'm not sure if someone is working 
on it or it's an 'abandoned feature')

Thanks in advance,

BR/

Alberto


Jira & wiki permissions

2019-04-10 Thread Alberto Bustamante Reyes
Hi Geode community,


I would like to have permissions to edit the wiki and assign tickets to 
myself. Could someone help me with this?

My username is alberto.bustamante.reyes


Thanks in advance,


Alberto