Re: Release manager permissions

2022-09-27 Thread Alberto Gomez
Hi,

Do you know if any company has offered to sponsor the CI pipelines? What would 
it take for such a company besides paying the bills? Would a migration be 
needed?

Regarding the old ASF Jenkins jobs, my understanding is that they would offer 
the same CI functionality as we have today, but they would run on ASF-provided 
resources, which would most likely make the time to get results longer and 
less predictable. Is that correct?


Thanks,

Alberto

From: Anthony Baker 
Sent: Friday, September 23, 2022 8:15 PM
To: dev@geode.apache.org 
Subject: Re: Release manager permissions

Just a reminder to all: we need to find an alternative to the VMware-sponsored 
CI pipelines currently in use. Any ideas? Should we try to resurrect the old 
ASF Jenkins jobs?

Anthony

> On Sep 23, 2022, at 3:26 AM, Mario Kevo  wrote:
>
> Hi devs,
>
> I need the following permissions for the release manager:
>
>  *   bulk modification permission on Apache Geode JIRA
>  *   permission to deploy pipelines to Geode CI
>  *   Docker Hub credentials with permission to upload Apache Geode to Docker 
> Hub
>
> username: mkevo
> mail: mk...@apache.org
>
> Can someone give me these permissions, so I can start building a new patch 
> release?
>
> Thanks and BR,
> Mario



Re: Apache Geode 1.15.1 patch version

2022-09-16 Thread Alberto Gomez
I had to add a couple more along the way:

- Support Jammy (Ubuntu 22.04): https://issues.apache.org/jira/browse/GEODE-10291
- Handle WAN event when interrupted: 
https://issues.apache.org/jira/browse/GEODE-10420 (pending issue from 
https://issues.apache.org/jira/browse/GEODE-10403)


BR,

Alberto

From: Alberto Gomez 
Sent: Friday, September 16, 2022 11:20 AM
To: dev@geode.apache.org 
Subject: Re: Apache Geode 1.15.1 patch version

Jakov and I will take care of it.

Regards,

Alberto

From: Anthony Baker 
Sent: Thursday, September 15, 2022 4:57 PM
To: dev@geode.apache.org 
Subject: Re: Apache Geode 1.15.1 patch version

Sounds good to me. Who will backport the suggested issues to the support/1.15 
branch?

Anthony



Re: Apache Geode 1.15.1 patch version

2022-09-16 Thread Alberto Gomez
Jakov and I will take care of it.

Regards,

Alberto


Re: Apache Geode 1.15.1 patch version

2022-09-15 Thread Alberto Gomez
Hi,

One more I forgot related to: "Fix string codepoint detection 
https://issues.apache.org/jira/browse/GEODE-10076":


GEODE-10404 Fix compilation for Java 11:  
https://issues.apache.org/jira/browse/GEODE-10404

BR,

Alberto
____
From: Alberto Gomez 
Sent: Thursday, September 15, 2022 11:33 AM
To: dev@geode.apache.org 
Subject: Re: Apache Geode 1.15.1 patch version

Hi community,

I propose to add the following PRs to this patch release:

[Bug] GEODE-10417: Fix NullPointerException when getting events from the gw 
sender queue to complete transactions
https://issues.apache.org/jira/browse/GEODE-10417

[Bug] GEODE-10403: Distributed deadlock when stopping gateway sender
https://issues.apache.org/jira/browse/GEODE-10403

[Improvement] GEODE-10371: C++ Native client: Improve dispersion on connections 
expiration
https://issues.apache.org/jira/browse/GEODE-10371

[Bug] GEODE-10352: Update Dockerfile to use Ruby >= 2.6 in the tool to preview 
Geode documentation
https://issues.apache.org/jira/browse/GEODE-10352

[Bug] GEODE-10348: Correct documentation about conflation
https://issues.apache.org/jira/browse/GEODE-10348

[Bug] GEODE-10346: Correct batch-time-interval description in documentation
https://issues.apache.org/jira/browse/GEODE-10346

[Bug] GEODE-10323: OffHeapStorageJUnitTest testCreateOffHeapStorage fails with 
AssertionError: expected:<100> but was:<1048576>
https://issues.apache.org/jira/browse/GEODE-10323

[Bug] GEODE-10155: ServerConnection thread hangs when client function execution 
times-out
https://issues.apache.org/jira/browse/GEODE-10155

[Improvement] GEODE-10076: Fix string codepoint detection
https://issues.apache.org/jira/browse/GEODE-10076

BR,

Alberto

From: Anthony Baker 
Sent: Friday, September 9, 2022 8:14 PM
To: dev@geode.apache.org 
Cc: Weijie Xu M 
Subject: Re: Apache Geode 1.15.1 patch version

Thanks Mario. I removed some entries from the list that didn’t seem relevant to 
a small patch release. I think previously Xu Weijie volunteered to look at 
https://issues.apache.org/jira/browse/GEODE-10415.

Anthony


On Sep 8, 2022, at 11:20 PM, Mario Kevo <mario.k...@est.tech> wrote:

Hi all,

I'm going to build a new patch version of Geode.
There is a list of tasks that are declared to be fixed in 1.15.1. As they are 
already assigned, could the assignees please provide fixes so we can move on? 
https://issues.apache.org/jira/projects/GEODE/versions/12351801

Also, there is one blocker that would be good to include in this release, if 
that is okay with all of you. 
https://issues.apache.org/jira/browse/GEODE-10415

Please suggest any other tickets that are critical and should be backported to 
this release, so we can get the community's opinion on them before releasing 
the new version.

Thanks and BR,
Mario







Re: Apache Geode 1.15.1 patch version

2022-09-15 Thread Alberto Gomez
Hi community,

I propose to add the following PRs to this patch release:

[Bug] GEODE-10417: Fix NullPointerException when getting events from the gw 
sender queue to complete transactions

[Bug] GEODE-10403: Distributed deadlock when stopping gateway sender

[Improvement] GEODE-10371: C++ Native client: Improve dispersion on connections 
expiration

[Bug] GEODE-10352: Update Dockerfile to use Ruby >= 2.6 in the tool to preview 
Geode documentation

[Bug] GEODE-10348: Correct documentation about conflation

[Bug] GEODE-10346: Correct batch-time-interval description in documentation

[Bug] GEODE-10323: OffHeapStorageJUnitTest testCreateOffHeapStorage fails with 
AssertionError: expected:<100> but was:<1048576>

[Bug] GEODE-10155: ServerConnection thread hangs when client function execution 
times-out

[Improvement] GEODE-10076: Fix string codepoint detection

BR,

Alberto




Re: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo

2022-06-17 Thread Alberto Gomez
Hi Dave,

Supposing we move the documentation out of the geode code repo, if I download a 
certain release of Geode, how do I know which version of the documentation to 
download so that it is consistent with the code?

Having both the docs and the code in the same repo makes the above question a 
no-brainer. But if code and documentation do not go hand in hand, how will we 
know?

Alberto

From: Dave Barnes 
Sent: Wednesday, June 15, 2022 11:06 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo

Adopting a policy that allows changes to doc sources after code freeze
would address my primary complaint about the present system.
Updating the User Guide at the time a user-visible code change is
implemented is a helpful step toward keeping the docs up-to-date with the
code, but is not sufficient.
Above and beyond individual enhancements, the user guide addresses topics
such as system configuration, upgrades, and the like. Such higher-level
topics are often modified asynchronously from code releases, as are typo
and format repairs. For such asynchronous updates, the fact that the doc
sources are located in the source repo is of little consequence. In fact,
separate repos would allow separate revision histories, an advantage to
both.
One more consideration is the possibility of breaking the monolithic user
guide into smaller separate publications, such as an installation guide,
system management/administration guide, developer's guide, advanced topics,
etc. A change like this would be easier if it started from a docs-only repo.
Did any of those thoughts change anyone's mind?

On Wed, Jun 15, 2022 at 12:29 AM Owen Nichols 
wrote:

> The Geode project comprises several repos already, including geode,
> geode-examples, geode-benchmarks, geode-native, geode-kafka-connector,
> and geode-site, so it’s not unreasonable to add another.  However, we still
> release all at the same time, so any “code freeze” applies equally to all
> geode repos.
>
> Conceptually, “code freeze” applies to *code we ship*.  Test-only or
> docs-only commits are welcome anytime. Actually, any commits are welcome at
> any time -- “freeze” just means the branch is tagged at the point in time
> the release manager creates RC1; any commits after that tag will simply
> become part of a future release (in the event we go to RC2, post-freeze
> commits may or may not be pulled into the current release, at the release
> manager’s discretion).
>
> Although the User Guide source files are currently part of the Geode
> source release, most users probably find the published website [1] more
> convenient.  In my opinion, it should be fine to publish improvements to
> the doc site post-release (taking care to exclude commits related to
> unreleased new features, if any)...would that resolve the issue?
>
> > examples and usage guidelines can be finalized only AFTER the code, with
> > all its version numbers, naming conventions, etc, are in place.
>
> Chasing a moving target can definitely be frustrating; luckily there are
> things we can all do to minimize it.  I’ve seen many PRs that update the
> docs at the same time as they change the product -- reviewers should check
> for this when reviewing any PR that affects a public API, config setting,
> etc.  We also cut the support branch well in advance of planned release
> date and limit changes on the support branch to critical fixes only.
> Whenever necessary, anyone should feel free to file blocker tickets for
> missing/incorrect docs to ensure the release does not ship prematurely
> without meeting Geode’s standard of documentation.
> [1] https://geode.apache.org/docs/
>
> From: Dave Barnes 
> Date: Tuesday, June 14, 2022 at 3:11 PM
> To: jb...@vmware.com.invalid 
> Cc: dev@geode.apache.org 
> Subject: Re: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo
>
> John,
> Thanks for acknowledging that docs are just as important as code!  As a
> career tech-writer, the "docs=code" model appeals to me.
> I get what you're saying, and have worked in environments where release
> managers have enforced such constraints.
> In this vein, the Geode code is well-supplied with embedded Javadoc
> comments that behave exactly as you describe, providing a reference that is
> updated as the code is updated.
> The difference with a user guide (as opposed to reference material), is
> that examples and usage guidelines can be finalized only AFTER the code,
> with all its version numbers, naming conventions, etc, are in place.
> Delaying code freeze until docs are complete, in my experience, engenders
> feature-creep and introduces delays, often resulting in compromises that
> allow the release to go out with mis-matched docs. IMO, a separate
> user-guide repo promotes a better and more timely match-up between the
> software and the user guide.
>
>

Re: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo

2022-06-14 Thread Alberto Gomez
Hi,

I agree with Udo and John that having the docs and the code in the same repo 
really helps keep both in sync. Therefore, I would not split them into 
different repos.

I'd rather see a change in the process to overcome the difficulties faced with 
the documentation after the code is frozen.

Alberto

From: Udo Kohlmeyer 
Sent: Wednesday, June 15, 2022 5:05 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo

Hi there Dave,

I can understand the frustration that you face. I think the freezing of the 
code is different to that of the docs. I think each project member would agree 
if I stated that changes to the docs on ANY branch should be allowed regardless 
of where in the process of the release the project finds itself. (within common 
sense reasoning of course 😉 )

I am however interested in how we would ensure that the docs repo and the code 
repo stay “in-sync”. Would we raise JIRAs (GitHub issues) against the repo to 
make sure that we don’t miss documenting features or changes? We already suffer 
from the problem where features/changes are made and merged without sufficient 
docs changes. It feels like moving docs to their own repo would move an 
existing problem further away.

I understand that moving the docs to another repo would enable some form of 
autonomy, but I believe that John might have a point: this feels very much like 
a process problem.

Would it help, if docs have a “special pass” that allows doc modifications to 
happen at any point on any branch, if the changes made relate to actual changes 
that have been completed on the branch? (to avoid docs changes that are out of 
sequence with the deliverable)

--Udo

On Tue, Jun 14, 2022 at 1:15 PM John Blum  wrote:

> Personally, I believe doc is a critical component to any software project,
> especially a project like Apache Geode, and so, is the project really
> “complete” (or should the codebase really be frozen during a release) if
> the doc is not done or consistent yet?
>
> Having the doc be part of the source allows the doc to be (theoretically)
> in-sync with the codebase as it evolves, as it should be. On the other
> hand, with a separate repo, it does allow corrections or other alterations
> to be made at the risk of growing inconsistency, which is a huge impediment
> IMO. In Asciidoc, doc can even be based on the source in part (e.g.
> interfaces).
>
> Ideally, I don’t see code and doc being separate or even fundamentally
> different.
>
> This sounds more like a process problem and a workaround to a broken
> process, to me.
>
> $0.02
> -John
>
>
> From: Dave Barnes 
> Date: Tuesday, June 14, 2022 at 12:15 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] Relocate Geode Docs from code repo to seperate repo
>
> I'd like to move the doc sources for the Geode User Guide from the code
> repo (https://github.com/apache/geode) to a separate geode-docs repo.
>
> The primary reason is to allow docs to cycle at their own rate, rather than
> in lock-step with the code. The present arrangement causes problems during
> releases, when code is frozen (including doc sources) to prepare a release
> candidate. This is exactly the time when critical last-minute doc changes
> are needed, but such changes are forbidden due to the code freeze.
>
> I have participated in the Geode project since its inception, and can
> confidently state 

Re: [DISCUSS] Alignment of values disabling idleTimeout/loadConditioningInterval between Geode client APIs

2022-06-14 Thread Alberto Gomez
Thanks for your answer, Darrel.

If the breaking change is not a viable option, how about at least having both 
clients agree on the -1 value (currently -1 is not supported by the native 
client) to mean never idle expire and never load condition respectively?

It is true that they will not agree on the 0 value but at least they would 
agree on -1.

Not sure if this compromise change will really be for the better.

Alberto

From: Darrel Schneider 
Sent: Monday, June 13, 2022 10:03 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Alignment of values disabling 
idleTimeout/loadConditioningInterval between Geode client APIs

My concern is you are proposing to change the behavior of an existing geode 
feature. I think 0 is currently supported for both these properties in the Java 
client. I would think they cause immediate idle expiration and a very hot load 
conditioning. Your proposal would make 0 mean something very different (never 
idle expire and never load condition).
Also since both of these properties express a time duration, interpreting the 
value 0 as a very short duration is the natural meaning. Treating it as an 
infinite duration takes some explanation.
If the native client currently does not support setting these properties to -1 
then you could more safely change it to treat -1 as an infinite duration like 
the Java client does.
So at least both clients would agree on what -1 means.
But they would still disagree on what 0 means. To bring them into agreement you 
need to make a breaking change in either the native client or the java client. 
It might be best just to document this difference. I think it might be a small 
subset of our user base that thinks everything on the native client will be 
consistent with the Java client. Or vice versa. I think most users of one 
client do not even use the other client.
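
To make the mismatch described above concrete, here is a minimal Python sketch 
(not Geode code) of how each client is described in this thread as interpreting 
idleTimeout values. All function names are hypothetical. The meaning of 0 in the 
Java client follows the "I would think" guess above rather than confirmed 
behavior, and raising an error for unsupported negative values is an assumption 
made only for illustration.

```python
from typing import Optional

# Sentinel values as described in the documentation quoted in this thread.
JAVA_NEVER_EXPIRE = -1    # Java client: -1 means "never expire"
NATIVE_NEVER_EXPIRE = 0   # Native client: zero duration means "never expire"


def java_style_idle_timeout(ms: int) -> Optional[int]:
    """Return the effective timeout in ms, or None for 'never expire'.

    Models the Java-client convention: -1 disables expiration and 0 is
    taken literally as a zero-length (immediate) timeout.
    """
    if ms == JAVA_NEVER_EXPIRE:
        return None
    if ms < 0:
        # Assumption: other negative values are rejected.
        raise ValueError(f"unsupported idle timeout: {ms}")
    return ms


def native_style_idle_timeout(ms: int) -> Optional[int]:
    """Models the native-client convention: 0 disables expiration.

    Assumption: negative values (including -1) are rejected, since the
    thread states -1 is currently not supported by the native client.
    """
    if ms < 0:
        raise ValueError(f"unsupported idle timeout: {ms}")
    if ms == NATIVE_NEVER_EXPIRE:
        return None
    return ms


def native_style_with_compromise(ms: int) -> Optional[int]:
    """Models the compromise: the native client also accepts -1 as 'never',
    while 0 keeps its existing native meaning, so nothing breaks."""
    if ms == JAVA_NEVER_EXPIRE:
        return None
    return native_style_idle_timeout(ms)


if __name__ == "__main__":
    for value in (-1, 0, 5000):
        print(value,
              java_style_idle_timeout(value),
              native_style_with_compromise(value))
```

Under these assumptions both conventions agree on -1 and on positive values 
once the compromise is applied, and only 0 still differs: immediate expiration 
in the Java-style model versus no expiration in the native-style model.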


From: Alberto Gomez 
Sent: Monday, June 13, 2022 8:20 AM
To: dev@geode.apache.org 
Subject: [DISCUSS] Alignment of values disabling 
idleTimeout/loadConditioningInterval between Geode client APIs

Hi,

According to the documentation of the Geode Java client API, setting -1 for 
idleTimeout in a Pool indicates that connections should never expire:
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgeode.apache.org%2Freleases%2Flatest%2Fjavadoc%2Forg%2Fapache%2Fgeode%2Fcache%2Fclient%2FPoolFactory.html%23setIdleTimeout-long-&data=05%7C01%7Cdarrel%40vmware.com%7C8d31f0e0512b45f215da08da4d5099a9%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637907305827807591%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=9Lw0czqIngj47SoJM5hJZJxojAcJjqh3OPYescN4Ctg%3D&reserved=0

Nevertheless, according to the documentation of the Geode Native client API, 
setting a duration of std::chrono::milliseconds::zero() for idleTimeout 
indicates that connections should never expire.
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgeode.apache.org%2Freleases%2Flatest%2Fcppdocs%2Fa00799.html%23ae5fef0b20d95f11399a1fa66f90fbc74&data=05%7C01%7Cdarrel%40vmware.com%7C8d31f0e0512b45f215da08da4d5099a9%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637907305827807591%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pOzqVhNF1WRj3XYqvM%2BUCcwSTLRc94xaCclBvr5X1t8%3D&reserved=0

A similar discrepancy between the two clients can be observed for the 
loadConditioningInterval setting:

According to the documentation of the Java client API, a value of -1 disables 
load conditioning:
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/PoolFactory.html#setLoadConditioningInterval-int-

Nevertheless, according to the documentation of the Geode Native client API, 
setting a value of std::chrono::milliseconds::zero() disables load conditioning.
https://geode.apache.org/releases/latest/cppdocs/a00799.html#aaa812743d8458017bdbb8afa144c05e7

This discrepancy can create confusion and lead to misuse of the client APIs, 
which can cause unexpected behavior in the client.

[DISCUSS] Alignment of values disabling idleTimeout/loadConditioningInterval between Geode client APIs

2022-06-13 Thread Alberto Gomez
Hi,

According to the documentation of the Geode Java client API, setting -1 for 
idleTimeout in a Pool indicates that connections should never expire:
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/PoolFactory.html#setIdleTimeout-long-

Nevertheless, according to the documentation of the Geode Native client API, 
setting a duration of std::chrono::milliseconds::zero() for idleTimeout 
indicates that connections should never expire.
https://geode.apache.org/releases/latest/cppdocs/a00799.html#ae5fef0b20d95f11399a1fa66f90fbc74

A similar discrepancy between the two clients can be observed for the 
loadConditioningInterval setting:

According to the documentation of the Java client API, a value of -1 disables 
load conditioning:
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/PoolFactory.html#setLoadConditioningInterval-int-

Nevertheless, according to the documentation of the Geode Native client API, 
setting a value of std::chrono::milliseconds::zero() disables load conditioning.
https://geode.apache.org/releases/latest/cppdocs/a00799.html#aaa812743d8458017bdbb8afa144c05e7

This discrepancy can create confusion and lead to misuse of the client APIs, 
which can cause unexpected behavior in the client.

Geode API clients should be consistent to avoid these types of problems. 
Therefore, I propose to align both client APIs to use the same values for 
disabling the timing out of connections due to idle-timeout and 
load-conditioning.

Usually, alignment of the Java and native client consists of copying the 
behavior of the Java client into the C++ client.
In this case, nevertheless, I think it makes more sense to take as the model 
the behavior of the native client.
The reason is that I do not think it makes sense to support an idle-timeout and 
a load-conditioning-interval of 0 ms. Consequently, I think a logical value to 
disable both could be 0 ms.
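
To make the discrepancy concrete, here is a minimal sketch of translating a "disabled" setting between the two conventions. The class and method names are hypothetical and belong to neither client API, and java.time.Duration only stands in for the native client's std::chrono::milliseconds. The only facts taken from the documentation linked above are the two sentinel values: -1 in the Java client and a zero duration in the native client.

```java
import java.time.Duration;

// Hypothetical helper, not part of Geode: translates the "never expire /
// disabled" sentinel between the two documented conventions.
final class PoolTimeoutSentinels {
    // Java client docs: -1 disables idleTimeout / loadConditioningInterval.
    static final long JAVA_DISABLED_MS = -1L;
    // Native client docs: a zero duration disables them.
    static final Duration NATIVE_DISABLED = Duration.ZERO;

    // Converts a Java-client style value (milliseconds) to the native convention.
    static Duration javaToNative(long javaMillis) {
        if (javaMillis == JAVA_DISABLED_MS) {
            return NATIVE_DISABLED;
        }
        if (javaMillis <= 0) {
            // 0 means different things in the two clients, so refuse to guess.
            throw new IllegalArgumentException("ambiguous timeout: " + javaMillis);
        }
        return Duration.ofMillis(javaMillis);
    }

    // Converts a native-client style duration to the Java convention.
    static long nativeToJava(Duration nativeValue) {
        if (nativeValue.isNegative()) {
            throw new IllegalArgumentException("negative timeout: " + nativeValue);
        }
        if (nativeValue.isZero()) {
            return JAVA_DISABLED_MS;
        }
        return nativeValue.toMillis();
    }
}
```

With this sketch, javaToNative(-1) yields a zero duration and nativeToJava(Duration.ZERO) yields -1, while a Java-side 0 is rejected because the two clients interpret it differently.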

Any thoughts on this proposal?

Thanks,

Alberto




Re: [PROPOSAL] RFC Interruption of threads stuck for a long time in Geode

2022-04-28 Thread Alberto Gomez
Hi,

The deadline for this RFC has expired but I would like to get some more 
feedback on the proposal, so I will extend it a bit more.

Your comments are very welcome.

Alberto



From: Alberto Gomez 
Sent: Wednesday, April 6, 2022 10:11 AM
To: dev@geode.apache.org 
Subject: [PROPOSAL] RFC Interruption of threads stuck for a long time in Geode

Hi,

I'd appreciate your feedback on this newly published RFC about "Interruption of 
threads stuck for a long time in Geode":

https://cwiki.apache.org/confluence/display/GEODE/Interruption+of+threads+stuck+for+a+long+time+in+Geode

Thanks in advance,

Alberto


Re: [PROPOSAL] Remove warning logs from FunctionException

2022-04-28 Thread Alberto Gomez
Hi Barry,

If the exception is returned by passing it to the ResultSender's 
sendException() method then the exception is not logged. If the exception is 
returned by passing it to the lastResult() method then the exception (and the 
stack trace) is logged. I am assuming that when you say that the exception is 
returned in the result, you mean it is sent by means of the sendException() method.

I agree with you that Geode must be consistent and if an exception is thrown by 
the function, then the exception should be logged regardless of whether isHA 
returns false or true. Like you, I have also observed that when isHA 
returns false the exception is not logged.

I also think it is worth at least making this logging of the exception 
configurable for those cases where functions prefer to throw the exception 
instead of sending it and still do not want to see those exceptions logged.
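
The two ways of reporting a failure discussed above can be sketched as follows. This is only an illustration: the ResultSender interface below is a hypothetical stand-in that copies the method names sendResult(), lastResult() and sendException() from Geode's result sender so that the example compiles on its own, and DivideFunction and RecordingSender are invented for the example. It shows the control flow only; which of the two paths ends up in the server log is as described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the method names of Geode's result sender.
interface ResultSender<T> {
    void sendResult(T oneResult);
    void lastResult(T lastResult);
    void sendException(Throwable t);
}

// Records everything the function sends, so the outcome can be inspected.
final class RecordingSender implements ResultSender<Object> {
    final List<Object> results = new ArrayList<>();

    public void sendResult(Object oneResult) { results.add(oneResult); }
    public void lastResult(Object lastResult) { results.add(lastResult); }
    public void sendException(Throwable t) { results.add(t); }
}

final class DivideFunction {
    // Option 1: the failure is sent back as part of the result.
    static void executeSendingException(int a, int b, ResultSender<Object> sender) {
        try {
            sender.lastResult(a / b);
        } catch (ArithmeticException e) {
            sender.sendException(e);
        }
    }

    // Option 2: the failure propagates out of the function as a thrown exception.
    static void executeThrowing(int a, int b, ResultSender<Object> sender) {
        sender.lastResult(a / b);
    }
}
```

In the first variant the caller finds the exception among the results; in the second the exception leaves the execute method, which is the case the warning logs in this thread are about.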

Thanks,

Alberto

From: Barry Oglesby 
Sent: Thursday, April 28, 2022 2:32 AM
To: Alberto Gomez ; dev@geode.apache.org 

Subject: Re: [PROPOSAL] Remove warning logs from FunctionException

A function can throw an exception and can also return an exception in its 
result. I'm not sure I've seen too many functions where throwing an exception 
is the expected result. In my very quick testing, I see the exception and stack 
logged if the exception is thrown by the function but not if the exception is 
returned. Are you seeing that same behavior, or are both cases logging the 
exception? I also see the behavior you described where isHA returning false 
does not log the exception. I guess I would say if an exception is thrown in 
either case, it should be logged. If it is returned in the result, it shouldn't.

________
From: Alberto Gomez 
Sent: Tuesday, April 5, 2022 4:03 AM
To: dev@geode.apache.org ; Barry Oglesby 

Subject: Re: [PROPOSAL] Remove warning logs from FunctionException


Thanks for your proposal, Juan.

I still think that it makes sense to remove these warning logs altogether. Even 
if the stack trace is removed, the amount of logs could still be huge if the 
number of operations received is high and the percentage of exceptions is significant.

One more factor to add to this discussion is that these warning logs are only 
generated if the function is HA. If the function returns false to isHA() then 
the log does not appear.

I would say this is one more reason in favor of removing these logs so that the 
system is consistent.

@Barrett Oglesby<mailto:bogle...@vmware.com> are you still in favor of keeping 
these warning logs?

More opinions on this topic are very welcome in order to be able to decide.

Thanks,

Alberto

From: Ju@N 
Sent: Wednesday, March 30, 2022 7:04 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove warning logs from FunctionException

Hello all,

What about something in the middle: log a WARNING level message stating
that the Function named XXX failed, and also log the details (including the
stack trace) at DEBUG level? This would reduce the amount of logs
for functions that fail frequently, and would also allow the person
troubleshooting/debugging issues to easily tell that something is wrong
with function XXX.
Cheers.



On Wed, 30 Mar 2022 at 17:52, Jacob Barrett  wrote:

>
>
> On Mar 30, 2022, at 9:45 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>
> The idea would not be to remove the logs but rather to change the level of
> these logs from warning to debug level.
>
> I agree, if any exception is expected as part of a user-provided action it
> should not produce verbose logging.
>
> -Jake
>
>

--
Ju@N





Re: On conserve-sockets=true with WAN and/or transactions - Follow-up on April's Geode Community Meeting

2022-04-20 Thread Alberto Gomez
Thanks a lot for the information, Barry.

Alberto

From: Barry Oglesby 
Sent: Friday, April 15, 2022 7:42 PM
To: dev@geode.apache.org ; u...@geode.apache.org 

Subject: Re: On conserve-sockets=true with WAN and/or transactions - Follow-up 
on April's Geode Community Meeting

Alberto,

I can only speak to the WAN question in your email. The conserve-sockets 
setting was (or is) a limitation on serial WAN, but I just ran a few tests, and 
it is not deadlocking. It's been a while since I've tried serial WAN with 
conserve-sockets=true, but I'm pretty sure a test with several servers in each 
site and a multi-threaded client doing puts would cause the deadlock. That is 
not happening in my tests. We would need way more than a few simple tests to 
prove that it doesn't deadlock in other scenarios, though.

Barry
________
From: Alberto Gomez 
Sent: Friday, April 8, 2022 4:17 AM
To: dev@geode.apache.org ; u...@geode.apache.org 

Subject: On conserve-sockets=true with WAN and/or transactions - Follow-up on 
April's Geode Community Meeting

Hi,

Following up on the discussion we had yesterday in the Apache Geode Community 
meeting around the "Reflections on conserve-sockets setting in Apache Geode" 
topic, I'd like to post here some questions that could not be fully answered 
during the meeting:

The Geode documentation states the following about conserve-sockets and WAN 
deployments in [1]:
"WAN deployments increase the messaging demands on a Geode system. To avoid 
hangs related to WAN messaging, always set `conserve-sockets=false` for Geode 
members that participate in a WAN deployment."

It also states the following about conserve-sockets and transactions in [2]:
"When you have transactions operating on EMPTY, NORMAL or PARTITION regions, 
make sure that conserve-sockets is set to false to avoid distributed deadlocks."

Doing a search on the Geode tests, the only test case related to deadlocks with 
conserve-sockets=true that I have found is:
https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialGatewaySenderDistributedDeadlockDUnitTest.java#L176
According to the comments in the test, it always causes a distributed deadlock, 
and it is commented out. Nevertheless, the test case is actually NOT commented 
out and, in fact, if you execute it, you see it passing without any 
failure/deadlock.

And here are the questions:

Could it be that deadlocks with conserve-sockets=true and WAN and/or 
transactions over partitioned regions were a legacy issue that has already 
been fixed?

Otherwise, could someone please provide some more information about why these 
deadlocks could happen? It would be great if there were test cases that 
showcase this possibility.

It looks like a big limitation of Geode that you are forced to set 
conserve-sockets to false (with the implications this has on resource usage) 
when you are using WAN replication and/or transactions on partitioned regions.

Could it be that there are other elements (for example also using 
CacheListeners as Anthony Baker pointed out) that would increase the risk of 
hitting a distributed deadlock?

Thanks in advance,

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/sockets_and_gateways.html

[2]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_controls_controlling_socket_use.html





On conserve-sockets=true with WAN and/or transactions - Follow-up on April's Geode Community Meeting

2022-04-08 Thread Alberto Gomez
Hi,

Following up on the discussion we had yesterday in the Apache Geode Community 
meeting around the "Reflections on conserve-sockets setting in Apache Geode" 
topic, I'd like to post here some questions that could not be fully answered 
during the meeting:

The Geode documentation states the following about conserve-sockets and WAN 
deployments in [1]:
"WAN deployments increase the messaging demands on a Geode system. To avoid 
hangs related to WAN messaging, always set `conserve-sockets=false` for Geode 
members that participate in a WAN deployment."

It also states the following about conserve-sockets and transactions in [2]:
"When you have transactions operating on EMPTY, NORMAL or PARTITION regions, 
make sure that conserve-sockets is set to false to avoid distributed deadlocks."
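
As a concrete reference for the setting quoted above, here is a minimal sketch that builds the property with plain java.util.Properties and renders it in properties-file form. The class and method names are hypothetical; only the property name conserve-sockets and the value false come from the quoted documentation. How the properties are handed to a member (for example through a gemfire.properties file) is assumed here and not shown.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: prepares the setting the documentation recommends for
// members that take part in a WAN deployment or run transactions.
final class ConserveSocketsConfig {
    static Properties wanMemberProperties() {
        Properties props = new Properties();
        props.setProperty("conserve-sockets", "false");
        return props;
    }

    // Renders the properties in the usual key=value file format.
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            // A StringWriter does not fail, but store() declares IOException.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The rendered text contains the line conserve-sockets=false (preceded by a timestamp comment that Properties.store() always writes).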

Doing a search on the Geode tests, the only test case related to deadlocks with 
conserve-sockets=true that I have found is:
https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialGatewaySenderDistributedDeadlockDUnitTest.java#L176
According to the comments in the test, it always causes a distributed deadlock, 
and it is commented out. Nevertheless, the test case is actually NOT commented 
out and, in fact, if you execute it, you see it passing without any 
failure/deadlock.

And here are the questions:

Could it be that deadlocks with conserve-sockets=true and WAN and/or 
transactions over partitioned regions were a legacy issue that has already 
been fixed?

Otherwise, could someone please provide some more information about why these 
deadlocks could happen? It would be great if there were test cases that 
showcase this possibility.

It looks like a big limitation of Geode that you are forced to set 
conserve-sockets to false (with the implications this has on resource usage) 
when you are using WAN replication and/or transactions on partitioned regions.

Could it be that there are other elements (for example also using 
CacheListeners as Anthony Baker pointed out) that would increase the risk of 
hitting a distributed deadlock?

Thanks in advance,

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/sockets_and_gateways.html

[2]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_controls_controlling_socket_use.html


Re: April 2022 Community Meeting

2022-04-06 Thread Alberto Gomez
Hi all,

This is a reminder for tomorrow's (April 7th at 8:00 Pacific / 15:00 UTC / 
17:00 CEST) Apache Geode Community meeting.

Hope to see you there,

Alberto


From: Alberto Gomez
Sent: Monday, April 4, 2022 1:05 PM
To: dev@geode.apache.org 
Subject: April 2022 Community Meeting

Hi devs,

Next Thursday, April 7th, is our next Community Meeting.

We plan to present the following topic: "Reflections on the conserve-sockets 
setting in Geode".

The meeting will take place, as usual, at 8:00 Pacific (15:00 UTC, 17:00 CEST).

Please find the meeting details on the wiki: 
https://cwiki.apache.org/confluence/display/GEODE/Apache+Geode+Community+Meeting+Notes

Looking forward to seeing you all!

Alberto


[PROPOSAL] RFC Interruption of threads stuck for a long time in Geode

2022-04-06 Thread Alberto Gomez
Hi,

I'd appreciate your feedback on this newly published RFC about "Interruption of 
threads stuck for a long time in Geode":

https://cwiki.apache.org/confluence/display/GEODE/Interruption+of+threads+stuck+for+a+long+time+in+Geode

Thanks in advance,

Alberto


Re: [PROPOSAL] Remove warning logs from FunctionException

2022-04-05 Thread Alberto Gomez
Thanks for your proposal, Juan.

I still think that it makes sense to remove these warning logs altogether. Even 
if the stack trace is removed, the amount of logs could still be huge if the 
number of operations received is high and the percentage of exceptions is significant.

One more factor to add to this discussion is that these warning logs are only 
generated if the function is HA. If the function returns false to isHA() then 
the log does not appear.

I would say this is one more reason in favor of removing these logs so that the 
system is consistent.

@Barrett Oglesby<mailto:bogle...@vmware.com> are you still in favor of keeping 
these warning logs?

More opinions on this topic are very welcome in order to be able to decide.

Thanks,

Alberto

From: Ju@N 
Sent: Wednesday, March 30, 2022 7:04 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove warning logs from FunctionException

Hello all,

What about something in the middle: log a WARNING level message stating
that the Function named XXX failed, and also log the details (including the
stack trace) at DEBUG level? This would reduce the amount of logs
for functions that fail frequently, and would also allow the person
troubleshooting/debugging issues to easily tell that something is wrong
with function XXX.
Cheers.
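
The middle-ground suggestion above could look roughly like the sketch below. It is not Geode code: java.util.logging is used only so that the example runs without extra libraries, with Level.FINE standing in for a DEBUG level, and the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper illustrating "short WARNING, full details at DEBUG".
final class FunctionFailureLogging {
    static void logFailure(Logger logger, String functionId, Throwable cause) {
        // One short line at WARNING: enough to notice that the function fails.
        logger.log(Level.WARNING, "Function {0} failed: {1}",
                new Object[] {functionId, cause.toString()});

        // The stack trace is attached only when FINE (debug-like) logging is on.
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, "Details for failed function " + functionId, cause);
        }
    }
}
```

With the logger at WARNING, each failure costs a single line without a stack trace; lowering the level to FINE adds a second record that carries the throwable.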



On Wed, 30 Mar 2022 at 17:52, Jacob Barrett  wrote:

>
>
> On Mar 30, 2022, at 9:45 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>
> The idea would not be to remove the logs but rather to change the level of
> these logs from warning to debug level.
>
> I agree, if any exception is expected as part of a user-provided action it
> should not produce verbose logging.
>
> -Jake
>
>

--
Ju@N


April 2022 Community Meeting

2022-04-04 Thread Alberto Gomez
Hi devs,

Next Thursday, April 7th, is our next Community Meeting.

We plan to present the following topic: "Reflections on the conserve-sockets 
setting in Geode".

The meeting will take place, as usual, at 8:00 Pacific (15:00 UTC, 17:00 CEST).

Please find the meeting details on the wiki: 
https://cwiki.apache.org/confluence/display/GEODE/Apache+Geode+Community+Meeting+Notes

Looking forward to seeing you all!

Alberto


Re: [PROPOSAL] Remove warning logs from FunctionException

2022-03-30 Thread Alberto Gomez
Thanks for your answer, Barry.

The idea would not be to remove the logs but rather to change the level of 
these logs from warning to debug level.

Given that according to the Geode documentation, FunctionExceptions are part of 
the "contract" between the execute methods and Geode, I think that Geode 
getting a FunctionException from an execute method is nothing out of the 
ordinary, so there is no reason for Geode to log it with a warning message; 
Geode should just "transmit the exception back to the caller as if it had been 
thrown on the calling side" (I'm quoting from the Geode docs).

I agree that for debugging purposes it is great to have as much information as 
possible. But the amount of information comes at a cost and that's the reason 
why normally in production debug logs are not activated.

Alberto


From: Barry Oglesby 
Sent: Wednesday, March 30, 2022 6:31 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove warning logs from FunctionException

I guess I would vote for not removing any information from a server log file 
that might be useful for debugging purposes. That would include exceptions 
occurring in functions.
________
From: Alberto Gomez 
Sent: Wednesday, March 30, 2022 4:35 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove warning logs from FunctionException

Hi all,

I have not received any feedback on this proposal so far.

Does anybody have anything against it? Otherwise, I would like to proceed with 
the implementation of it.

Thanks,

Alberto
________
From: Alberto Gomez 
Sent: Thursday, March 24, 2022 4:29 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL] Remove warning logs from FunctionException

Hi,

Regarding how to implement a Function in Apache Geode and coding the execute 
method, the following is stated in [1]:

"To propagate an error condition or exception back to the caller of the 
function, throw a FunctionException from the execute method. Geode transmits 
the exception back to the caller as if it had been thrown on the calling side. 
See the Java API documentation for 
FunctionException<https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/execute/FunctionException.html>
 for more information."

And as per [2]:
"if a GemFire client executes a Function on a server, and the function's 
execute method throws a FunctionException, the server logs the exception as a 
warning, and transmits it back to the calling client, which throws it"

So, for every FunctionException thrown by a user Server Function, the server 
logs the exception with the corresponding stack trace.

This could mean, depending on the logic implemented in the user Server 
Function, that the server log is packed with these messages (which 
probably do not provide extra useful information, given that the exception 
will reach the client), making it difficult to analyze the logs to find 
other relevant events.

Given that Apache Geode recommends the use of FunctionException as the means to 
propagate an error condition or exception back to the caller, we could question 
whether a FunctionException thrown by a user Function should be reflected at 
all in the server logs. Besides, as said before, depending on the number of 
exceptions generated, this can complicate the analysis of log files, 
require much more space for log storage and have a negative impact on 
performance.

For the above reasons, I would like to propose to change the level of these 
messages to debug level. A configuration parameter to enable this possibility 
could be provided for backward compatibility.

Please, feel free to comment on this proposal.

Thanks,

Alberto


[1] 
https://geode.apache.org/docs/guide/114/developing/function_exec/function_execution.html
[2] 
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/execute/FunctionException.html

Re: [PROPOSAL] Remove warning logs from FunctionException

2022-03-30 Thread Alberto Gomez
Hi all,

I have not received any feedback on this proposal so far.

Does anybody have anything against it? Otherwise, I would like to proceed with 
the implementation of it.

Thanks,

Alberto

From: Alberto Gomez 
Sent: Thursday, March 24, 2022 4:29 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL] Remove warning logs from FunctionException

Hi,

Regarding how to implement a Function in Apache Geode and coding the execute 
method, the following is stated in [1]:

"To propagate an error condition or exception back to the caller of the 
function, throw a FunctionException from the execute method. Geode transmits 
the exception back to the caller as if it had been thrown on the calling side. 
See the Java API documentation for 
FunctionException<https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/execute/FunctionException.html>
 for more information."

And as per [2]:
"if a GemFire client executes a Function on a server, and the function's 
execute method throws a FunctionException, the server logs the exception as a 
warning, and transmits it back to the calling client, which throws it"

So, for every FunctionException thrown by a user Server Function, the server 
logs the exception with the corresponding stack trace.

This could mean, depending on the logic implemented in the user Server 
Function, that the server log is packed with these messages (which 
probably do not provide extra useful information, given that the exception 
will reach the client), making it difficult to analyze the logs to find 
other relevant events.

Given that Apache Geode recommends the use of FunctionException as the means to 
propagate an error condition or exception back to the caller, we could question 
whether a FunctionException thrown by a user Function should be reflected at 
all in the server logs. Besides, as said before, depending on the number of 
exceptions generated, this can complicate the analysis of log files, 
require much more space for log storage and have a negative impact on 
performance.

For the above reasons, I would like to propose to change the level of these 
messages to debug level. A configuration parameter to enable this possibility 
could be provided for backward compatibility.

Please, feel free to comment on this proposal.

Thanks,

Alberto


[1] 
https://geode.apache.org/docs/guide/114/developing/function_exec/function_execution.html
[2] 
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/execute/FunctionException.html


[PROPOSAL] Remove warning logs from FunctionException

2022-03-24 Thread Alberto Gomez
Hi,

Regarding how to implement a Function in Apache Geode and coding the execute 
method, the following is stated in [1]:

"To propagate an error condition or exception back to the caller of the 
function, throw a FunctionException from the execute method. Geode transmits 
the exception back to the caller as if it had been thrown on the calling side. 
See the Java API documentation for FunctionException for more information."

And as per [2]:
"if a GemFire client executes a Function on a server, and the function's 
execute method throws a FunctionException, the server logs the exception as a 
warning, and transmits it back to the calling client, which throws it"

So, for every FunctionException thrown by a user Server Function, the server 
logs the exception with the corresponding stack trace.

This could mean, depending on the logic implemented in the user Server 
Function, that the server log is packed with these messages (which 
probably do not provide extra useful information, given that the exception 
will reach the client), making it difficult to analyze the logs to find 
other relevant events.

Given that Apache Geode recommends the use of FunctionException as the means to 
propagate an error condition or exception back to the caller, we could question 
whether a FunctionException thrown by a user Function should be reflected at 
all in the server logs. Besides, as said before, depending on the number of 
exceptions generated, this can complicate the analysis of log files, 
require much more space for log storage and have a negative impact on 
performance.

For the above reasons, I would like to propose to change the level of these 
messages to debug level. A configuration parameter to enable this possibility 
could be provided for backward compatibility.

Please, feel free to comment on this proposal.

Thanks,

Alberto


[1] 
https://geode.apache.org/docs/guide/114/developing/function_exec/function_execution.html
[2] 
https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/execute/FunctionException.html


Question about crossing NUMA boundary

2022-02-28 Thread Alberto Gomez
Hi,

We understand the recommendation is to fit the Geode JVM within one NUMA node 
for optimal performance, so in case we're running in a system with multiple 
NUMA nodes and our JVM can fit in the memory available in a single NUMA node, 
it is recommended to pin it there ([1]).

However, does anyone have any numbers to compare the performance of the 
same-sized Geode JVM when run on non-NUMA hardware vs. run on NUMA hardware 
where the JVM is spread across multiple NUMA nodes?

Have you played with newer JDKs and GCs that have better NUMA awareness to 
quantify if the drop in performance could be reduced to acceptable levels?

Thanks!

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_on_vsphere.html


Re: New RFC about Enhancements in Off-heap memory fragmentation visibility

2022-02-22 Thread Alberto Gomez
Hi all,

Just a reminder that the deadline for comments for this RFC is tomorrow, 
February 23rd.

If there's still anyone that would like to review the proposal and needs more 
time, please let me know.

Otherwise, if I do not find comments against it, I would like to proceed with 
the implementation of the proposal after the deadline is met.

Thanks,

Alberto

From: Alberto Gomez 
Sent: Monday, February 7, 2022 6:48 PM
To: dev@geode.apache.org 
Subject: New RFC about Enhancements in Off-heap memory fragmentation visibility

Hi fellow devs,

Here is an RFC on "Enhance Off-heap memory fragmentation visibility":

https://cwiki.apache.org/confluence/display/GEODE/Enhance+Off-heap+memory+fragmentation+visibility

Your review and comments are very welcome!

Alberto



Next Apache Community Meeting: February 17th

2022-02-16 Thread Alberto Gomez
Hi devs,

This is a reminder about our next community meeting tomorrow (February 17th at 
8:00 PST / 16:00 UTC), in which we plan to present and discuss the following 
topic:

"Thread and server health monitoring: how to kick out slow/sick members and the 
like"

Details here:
https://cwiki.apache.org/confluence/display/GEODE/Apache+Geode+Community+Meeting+Notes

See you,

Alberto


Next Geode community meeting: 17th of February

2022-02-08 Thread Alberto Gomez
Hi all,

We'd like to propose to have our next Geode community meeting on February 17th 
to present and discuss the following topic:

"Thread and server health monitoring: how to kick out slow/sick members and the 
like"

Details here:
https://cwiki.apache.org/confluence/display/GEODE/Apache+Geode+Community+Meeting+Notes

Please let me know if the proposed date is inconvenient. 
Otherwise, see you there!

Alberto


New RFC about Enhancements in Off-heap memory fragmentation visibility

2022-02-07 Thread Alberto Gomez
Hi fellow devs,

Here is an RFC on "Enhance Off-heap memory fragmentation visibility":

https://cwiki.apache.org/confluence/display/GEODE/Enhance+Off-heap+memory+fragmentation+visibility

Your review and comments are very welcome!

Alberto



Re: API check error when adding a new method to a public interface

2021-11-23 Thread Alberto Gomez
Thanks, Anil.

I will do as you suggest.

Alberto

From: Anilkumar Gingade 
Sent: Tuesday, November 23, 2021 7:33 PM
To: dev@geode.apache.org 
Subject: Re: API check error when adding a new method to a public interface

Alberto,

I don’t think the intention is to avoid or discourage adding a new method. As 
you have seen, any change to the API, or adding a new API, has implications for 
other parts of the product, so it is good to validate/verify and address the 
dependencies across the product and get everything working in accordance 
(without breaking any compatibility). If you have any requirement, please 
propose it through an RFC and get approval.

-Anil.

On 11/23/21, 8:44 AM, "Alberto Gomez"  wrote:

Hi,

After the introduction of GEODE-9702 
(https://issues.apache.org/jira/browse/GEODE-9702), adding a new method to a 
public interface will make the api-check-test-openjdk11 fail even if a default 
implementation is provided.

My question is whether the goal of this change is to forbid this type of change 
in minor versions, or whether there is a process to follow in order for changes 
of this type to be added.

I wanted to propose (in an RFC) the addition of a new parameter to the 
create gateway sender command that would require adding a new method to the 
GatewaySender interface as well as to other public interfaces and I was 
wondering if this will be possible at all, and if so, how should I proceed with 
it.

Thanks,

Alberto




API check error when adding a new method to a public interface

2021-11-23 Thread Alberto Gomez
Hi,

After the introduction of GEODE-9702 
(https://issues.apache.org/jira/browse/GEODE-9702), adding a new method to a 
public interface will make the api-check-test-openjdk11 fail even if a default 
implementation is provided.

My question is whether the goal of this change is to forbid this type of change 
in minor versions, or whether there is a process to follow in order for changes 
of this type to be added.

I wanted to propose (in an RFC) the addition of a new parameter to the create 
gateway sender command that would require adding a new method to the 
GatewaySender interface as well as to other public interfaces and I was 
wondering if this will be possible at all, and if so, how should I proceed with 
it.
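
For readers unfamiliar with the kind of change being discussed, here is a minimal sketch of adding a method with a default implementation to an interface. All names below (Sender, LegacySender, isRetryEnabled) are hypothetical stand-ins and are not Geode APIs; the point is only that an implementation written before the change still compiles and runs, even though, as noted above, the API check flags the change anyway.

```java
/** Hypothetical stand-in for a public interface such as GatewaySender. */
interface Sender {
  String getId();

  // Method added in a later minor version (hypothetical). The default body
  // means implementations written earlier do not need to change.
  default boolean isRetryEnabled() {
    return false;
  }
}

/** Implementation written before the new method existed; compiles unchanged. */
final class LegacySender implements Sender {
  @Override
  public String getId() {
    return "ln";
  }
}
```

Calling `new LegacySender().isRetryEnabled()` simply returns the default value, so callers and existing implementers keep working without modification.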

Thanks,

Alberto



"missing image artifact source: alpine-tools-image" when running CI

2021-11-22 Thread Alberto Gomez
Hi,

I am getting the following error when running CI jobs after pushing changes in 
a PR:

"missing image artifact source: alpine-tools-image"

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/build/builds/163

Anyone else experiencing the same?

Alberto


Re: Test failures on Windows with insufficient memory for the JRE while running distributed tests

2021-10-27 Thread Alberto Gomez
Thanks, Kirk.

Could any expert on the OS images and pipeline jump in to answer Kirk's 
questions and help?

Thanks,

Alberto

From: Kirk Lund 
Sent: Tuesday, October 26, 2021 7:26 PM
To: dev@geode.apache.org 
Subject: Re: Test failures on Windows with insufficient memory for the JRE 
while running distributed tests

PS: I should also mention that the *windows-gfsh-distributed* test target
is only run on Windows (never on Linux). It might be useful to try getting
windows-gfsh-distributed running on Linux to see if it hits the same issue
on that OS. This would also require some help from a pipeline expert.
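
One of the questions raised in this thread is whether the tests (or the Gfsh processes they spawn) run on a 32-bit JVM. A minimal sketch of how that could be checked from inside a test JVM follows. The class name is hypothetical, and `sun.arch.data.model` is a HotSpot-specific system property that may be absent on other JVMs, so `os.arch` is printed as well.

```java
/** Hypothetical helper, not part of Geode: reports the JVM data model. */
final class JvmBitnessCheck {

  static String describe() {
    // "32" or "64" on HotSpot/OpenJDK; may be null on other JVMs.
    String dataModel = System.getProperty("sun.arch.data.model");
    // Architecture of the running JVM, e.g. "amd64" or "x86".
    String arch = System.getProperty("os.arch");
    // Maximum heap the JVM will attempt to use, in megabytes.
    long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
    return "data model=" + dataModel + ", os.arch=" + arch
        + ", max heap MB=" + maxHeapMb;
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```

Logging this line at the start of a forked test JVM would show whether the Windows workers differ from the Linux ones in this respect.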

On Tue, Oct 26, 2021 at 10:22 AM Kirk Lund  wrote:

> Hi Alberto,
>
> 32 kb is a very small amount of memory, so I don't think it's related to
> Java Heap. Based on what little I've read today, I think a failure in
> ChunkPool::allocate is probably related to either *running out of swap
> space or running out of address space in a 32 bit JVM*. Since the
> failures are OS specific, I would suspect the machine image we use for
> Windows to be involved.
>
> I also notice that this ChunkPool::allocate failure is only occurring for
> the Gfsh distributed tests which is the only job run on Windows that uses
> Gradle support for *JUnit Categories*. The Gradle target is
> distributedTest which we have configured with "*forkEvery 1*" which
> causes every test class to launch in a new JVM. Gradle implements JUnit
> 4 Category filtering by launching every test class to check the Categories
> and then either executes the tests or terminates without running any
> depending on the Categories.
>
> Some things I would check (or ask others about):
>
> *Is the hard drive space much smaller than what's available to the JVM(s)
> on Linux?*
>
> *Do the Gfsh distributed tests on Windows leave behind more artifacts on
> the hard drive than other test targets?*
>
> *Is it possible that the tests are using a 32-bit JVM on Windows? Or maybe
> the tests are spawning Gfsh process(es) using a 32-bit JVM instead of
> 64-bit?*
>
> *Are we running the Gfsh distributed tests in parallel (which might
> exacerbate hard drive swapping or memory consumption)?*
>
> Unfortunately, I don't know what most of the options in
> jinja.variables.yml are about. I think it would be best to get help from an
> expert in the OS images and pipeline details.
>
> Cheers,
> Kirk
>
> On Tue, Oct 26, 2021 at 12:59 AM Alberto Gomez 
> wrote:
>
>> Hi,
>>
>> I am having issues with insufficient memory for the Java Runtime
>> Environment when running some tests on the CI under Windows from the
>> following PR :
>> https://github.com/apache/geode/pull/7006
>>
>> The tests never fail under Linux.
>>
>> This is the error I get for some VMs:
>>
>> [vm4] # There is insufficient memory for the Java Runtime Environment to
>> continue.
>> [vm4] # Native memory allocation (malloc) failed to allocate 32744 bytes
>> for ChunkPool::allocate
>>
>> I have reduced the amount of resources used originally by the tests but
>> still I am not able to get a clean execution.
>>
>> I do not know if it is a matter of changing the parameters for the
>> windows execution in ci/pipelines/shared/jinja.variables.yml or if there is
>> anything else to consider.
>>
>> I would appreciate it if someone from the community could help me
>> troubleshoot this issue.
>>
>> Thanks in advance,
>>
>> Alberto
>>
>>
>>


Test failures on Windows with insufficient memory for the JRE while running distributed tests

2021-10-26 Thread Alberto Gomez
Hi,

I am having issues with insufficient memory for the Java Runtime Environment 
when running some tests on the CI under Windows from the following PR :
https://github.com/apache/geode/pull/7006

The tests never fail under Linux.

This is the error I get for some VMs:

[vm4] # There is insufficient memory for the Java Runtime Environment to 
continue.
[vm4] # Native memory allocation (malloc) failed to allocate 32744 bytes for 
ChunkPool::allocate

I have reduced the amount of resources used originally by the tests but still I 
am not able to get a clean execution.

I do not know if it is a matter of changing the parameters for the windows 
execution in ci/pipelines/shared/jinja.variables.yml or if there is anything 
else to consider.

I would appreciate it if someone from the community could help me troubleshoot 
this issue.

Thanks in advance,

Alberto




Re: October Community Meeting

2021-10-04 Thread Alberto Gomez
Yes, it is no problem for us to postpone the query topic a bit.

Thanks,

Alberto

From: Alexander Murmann 
Sent: Tuesday, October 5, 2021 1:54 AM
To: dev@geode.apache.org ; u...@geode.apache.org 

Subject: Re: October Community Meeting

Both are such great topics! I think either of them will easily fill up the hour.

Is it OK to keep the query topic for November? We could always set up a 
separate call for anyone who is interested.


--
Please provide anonymous feedback for me 
here<https://forms.gle/kpUiaty3jt8X9y4H9>.

From: Alberto Gomez 
Sent: Friday, October 1, 2021 00:09
To: dev@geode.apache.org ; u...@geode.apache.org 

Subject: Re: October Community Meeting

Hi,

We would like to touch on the topic Anthony mentioned. We could present the 
problem and the different approaches we have explored so far and then continue 
with the discussion started in the RFC.

It could be at next week's meeting or at a later meeting.

Alberto

From: Anthony Baker 
Sent: Wednesday, September 29, 2021 11:39 PM
To: dev@geode.apache.org 
Cc: u...@geode.apache.org 
Subject: Re: October Community Meeting

Looking forward to it!  There has been some good discussion on query resource 
management [1] as well, perhaps we can pick up that topic another time if there 
is sufficient interest.

Anthony

[1] 
https://cwiki.apache.org/confluence/display/GEODE/Throttling+of+OQL+queries?focusedCommentId=18874#comment-18874


On Sep 29, 2021, at 9:09 AM, Alexander Murmann <amurm...@vmware.com> wrote:

Hi everyone!

Next Thursday October 6th is our next Community Meeting. Jacob Barrett has 
started some discussions on the mailing list about modularization. This is a 
great topic for our upcoming meeting. Jake won't be able to join till one hour 
after our regular time. I hope everyone is fine if we have the October meeting 
an hour later at 9:00 Pacific (16:00 UTC).

Find the meeting 
details<https://cwiki.apache.org/confluence/display/GEODE/Apache+Geode+Community+Meeting+Notes>
 on the wiki.

Looking forward to seeing you all next week!



Re: October Community Meeting

2021-10-01 Thread Alberto Gomez
Hi,

We would like to touch on the topic Anthony mentioned. We could present the 
problem and the different approaches we have explored so far and then continue 
with the discussion started in the RFC.

It could be at next week's meeting or at a later meeting.

Alberto

From: Anthony Baker 
Sent: Wednesday, September 29, 2021 11:39 PM
To: dev@geode.apache.org 
Cc: u...@geode.apache.org 
Subject: Re: October Community Meeting

Looking forward to it!  There has been some good discussion on query resource 
management [1] as well, perhaps we can pick up that topic another time if there 
is sufficient interest.

Anthony

[1] 
https://cwiki.apache.org/confluence/display/GEODE/Throttling+of+OQL+queries?focusedCommentId=18874#comment-18874


On Sep 29, 2021, at 9:09 AM, Alexander Murmann <amurm...@vmware.com> wrote:

Hi everyone!

Next Thursday October 6th is our next Community Meeting. Jacob Barrett has 
started some discussions on the mailing list about modularization. This is a 
great topic for our upcoming meeting. Jake won't be able to join till one hour 
after our regular time. I hope everyone is fine if we have the October meeting 
an hour later at 9:00 Pacific (16:00 UTC).

Find the meeting details on the wiki.

Looking forward to seeing you all next week!



Re: PROPOSAL: Remove WAN TX Batching Serialization Changes

2021-09-22 Thread Alberto Gomez
No worries, Jake.

The steps you are proposing sound good to me. Let me know if I can be of any 
help.

Alberto

From: Jacob Barrett 
Sent: Wednesday, September 22, 2021 7:44 PM
To: dev@geode.apache.org 
Subject: Re: PROPOSAL: Remove WAN TX Batching Serialization Changes



> On Sep 22, 2021, at 12:31 AM, Alberto Gomez  wrote:
>
> Hi,
>
> Jake, why do you say the feature is not complete in 1.14.0? In my view, it 
> works as it was designed and as documented.

Sorry, I was misinformed and misjudged the state. I meant no disrespect.

> I do not think it makes sense to remove a feature shipped in the 1.14.0 
> release that customers might already be using.

I agree!

> If the urgent issue to solve is the unnecessary serialization overhead when 
> the WAN or AEQ events are part of a transaction and the sender has not 
> enabled TX batching, I propose we try to fix just this in 1.14.1.

Yes, let me look at an alternative approach for 1.14 that can reduce that 
overhead when the sender is not batching transactions.

I think it makes sense to scratch the original proposal. I will just write up a 
JIRA and PR for the protocol changes against develop and then we can move them 
back to 1.14.1.

-Jake



Re: PROPOSAL: Remove WAN TX Batching Serialization Changes

2021-09-22 Thread Alberto Gomez
Hi,

Jake, why do you say the feature is not complete in 1.14.0? In my view, it 
works as it was designed and as documented.

I do not think it makes sense to remove a feature shipped in the 1.14.0 release 
that customers might already be using.

If the urgent issue to solve is the unnecessary serialization overhead when the 
WAN or AEQ events are part of a transaction and the sender has not enabled TX 
batching, I propose we try to fix just this in 1.14.1.

Alberto


From: Jacob Barrett 
Sent: Tuesday, September 21, 2021 11:13 PM
To: dev@geode.apache.org 
Subject: PROPOSAL: Remove WAN TX Batching Serialization Changes

Devs,

In addition to my discussion regarding the modularization of the WAN TX 
batching implementation I would like to propose that we remove the 
serialization changes that went into 1.14 to support it. Since the feature is 
not complete in 1.14 this should only impact the associated tests in 1.14. I 
want to do this to eliminate the unnecessary serialization of the 
transaction ID and last event flags as well as the boolean to determine if 
there is a transaction ID. As implemented right now this data is serialized for 
both WAN and AEQ sender events that are part of a transaction regardless of the 
enablement of TX batching on the sender. The transaction ID contains both the 4 
byte counter and large membership ID.
https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/wan/GatewaySenderEventImpl.java#L712

Since this went out in 1.14.0 the removal would be treated like any other 
upgrade to the protocol and a 1.14.1 version would not read or write any of 
those bytes. When talking to exactly a 1.14.0 version the implementation would 
write only the false flag and read the flag and ignore the rest as necessary. 
The tests related to TX batching would also need to be disabled.

Something like this:

  public void toData(DataOutput out,
      SerializationContext context) throws IOException {
    // intentionally skip 1.14.0
    toDataPre_GEODE_1_14_0_0(out, context);
  }

  public void toDataPre_GEODE_1_14_1_0(DataOutput out,
      SerializationContext context) throws IOException {
    toDataPre_GEODE_1_14_0_0(out, context);
    // a 1.14.0 peer expects the "has transaction" flag; always write false
    DataSerializer.writeBoolean(false, out);
  }

  public void fromData(DataInput in, DeserializationContext context)
      throws IOException, ClassNotFoundException {
    fromDataPre_GEODE_1_14_1_0(in, context);
  }

  public void fromDataPre_GEODE_1_14_1_0(DataInput in,
      DeserializationContext context)
      throws IOException, ClassNotFoundException {
    fromDataPre_GEODE_1_14_0_0(in, context);
    if (version == KnownVersion.GEODE_1_14_0.ordinal()) {
      // read the flag and, if it is set, read and discard the rest
      boolean hasTransaction = DataSerializer.readBoolean(in);
      if (hasTransaction) {
        DataSerializer.readBoolean(in);
        context.getDeserializer().readObject(in);
      }
    }
  }
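
To make the wire-level effect of the sketch above easier to see, here is a small self-contained illustration using only java.io streams. It is not Geode code: the class name, the version constants and the use of a UTF string as a stand-in for the transaction ID are all assumptions made for the example. It shows the two rules described in the proposal: write only a false flag when the peer is exactly 1.14.0, and when reading from a 1.14.0 peer, read the flag and discard whatever follows it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of version-gated fields; not Geode code. */
final class VersionedEventCodec {
  // Hypothetical version ordinals for the example only.
  static final int V_1_13 = 0;
  static final int V_1_14_0 = 1;
  static final int V_1_14_1 = 2;

  /** Serializes the payload for a peer at the given version. */
  static byte[] write(String payload, int peerVersion) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(payload);
      if (peerVersion == V_1_14_0) {
        // Only a 1.14.0 peer expects the flag; always claim "no transaction".
        out.writeBoolean(false);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deserializes data received from a peer at the given version. */
  static String read(byte[] data, int peerVersion) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      String payload = in.readUTF();
      if (peerVersion == V_1_14_0 && in.readBoolean()) {
        in.readBoolean(); // last-event flag, read and ignored
        in.readUTF();     // stand-in for the transaction ID, read and ignored
      }
      return payload;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch a message written for a 1.14.0 peer is exactly one byte longer than the same message written for any other version, which is the only overhead that would remain.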

I would also propose that if 1.15.0 looks like it will ship without the 
modularization changes that we at least address the serialization changes here 
in a way that does not affect all gateways, WAN or AEQ.

If accepted I will write up two JIRAs, one to address the 1.14 removal and the 
other as a blocker on 1.15 to address the serialization issues.

Ok, chime in!

-Jake



Re: [DISCUSS] Modularizing new WAN TX Batching (and Modularization Efforts in General)

2021-09-21 Thread Alberto Gomez
I think this is a great initiative. Not only are you giving a reminder about 
the need to implement new features in a modular and pluggable way, but you are 
also providing a real example, using an already implemented feature that was 
not done in the best way, to show the steps to follow in the right direction 
for this and future features.
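
As a rough illustration of the pluggable approach discussed in this thread, the sketch below selects an implementation by name from a registry instead of switching behavior inside one class with a property. Every name in it (GatewayBatchStrategy, the two strategies, the registry and the "tx-batching" key) is hypothetical and is not taken from the Geode code base or from the actual proposal.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extension point; not a Geode interface. */
interface GatewayBatchStrategy {
  String describe();
}

/** Hypothetical existing behavior, left untouched. */
final class DefaultBatchStrategy implements GatewayBatchStrategy {
  @Override
  public String describe() {
    return "size/time based batching";
  }
}

/** Hypothetical new behavior, delivered as a separate implementation. */
final class TxBatchStrategy implements GatewayBatchStrategy {
  @Override
  public String describe() {
    return "keeps events of one transaction in the same batch";
  }
}

/** Looks up an implementation by the name given in configuration. */
final class BatchStrategyRegistry {
  private final Map<String, GatewayBatchStrategy> byName = new HashMap<>();

  void register(String name, GatewayBatchStrategy strategy) {
    byName.put(name, strategy);
  }

  GatewayBatchStrategy lookup(String name) {
    GatewayBatchStrategy strategy = byName.get(name);
    if (strategy == null) {
      throw new IllegalArgumentException("Unknown strategy: " + name);
    }
    return strategy;
  }
}
```

With this shape, adding a new variant means registering another implementation rather than modifying the existing one, which is the property the thread is asking for.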

I would be interested in attending a live walkthrough of the details of 
the changes.

Thanks,

Alberto

From: Jacob Barrett 
Sent: Tuesday, September 21, 2021 3:04 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Modularizing new WAN TX Batching (and Modularization 
Efforts in General)

Unfortunately I can’t do meetings that early. The earliest I could do that day 
is 9am PDT.

> On Sep 20, 2021, at 3:34 PM, Anthony Baker  wrote:
>
> The approach makes sense to me. The idea of a stable core + extensibility is 
> a common and successful pattern in many OSS projects.
>
> @Jake, you want to discuss at the next Geode Community meeting in Oct?
>
> Anthony
>
>
>> On Sep 20, 2021, at 1:48 PM, Jacob Barrett  wrote:
>>
>> Devs,
>>
>> We need to be doing a better job with implementing new features in a modular 
>> and pluggable way. We have had discussions in the past but we haven’t been 
>> the best at sticking to it. Most recently we began work on a modified 
>> version of WAN that supports transactional batching. Rather than implement 
>> it in a pluggable model we modified the existing implementation with a new 
>> property that changes the internal behavior of the implementation. This is a 
>> clear smell that what we have should be pluggable and modularized. I am not 
>> suggesting that we run out and define clear public SPIs for everything or 
>> come up with some complicated plan to re-architect everything into modules 
>> but rather that we take small steps. I will argue that when we are adding 
>> functionality to a core service that is the point we should consider steps 
>> to break it up into clear module components. Think to yourself, what would 
>> it take for me to implement this new functionality as its own module, 
>> meaning its own jar and Gradle sub-project. As you begin to develop the 
>> solution you may find you need some clean interfaces for it to extend from 
>> the core, that might be the start of an internal SPI. You may find that some 
>> concrete classes you would have normally modified just need to be extended 
>> with a few protected methods to implement alternative logic.
>>
>> I think we should focus efforts on extracting an interface to plug in 
>> different WAN gateway implementations so that existing implementations 
>> aren’t modified with new behavior. With proper interface extraction we can 
>> more easily unit test around WAN gateways. By keeping implementations small 
>> we can more easily test them in isolation. Making them all pluggable allows 
>> distributions of Geode to deliver specific implementations they would like 
>> to support without impacting the existing implementations. It also frees 
>> Geode to release new experimental or beta implementations of WAN gateways 
>> without impacting the existing implementations rather than delaying releases 
>> waiting for modified WAN gateways to be production ready and fully tested.
>>
>> In looking at the geode-wan module one might notice that it was already 
>> designed to be pluggable. Unfortunately it isn’t that easy. This SPI was 
>> originally there to provide a way for Geode to be donated initially without 
>> WAN support. It turns out that most of the core to WAN is actually still in 
>> geode-core and only some of the “secret sauce” was moved into geode-wan. The 
>> bulk of the geode-core code for WAN is actually in support of the region 
>> queues for WAN and AEQ, so moving it into geode-wan would have cut off AEQ. 
>> While it would be possible to utilize this SPI for providing alternative 
>> gateway implementations it is very high level, so much of the alternative 
>> implementations would be duplications, and it doesn’t allow for both 
>> implementations to sit side by side at runtime. I would actually suggest we 
>> eliminate this public SPI in favor of just the geode-wan core module that it 
>> is and eventually migrate the region queue code into its own module as well, 
>> but these are for another day.
>>
>> Looking closer at the WAN gateways themselves there is mostly a pluggable 
>> interface already there with the existing interfaces. I spent a little time 
>> pulling apart an internal SPI and it was quite easy. With a small 
>> modification to gfsh and cache xml to specify that alternative 
>> implementation by name is all that needs to be done immediately to configure 
>> an alternative. Without extracting too many of the common implementation out 
>> into its own module just a few of the classes in geode-core can be modified 
>> to provide empty implementation of key overridable plug-in points for the TX 
>> batching implementation. The result is

[DISCUSS] RFC: OQL Queries Throttling

2021-09-02 Thread Alberto Gomez
Hi there,

Here you have an RFC for "OQL Queries Throttling":

https://cwiki.apache.org/confluence/display/GEODE/Throttling+of+OQL+queries

Comments are very welcome. Thanks!

Alberto


Re: Questions about conserve-sockets and WAN replication

2021-08-27 Thread Alberto Gomez
Hi Dave,

I have created the following JIRA tickets:

https://issues.apache.org/jira/browse/GEODE-9557
https://issues.apache.org/jira/browse/GEODE-9558

Please feel free to comment about them.

Thanks,

Alberto

From: Dave Barnes 
Sent: Thursday, August 26, 2021 7:47 PM
To: dev@geode.apache.org 
Cc: Alberto Gomez 
Subject: Re: Questions about conserve-sockets and WAN replication

Alberto,
As you point out, the recommendation to use `conserve-sockets=false` in WAN
configurations already appears in at least three places in the Geode User
Guide.
We can insert an additional mention into the guide -- did you have a
location in mind (sorry, this is an old thread and I don't recall whether
we already discussed this).

With regard to function execution ignoring the global setting, I suspect
that changing function behavior would break existing applications, so that
is probably not an option here.
Again, if you help me identify locations in the doc where you think we need
to insert a note regarding this behavior, we can do that.

Let's state these changes in one or two JIRA tickets for the docs
component. I'm happy to work with you on creating that ticket, but need
your help in identifying target locations in the guide.
Thanks,
Dave


On Thu, Aug 26, 2021 at 2:28 AM Alberto Gomez 
wrote:

> @Dave Barnes<mailto:dav...@vmware.com>, sorry for not having answered
> your e-mail earlier.
>
> I am missing the following in the referred documentation:
>
>   *   State that conserve-sockets must be set to false for members that
> participate in a WAN deployment as it is stated in other parts of the
> documentation (see
> ./geode-docs/topologies_and_comm/multi_site_configuration/setting_up_a_multisite_system.html.md.erb
> ./geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb
> ./geode-docs/reference/topics/gemfire_properties.html.md.erb):
>   *   "To avoid hangs related to WAN messaging, always use the default
> setting of conserve-sockets=false for
> <%=vars.product_name%> members that participate in a WAN deployment."
>
> @dev@geode.apache.org<mailto:dev@geode.apache.org>, besides the above, I
> think the documentation is missing a very important piece of information
> that I have found in [1]:
> "even with conserve-sockets set to false, function executions do not use
> this setting and defaults to conserve-sockets=true behavior, regardless of
> the conserve-sockets setting. "
> and in [2]:
> "a Function Execution Processor does not honor the conserve-sockets
> setting so a shared P2P message reader is used in the remote server"
>
> I wonder if this should be stated in the Geode documentation or rather if
> function execution behavior should be changed to honor the conserve-sockets
> setting.
> Any thoughts on this?
>
> Best regards,
>
> Alberto
>
> [1]
> https://community.pivotal.io/s/article/GemFire-Function-Executions-and-conserve-sockets-behavior?language=en_US
> [2]
> https://medium.com/swlh/threads-used-in-apache-geode-function-execution-9dd707cf227c#bd8c
>
>
>
>
> 
> From: Dave Barnes 
> Sent: Wednesday, July 7, 2021 1:05 AM
> To: dev@geode.apache.org 
> Subject: Re: Questions about conserve-sockets and WAN replication
>
> Alberto,
> I recently updated some of the descriptions regarding conserve-sockets.
> Please check out this PR and see if it addresses any of your concerns.
> https://github.com/apache/geode/pull/6516
>
> On Tue, Jul 6, 2021 at 9:57 AM Alberto Gomez 
> wrote:
>
> > Hi,
> >
> > The Geode documentation states the following about conserve-sockets and
> > WAN deployments in [1]:
> >
> > "WAN deployments increase the messaging demands on a Geode system. To
> > avoid hangs related to WAN messaging, always set `conserve-sockets=false`
> > for Geode members that participate in a WAN deployment."
> >
> > Could anyone please provide some more detailed information about why and
> > where these hangs could happen? Is this a hard limitation or something to
> > be considered under certain circumstances?
> >
> > We have run into an unexpected situation which we wonder if it is related
> > to the documentation statement above:
> >
> > In a system like the following:
> >  - 2 WAN sites and 3 servers each
> >  - several partitioned regions with parallel senders
> >  - several replicated regions with serial senders
> >  - conserve-sockets set to true
> >
> > We have sometimes observed, when trying to stop a parallel gateway sender
> > while puts are being sent to both sites, that the thread stopping the
> > gateway sender in one of the members 

Re: Questions about conserve-sockets and WAN replication

2021-08-26 Thread Alberto Gomez
@Dave Barnes<mailto:dav...@vmware.com>, sorry for not having answered your 
e-mail earlier.

I am missing the following in the referred documentation:

  *   State that conserve-sockets must be set to false for members that 
participate in a WAN deployment as it is stated in other parts of the 
documentation (see 
./geode-docs/topologies_and_comm/multi_site_configuration/setting_up_a_multisite_system.html.md.erb
 ./geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb 
./geode-docs/reference/topics/gemfire_properties.html.md.erb):
  *   "To avoid hangs related to WAN messaging, always use the default setting 
of conserve-sockets=false for 
<%=vars.product_name%> members that participate in a WAN deployment."
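
For reference, a minimal sketch of how the setting quoted above could be applied programmatically. The class name is hypothetical; the only Geode-specific piece is the property key conserve-sockets, and the block itself uses just java.util.Properties so that it runs without Geode on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper that builds properties for a WAN-participating member. */
final class WanMemberProperties {

  static Properties build() {
    Properties props = new Properties();
    // Setting recommended in the documentation quoted above for WAN members.
    props.setProperty("conserve-sockets", "false");
    return props;
  }
}
```

With Geode on the classpath, these properties could be passed to `new CacheFactory(WanMemberProperties.build()).create()`, or the same key could be placed in gemfire.properties instead.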

@dev@geode.apache.org<mailto:dev@geode.apache.org>, besides the above, I think 
the documentation is missing a very important piece of information that I have 
found in [1]:
"even with conserve-sockets set to false, function executions do not use this 
setting and defaults to conserve-sockets=true behavior, regardless of the 
conserve-sockets setting. "
and in [2]:
"a Function Execution Processor does not honor the conserve-sockets setting so 
a shared P2P message reader is used in the remote server"

I wonder if this should be stated in the Geode documentation or rather if 
function execution behavior should be changed to honor the conserve-sockets 
setting.
Any thoughts on this?

Best regards,

Alberto

[1] 
https://community.pivotal.io/s/article/GemFire-Function-Executions-and-conserve-sockets-behavior?language=en_US
[2] 
https://medium.com/swlh/threads-used-in-apache-geode-function-execution-9dd707cf227c#bd8c




From: Dave Barnes 
Sent: Wednesday, July 7, 2021 1:05 AM
To: dev@geode.apache.org 
Subject: Re: Questions about conserve-sockets and WAN replication

Alberto,
I recently updated some of the descriptions regarding conserve-sockets.
Please check out this PR and see if it addresses any of your concerns.
https://github.com/apache/geode/pull/6516

On Tue, Jul 6, 2021 at 9:57 AM Alberto Gomez  wrote:

> Hi,
>
> The Geode documentation states the following about conserve-sockets and
> WAN deployments in [1]:
>
> "WAN deployments increase the messaging demands on a Geode system. To
> avoid hangs related to WAN messaging, always set `conserve-sockets=false`
> for Geode members that participate in a WAN deployment."
>
> Could anyone please provide some more detailed information about why and
> where these hangs could happen? Is this a hard limitation or something to
> be considered under certain circumstances?
>
> We have run into an unexpected situation which we wonder if it is related
> to the documentation statement above:
>
> In a system like the following:
>  - 2 WAN sites and 3 servers each
>  - several partitioned regions with parallel senders
>  - several replicated regions with serial senders
>  - conserve-sockets set to true
>
> We have sometimes observed, when trying to stop a parallel gateway sender
> while puts are being sent to both sites, that the thread stopping the
> gateway sender in one of the members gets stuck waiting to receive a reply
> from the other members (trying to get the size of the queue, see [2]). We
> see also other threads stuck, some trying to get a lock held by the stuck
> thread and others waiting in
> ReplyProcessor21.waitForRepliesUninterruptibly() trying to put or get data
> remotely (See [3] and [4]).
> If we set conserve-sockets to false we do not experience any hang.
>
> Could these stuck threads be related to what is stated in the
> documentation about WAN deployments and conserve-sockets set to true or
> should we rather think that it is an unrelated bug that needs to be solved?
>
> Thanks in advance for your help,
>
> Alberto
>
> [1]
> https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html
>
> [2]
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread1" #1316
> daemon prio=10 os_prio=0 cpu=18.86ms elapsed=1544.80s
> tid=0x7f92bc1c2000 nid=0x2154 waiting on condition  [0x7f9179cd2000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
> - parking to wait for  <0x00031ca2be50> (a
> java.util.concurrent.CountDownLatch$Sync)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11
> /LockSupport.java:234)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11
> /AbstractQueuedSynchronizer.java:1079)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11
> /AbstractQueuedSyn

Re: [INFO] Apache Geode 1.14.0 Release Manager

2021-08-24 Thread Alberto Gomez
Hi Naba,

Is there any new information about the 1.14 Geode release?

Thanks in advance,

Alberto

From: Nabarun Nag 
Sent: Tuesday, June 1, 2021 6:46 PM
To: dev@geode.apache.org 
Subject: Re: [INFO] Apache Geode 1.14.0 Release Manager

Hi Alberto,

For releasing 1.14 we are waiting on two more backports.

GEODE-8609 - DUnit runners were not checking the logs for suspicious 
errors/fatal messages in VMs that were being restarted in a test. A fix is 
ready and is in PR review stage[1]

GEODE-9289 - A problem with Cluster Configuration where a new locator sends its 
configuration to an old locator, the old locator is unable to deserialize it, 
causes NPEs and clears out certain fields in the configuration. A PR is ready 
and is in review phase but this will need GEODE-8609 checked in first.[2]

Once these two PRs are merged and backported, we are going to start the process 
for voting on the release branch and release candidates. My personal estimate 
is that we can start with the voting process within the next week, if we do not 
detect any serious issues.

Please do reach out if you require any additional information.

Regards,
Nabarun

[1] https://github.com/apache/geode/pull/6526
[2] https://github.com/apache/geode/pull/6495

From: Alberto Gomez 
Sent: Monday, May 31, 2021 8:08 AM
To: dev@geode.apache.org 
Subject: Re: [INFO] Apache Geode 1.14.0 Release Manager

Hi Naba,

Can you please provide some information about how the 1.14 release is going and 
whether there is a planned date for it?

Thanks in advance,

Alberto

From: Nabarun Nag 
Sent: Monday, March 22, 2021 5:27 PM
To: dev@geode.apache.org 
Subject: [INFO] Apache Geode 1.14.0 Release Manager

Hi everyone,

I hope you all are doing well. This is to inform the Apache Geode community 
that I will be volunteering as the Release Manager for 1.14.0 release. Thank 
you, Owen, for all the work that has been done to get the release to this point.

As for backporting, as a developer, you just need to create a PR against the 
support/1.14 branch, and you are done. As a release manager, I will take over 
from there.

Just ensure the following:

  *   The PR is a cherry-pick (cherry-pick -x) of a commit that is already in 
develop
  *   Ensure that there are no merge conflicts.

Regards
Nabarun Nag



Re: Pending review from some code owners for PR linked to GEODE-9369: Command to copy region entries from a WAN site to another

2021-08-23 Thread Alberto Gomez
Hi,

Friendly reminder about pending reviews of some code owners for this PR.

Thanks,

Alberto

From: Alberto Gomez
Sent: Wednesday, July 28, 2021 1:52 PM
To: dev@geode.apache.org 
Subject: Pending review from some code owners for PR linked to GEODE-9369: 
Command to copy region entries from a WAN site to another

Hi,

The following PR https://github.com/apache/geode/pull/6601 has received the 
approval from several code owners but there are still some code owners' reviews 
pending.

Could those that have not yet reviewed it, please, have a look?

Thanks in advance,

Alberto


Pending review from some code owners for PR linked to GEODE-9369: Command to copy region entries from a WAN site to another

2021-07-28 Thread Alberto Gomez
Hi,

The following PR https://github.com/apache/geode/pull/6601 has received the 
approval from several code owners but there are still some code owners' reviews 
pending.

Could those that have not yet reviewed it, please, have a look?

Thanks in advance,

Alberto


Request for review of PR: GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true

2021-07-28 Thread Alberto Gomez
Hi,

I would like to request the review of the following PR:

https://github.com/apache/geode/pull/6663 (GEODE-9408: Avoid duplicate events 
sent by Serial Gateway Sender when group-transaction-events is true).

Thanks in advance,

Alberto


Re: ParallelGatewaySenderQueue$BatchRemovalThread prints NPE

2021-07-09 Thread Alberto Gomez
I think I finally caught it:

https://github.com/apache/geode/pull/6683

Alberto

From: Kirk Lund 
Sent: Thursday, June 24, 2021 7:19 PM
To: dev@geode.apache.org 
Subject: ParallelGatewaySenderQueue$BatchRemovalThread prints NPE

Can someone who has been working on GatewaySender please take a look at
this NPE?

It didn't cause any test failures, but it does show up in the output of my
unit-test-openjdk11 task for one of my PRs:
https://concourse.apachegeode-ci.info/builds/53189

> Task :extensions:geode-modules-tomcat8:testClasses
> Task :geode-web:test
Exception in thread "BatchRemovalThread for GatewaySender_sender_2" java.lang.NullPointerException
    at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.checkCancelled(ParallelGatewaySenderQueue.java:1835)
    at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1936)
> Task :extensions:geode-modules-tomcat9:compileTestJava
> Task :extensions:geode-modules-tomcat9:testClasses

Thanks,
Kirk


Questions about conserve-sockets and WAN replication

2021-07-06 Thread Alberto Gomez
Hi,

The Geode documentation states the following about conserve-sockets and WAN 
deployments in [1]:

"WAN deployments increase the messaging demands on a Geode system. To avoid 
hangs related to WAN messaging, always set `conserve-sockets=false` for Geode 
members that participate in a WAN deployment."

Could anyone please provide some more detailed information about why and where 
these hangs could happen? Is this a hard limitation or something to be 
considered under certain circumstances?

We have run into an unexpected situation, and we wonder whether it is related to 
the documentation statement above:

In a system like the following:
 - 2 WAN sites and 3 servers each
 - several partitioned regions with parallel senders
 - several replicated regions with serial senders
 - conserve-sockets set to true

We have sometimes observed, when trying to stop a parallel gateway sender while 
puts are being sent to both sites, that the thread stopping the gateway sender 
in one of the members gets stuck waiting to receive a reply from the other 
members (trying to get the size of the queue, see [2]). We see also other 
threads stuck, some trying to get a lock held by the stuck thread and others 
waiting in ReplyProcessor21.waitForRepliesUninterruptibly() trying to put or 
get data remotely (See [3] and [4]).
If we set conserve-sockets to false we do not experience any hang.
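
For reference, the setting discussed here is an ordinary Geode member property. Below is a minimal sketch, using only java.util.Properties, of how a member taking part in WAN replication could carry it. The class and method names are hypothetical, and handing the properties to the cache at startup is deliberately left out.

```java
import java.util.Properties;

// Hypothetical helper: builds the member properties for a server in a WAN site.
final class WanMemberProperties {

  // "conserve-sockets" is the Geode property named in the documentation quote.
  static Properties forWanMember() {
    Properties props = new Properties();
    props.setProperty("conserve-sockets", "false");
    return props;
  }

  public static void main(String[] args) {
    // Prints the key=value pair in the form used by a gemfire.properties file.
    Properties props = forWanMember();
    System.out.println("conserve-sockets=" + props.getProperty("conserve-sockets"));
  }
}
```

The same key=value pair can be placed in the gemfire.properties file of each member instead of being set programmatically.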

Could these stuck threads be related to what is stated in the documentation 
about WAN deployments and conserve-sockets set to true or should we rather 
think that it is an unrelated bug that needs to be solved?

Thanks in advance for your help,

Alberto

[1] 
https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html

[2]
"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread1" #1316 daemon 
prio=10 os_prio=0 cpu=18.86ms elapsed=1544.80s tid=0x7f92bc1c2000 
nid=0x2154 waiting on condition  [0x7f9179cd2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
    - parking to wait for  <0x00031ca2be50> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
    at java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
    at org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
    at org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
    at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
    at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
    at org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
    at org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
    at org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
    at org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
    at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
    at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
    at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
    at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
    at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
    at java.util.concurrent.FutureTask.run(java.base@11.0.11/FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
   

Re: [INFO] Apache Geode 1.14.0 Release Manager

2021-05-31 Thread Alberto Gomez
Hi Naba,

Can you please provide some information about how the 1.14 release is going and 
whether there is any planned date for it?

Thanks in advance,

Alberto

From: Nabarun Nag 
Sent: Monday, March 22, 2021 5:27 PM
To: dev@geode.apache.org 
Subject: [INFO] Apache Geode 1.14.0 Release Manager

Hi everyone,

I hope you all are doing well. This is to inform the Apache Geode community 
that I will be volunteering as the Release Manager for 1.14.0 release. Thank 
you, Owen, for all the work that has been done to get the release to this point.

As for backporting, as a developer, you just need to create a PR against the 
support/1.14 branch, and you are done. As a release manager, I will take over 
from there.

Just ensure the following:

  *   The PR is a cherry-pick (cherry-pick -x) of a commit that is already in 
develop
  *   Ensure that there are no merge conflicts.

Regards
Nabarun Nag



Re: June Community Meeting

2021-05-31 Thread Alberto Gomez
Hi all,

Just a short note to inform you that we have added some more information in the 
Wiki page about the topic we plan to present in the meeting:


  *   Title: "Servers waiting indefinitely for a reply"
  *   Summary: If a packet is lost between servers, multiple threads get stuck 
and servers wait indefinitely for a reply, without any retry mechanism or 
timeout.
  *   Context: 
https://markmail.org/search/?q=list%3Aorg.apache.geode.dev+order%3Adate-backward#query:list%3Aorg.apache.geode.dev%20order%3Adate-backward+page:1+mid:l6uw5vs62vmtcxo4+state:results

Best regards,

Alberto

From: Alexander Murmann 
Sent: Wednesday, May 26, 2021 11:55 PM
To: geode 
Subject: June Community Meeting

Hi everyone,

Next Wednesday, June 2nd, it's time for our next Geode Community Meeting.

So far, our only agenda item is "geode retry/acknowledge improvement" presented 
by contributors from the team at Ericsson.

If you have anything you'd like to present or discuss, please add it to our 
agenda on the project Confluence. The same page also contains information on 
how to join.


Looking forward to seeing you all there!




Re: Question about the write-buffer-size parameter when creating a disk store

2021-05-24 Thread Alberto Gomez
Thanks, Darrel.

I have created a JIRA for this issue 
(https://issues.apache.org/jira/browse/GEODE-9300).

Alberto

From: Darrel Schneider 
Sent: Monday, May 24, 2021 6:24 PM
To: dev@geode.apache.org 
Subject: Re: Question about the write-buffer-size parameter when creating a 
disk store

I also could not find any code that used the write-buffer-size when allocating 
a buffer.
I did find this method: Oplog.allocateWriteBuf
It seems like this would be the allocation of the disk store write buffer.
If one is not already allocated then this method checks a sysprop and then 
defaults to 32k.

return ByteBuffer.allocateDirect(Integer.getInteger("WRITE_BUF_SIZE", 32768));

It seems like this code should at least also ask the DiskStoreImpl (using 
Oplog.parent) what the write buffer size is if the sys prop is not set.
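
The lookup order suggested here can be sketched as follows. This is not the actual Oplog code: `resolveWriteBufferSize` and its `configuredSize` parameter (standing in for the value the disk store was created with) are hypothetical. Only the `WRITE_BUF_SIZE` system property and the 32768 default come from the line quoted above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed precedence:
// system property first, then the disk store's configured value, then 32k.
final class WriteBufferSizing {
  static final String SYS_PROP = "WRITE_BUF_SIZE";
  static final int DEFAULT_SIZE = 32768;

  // configuredSize is assumed to be the disk store's write-buffer-size;
  // a value <= 0 is treated here as "not configured".
  static int resolveWriteBufferSize(int configuredSize) {
    int fallback = configuredSize > 0 ? configuredSize : DEFAULT_SIZE;
    // Integer.getInteger returns the fallback when the property is unset or not a number.
    return Integer.getInteger(SYS_PROP, fallback);
  }

  static ByteBuffer allocateWriteBuf(int configuredSize) {
    return ByteBuffer.allocateDirect(resolveWriteBufferSize(configuredSize));
  }

  public static void main(String[] args) {
    // The printed capacity depends on whether -DWRITE_BUF_SIZE is set.
    System.out.println(allocateWriteBuf(65536).capacity());
  }
}
```

Keeping the system property as the highest-priority source preserves the current behavior for anyone who already sets it.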
____
From: Alberto Gomez 
Sent: Monday, May 24, 2021 8:51 AM
To: dev@geode.apache.org 
Subject: Question about the write-buffer-size parameter when creating a disk 
store

Hi,

According to the Geode documentation, it is possible to set the write buffer 
size by using --write-buffer-size when creating a disk store [1].

Nevertheless, looking at the code, I have not seen that setting a value for 
that parameter has any effect. Does anybody know if I am correct or if I am 
missing something?

Thanks in advance,

Alberto G.

[1] 
https://geode.apache.org/docs/guide/113/tools_modules/gfsh/command-pages/create.html


Question about the write-buffer-size parameter when creating a disk store

2021-05-24 Thread Alberto Gomez
Hi,

According to the Geode documentation, it is possible to set the write buffer 
size by using --write-buffer-size when creating a disk store [1].

Nevertheless, looking at the code, I have not seen that setting a value for 
that parameter has any effect. Does anybody know if I am correct or if I am 
missing something?

Thanks in advance,

Alberto G.

[1] 
https://geode.apache.org/docs/guide/113/tools_modules/gfsh/command-pages/create.html


Re: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Alberto Gomez
Please, disregard my last e-mail.

I was having a parallel conversation by e-mail with Mario on this topic and 
sent the e-mail to the list by mistake.

BR,

Alberto

From: Alberto Gomez 
Sent: Wednesday, May 5, 2021 11:29 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

You could reply to their latest e-mail to confirm that what Darrel suspected 
could happen. Let's see if in that case they are willing to collaborate.

Alberto

From: Mario Ivanac 
Sent: Wednesday, May 5, 2021 11:28 AM
To: dev@geode.apache.org 
Subject: Odg: Odg: Geode retry/acknowledge improvement

Hi,

I think that we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that a message has been 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021, 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put send message to fail with an exception writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happens then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?
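
The idea in the last question can be sketched as below. This is only an illustration, not Geode code: `ReplyWaiter`, `Outcome` and the `receiverHasMessage` probe (standing in for a hypothetical "did you receive message id N?" request to the other member) are invented names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: wait for a reply with a bound, then ask the receiver
// whether it ever saw the message instead of waiting forever.
final class ReplyWaiter {
  enum Outcome { REPLIED, STILL_PROCESSING, LOST, INTERRUPTED }

  private final CountDownLatch replyLatch = new CountDownLatch(1);

  // Called by whichever thread reads the reply from the socket.
  void replyReceived() {
    replyLatch.countDown();
  }

  // receiverHasMessage is an assumed probe; no such call is claimed to exist in Geode.
  Outcome awaitReply(long timeout, TimeUnit unit, BooleanSupplier receiverHasMessage) {
    try {
      if (replyLatch.await(timeout, unit)) {
        return Outcome.REPLIED;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.INTERRUPTED;
    }
    // Timed out: the message is either still being processed or was never delivered.
    return receiverHasMessage.getAsBoolean() ? Outcome.STILL_PROCESSING : Outcome.LOST;
  }

  public static void main(String[] args) {
    ReplyWaiter waiter = new ReplyWaiter();
    // No reply arrives and the probe says the receiver never saw the message.
    System.out.println(waiter.awaitReply(50, TimeUnit.MILLISECONDS, () -> false));
  }
}
```

A caller seeing LOST could resend the message, while STILL_PROCESSING would justify waiting again. Whether a resend is safe depends on the operation being idempotent, which this sketch does not address.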

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, on receiving the packets, sends an ACK 
> at the TCP level and is restarted right after that.
> This is why we don't see TCP retries.
>
> A similar problem (though not packet loss) can be reproduced in Geode:
> if, on an existing connection, a TCP reset is received after a request has been 
> sent, the connection will be closed on reception of the reset and the thread 
> will get stuck waiting for a reply.
> I will add reproduction steps to the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021, 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a Geode cluster in a Kubernetes environment, and Istio sidecars 
>> are injected between cluster members.
>> While running traffic, if any Istio sidecar is restarted, a thread will get 
>> stuck indefinitely while waiting for a reply to a sent message.
>> It seems that, due to the restart of the proxy, messages are lost in some cases 
>> and the sending side waits indefinitely for a reply.
>>
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.

Re: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Alberto Gomez
You could reply to their latest e-mail to confirm that what Darrel suspected 
could happen. Let's see if in that case they are willing to collaborate.

Alberto

From: Mario Ivanac 
Sent: Wednesday, May 5, 2021 11:28 AM
To: dev@geode.apache.org 
Subject: Odg: Odg: Geode retry/acknowledge improvement

Hi,

I think that we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that a message has been 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021, 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put send message to fail with an exception writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happens then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, on receiving the packets, sends an ACK 
> at the TCP level and is restarted right after that.
> This is why we don't see TCP retries.
>
> A similar problem (though not packet loss) can be reproduced in Geode:
> if, on an existing connection, a TCP reset is received after a request has been 
> sent, the connection will be closed on reception of the reset and the thread 
> will get stuck waiting for a reply.
> I will add reproduction steps to the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021, 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a Geode cluster in a Kubernetes environment, and Istio sidecars 
>> are injected between cluster members.
>> While running traffic, if any Istio sidecar is restarted, a thread will get 
>> stuck indefinitely while waiting for a reply to a sent message.
>> It seems that, due to the restart of the proxy, messages are lost in some cases 
>> and the sending side waits indefinitely for a reply.
>>
>> https://issues.apache.org/jira/browse/GEODE-9075
>>
>> My question is: what is your estimate of how much effort is needed to 
>> implement message retry/acknowledge logic in Geode
>> to solve this problem?
>>
>> BR,
>> Mario
>



Re: DISCUSSION: Geode Native C++ 17 adoption

2021-05-04 Thread Alberto Gomez
Hi,

Here are my two cents.

To me, upgrading to C++17 is a no-brainer given that C++11 is quite old and 
C++17 has lots of new features, performance improvements and bug fixes.

The only thing that could prevent us from doing so is having lots of users who 
are running the native client on a platform that does not have a C++17 
compiler. Which leads me to the question: should we move this discussion to the 
user mailing list?

Alberto

From: Mario Salazar de Torres 
Sent: Tuesday, May 4, 2021 7:45 PM
To: dev@geode.apache.org 
Subject: Re: DISCUSSION: Geode Native C++ 17 adoption

Hi everyone,

Sorry for the previous email; I sent it by mistake before finishing it.

Currently Geode Native uses the C++11 standard. It has been quite some time since 
that standard was released, and as of today the latest standard is C++20.
As part of another discussion, some users in the community were wondering if 
it's time to switch to C++17 in the Geode Native project.

So, I am putting a list of pros and cons:

Pros:

  *   Several new features added:
 *   C++14 features: 
https://en.cppreference.com/w/cpp/14#New_language_features
 *   C++17 features: 
https://en.cppreference.com/w/cpp/17#New_language_features
  *   Some of the interesting features are:
 *   Function return type deduction.
 *   Improved constexpr functions.
 *   Variable templates.
 *   Generic lambdas.
 *   Lambda capture expressions.
 *   [[deprecated]]
 *   Shared mutexes/locks.
 *   std::make_unique
 *   Nested namespace definitions.
 *   Structured bindings.
 *   variant.
 *   any.
 *   optional.

Cons:
  *   Some users might have older compilers that do not implement all C++17 
features.

Thanks,
Mario.


From: Mario Salazar de Torres 
Sent: Tuesday, May 4, 2021 7:34 PM
To: dev@geode.apache.org 
Subject: DISCUSSION: Geode Native C++ 17 adoption

Hi everyone,

Currently Geode Native uses the C++11 standard. It has been quite some time since 
that standard was released, and as of today the latest standard is C++20.
As part of another discussion, some users in the community were wondering if 
it's time to switch to C++17 in the Geode Native project.

So, I am putting a list of pros and cons:

Pros:

  *   Several new features added:
 *   C++14 features: 
https://en.cppreference.com/w/cpp/14#New_language_features
 *   C++17 features: 
https://en.cppreference.com/w/cpp/17#New_language_features
  *   Some of the interesting features are:
 *   Function return type deduction.
 *   Improved constexpr functions.
 *   Variable templates.
 *   Generic lambdas.
 *   Lambda capture expressions.
 *   [[deprecated]]
 *   Shared mutexes/locks.
 *   std::make_unique



[RFC PROPOSAL] Geode Command to replicate region data from one site to another connected via WAN

2021-04-22 Thread Alberto Gomez
Hi all,

In the following link you can find a proposal to introduce a Geode command to 
replicate region data between sites connected via WAN.

https://cwiki.apache.org/confluence/display/GEODE/Geode+Command+to+replicate+region+data+from+one+site+to+another+connected+via+WAN

As per RFC guidelines, please comment in this mail thread.

Thanks,

Alberto G.


Re: [DISCUSS] Monthly, synchronous community meetings

2021-04-09 Thread Alberto Gomez
Hi,

This idea sounds great to me.

Have you thought about the videoconference platform to host these meetings?

Best regards,

Alberto

From: Alexander Murmann 
Sent: Friday, April 9, 2021 2:46 AM
To: dev@geode.apache.org 
Subject: [DISCUSS] Monthly, synchronous community meetings

Hi everyone,

On occasion we see discussions on PRs or here on the mailing list that might 
move much quicker if we chatted synchronously about some of these topics and 
then shared the meeting notes back to the mailing list. Of course, our usual 
processes for voting and additional discussions would still need to be just as 
accessible as they are right now on the mailing list to anyone who cannot 
attend a meeting. However, it might allow us to move these discussions along 
faster and also create a stronger sense of community.

I've seen similar things done in Kubernetes Special Interest Groups 
(example)

Meeting Specifics
Frequency: monthly
Time: It seems like most of our community is distributed between Europe and the 
US. 3pm/15:00 UTC (11am EDT/8am PDT/17:00 CEST) seems like a good compromise.
How: We'd have a Confluence page to gather topics ahead of time. Topics should 
be added with as much lead time as possible, to allow interested community 
members to plan attendance. We'd use the same page to take meeting notes.
Topic examples: RFC discussions, process proposals (like the recent codeowner 
introduction), show & tells of recent changes, controversial PRs, the sky is 
the limit till we find certain topics are better in a dedicated meeting.
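
Since the local equivalents of a fixed UTC slot shift with daylight saving time, they are easy to double-check in code. The sketch below converts 15:00 UTC to three zones using java.time. The date (June 2, 2021) and the choice of zone IDs are assumptions picked for illustration, and `MeetingTimes` is a made-up name.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for checking what a UTC meeting slot means locally.
final class MeetingTimes {

  // Returns the same instant expressed in the given zone.
  static ZonedDateTime inZone(ZonedDateTime utc, String zoneId) {
    return utc.withZoneSameInstant(ZoneId.of(zoneId));
  }

  public static void main(String[] args) {
    // Assumed example date; only the 15:00 UTC slot comes from the proposal.
    ZonedDateTime utc = ZonedDateTime.of(2021, 6, 2, 15, 0, 0, 0, ZoneOffset.UTC);
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("HH:mm");
    for (String zone : new String[] {"America/New_York", "America/Los_Angeles", "Europe/Berlin"}) {
      System.out.println(zone + " " + inZone(utc, zone).format(fmt));
    }
  }
}
```

For that date the three lines show 11:00, 08:00 and 17:00 respectively. A date in winter would give different local hours for the same UTC slot.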

As with everything, I'd expect us to iterate and evolve this.

Does this sound valuable to everyone? How could this be better?


Re: CODEWATCHERS file effects

2021-03-24 Thread Alberto Gomez
Thank you, Owen.

My bad. I had forgotten that I had also added myself to the "Client/server 
messaging and cache operations" sections of the code, which explains why I was 
added as a reviewer to PRs I did not expect.

Best regards,

Alberto

From: Owen Nichols 
Sent: Wednesday, March 24, 2021 11:39 AM
To: dev@geode.apache.org 
Subject: Re: CODEWATCHERS file effects

Thanks Alberto for the detailed list.  I think I was able to find explanations 
for all, see below.

> https://github.com/apache/geode/pull/6177
# 
geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
#matches your rule:
# geode-core/**/org/apache/geode/cache/client/**
#however, support branch PR should not have triggered CODEWATCHERS, I will fix.

> https://github.com/apache/geode/pull/6156
# 
geode-core/src/distributedTest/java/org/apache/geode/cache/client/internal/CacheServerSSLConnMaxThreadsDUnitTest.java
#matches your rule:
# geode-core/**/org/apache/geode/cache/client/**

> https://github.com/apache/geode/pull/6153
# geode-wan/build.gradle
#matches your rule:
# geode-wan/**
#note that in CODEOWNERS, this would not have matched, because gradle files are 
remapped by a later rule

> https://github.com/apache/geode/pull/6151
# 
geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
#matches your rule:
# geode-core/**/org/apache/geode/cache/client/**

> https://github.com/apache/geode/pull/6075
# 
geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/BaseCommand.java
#matches your rule:
# geode-core/**/org/apache/geode/internal/cache/tier/**

> https://github.com/apache/geode/pull/6116
#you are listed as a reviewer because you approved this PR.  You were never 
requested via CODEWATCHERS.
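
The rule matches listed above can be approximated locally. The sketch below uses the JDK's glob support, which is close to but not the same as the pattern syntax GitHub applies to these files, so treat it as an approximation. `WatcherRules` and `matchingRules` are hypothetical names, while the rules and paths are the ones quoted above.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which watcher rules a changed file falls under.
final class WatcherRules {

  static List<String> matchingRules(List<String> rules, String changedFile) {
    Path path = Paths.get(changedFile);
    List<String> hits = new ArrayList<>();
    for (String rule : rules) {
      // In JDK globs, "**" crosses directory boundaries and "*" does not.
      PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + rule);
      if (matcher.matches(path)) {
        hits.add(rule);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<String> rules = Arrays.asList(
        "geode-core/**/org/apache/geode/cache/client/**",
        "geode-core/**/org/apache/geode/internal/cache/tier/**",
        "geode-wan/**");
    System.out.println(matchingRules(rules,
        "geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java"));
    System.out.println(matchingRules(rules, "geode-wan/build.gradle"));
  }
}
```

Running a PR's file list through a check like this before adding a rule gives a rough idea of how many PRs the rule would pull a watcher into.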


 
From: Owen Nichols 
Sent: Wednesday, March 24, 2021 2:30 AM
To: dev@geode.apache.org 
Subject: Re: CODEWATCHERS file effects

Hi Alberto, is there a specific PR you feel you were added to in error?  I 
spot-checked #6179 and there was one test change in geode-wan so that one seems 
correct.

I am looking for a solution to avoid adding watchers to draft PRs until 
they are taken out of draft mode, but it's non-trivial so I don't have an ETA 
yet.

    On 3/23/21, 12:08 PM, "Alberto Gomez"  wrote:

Hi,

I have recently added myself to the CODEWATCHERS file to be assigned as 
reviewer to PRs touching certain areas of the code, but it seems that I am being 
added to many more PRs than I intended, even to Draft PRs.

Is anybody else experiencing the same?

Thanks,

Alberto




Re: CODEWATCHERS file effects

2021-03-24 Thread Alberto Gomez
Hi Owen,

Here are some PRs I feel I was added to in error:

https://github.com/apache/geode/pull/6177
https://github.com/apache/geode/pull/6156
https://github.com/apache/geode/pull/6153
https://github.com/apache/geode/pull/6151
https://github.com/apache/geode/pull/6075
https://github.com/apache/geode/pull/6116

Best regards,

Alberto


From: Owen Nichols 
Sent: Wednesday, March 24, 2021 2:30 AM
To: dev@geode.apache.org 
Subject: Re: CODEWATCHERS file effects

Hi Alberto, is there a specific PR you feel you were added to in error?  I 
spot-checked #6179 and there was one test change in geode-wan so that one seems 
correct.

I am looking for a solution to avoid adding watchers to draft PRs until they 
are taken out of draft mode, but it's non-trivial so I don't have an ETA yet.

On 3/23/21, 12:08 PM, "Alberto Gomez"  wrote:

Hi,

I have recently added myself to the CODEWATCHERS file to be assigned as 
reviewer to PRs touching certain areas of the code, but it seems that I am being 
added to many more PRs than I intended, even to Draft PRs.

Is anybody else experiencing the same?

Thanks,

Alberto



CODEWATCHERS file effects

2021-03-23 Thread Alberto Gomez
Hi,

I have recently added myself to the CODEWATCHERS file to be assigned as 
reviewer to PRs touching certain areas of the code, but it seems that I am being 
added to many more PRs than I intended, even to Draft PRs.

Is anybody else experiencing the same?

Thanks,

Alberto


[DISCUSS] CODEOWNERS mechanism feedback

2021-03-17 Thread Alberto Gomez
Hi,

It's been more than two months since the CODEOWNERS file has been in place to 
automatically add reviewers to pull requests. While we have seen the great 
benefit of having the experts in the matter being automatically assigned as 
reviewers to each pull request, I have the feeling that the review process is 
taking longer now. Some possible reasons could be:
1. Some code owners might be getting more reviews than they can cope with and 
they have become a bottleneck.
2. While prior to this change only two approvals were necessary, with the new 
process the number of approvals from reviewers required to approve a pull 
request can be much higher than two, depending on the number of areas touched 
by the PR.

Again, this might just be my feeling or something incidental and only related 
to the pull requests I have been working on. In any case, I would like to know 
if others are experiencing this slowdown in the review of their pull requests.

Also, I do not know if there are metrics available for the review process. For 
example, the average time from when a pull request is submitted or a change 
is made on it until there is a review. Having these types of metrics would be 
very useful because they would allow us to evaluate this mechanism from 
perspectives other than the quality of the reviews and to propose corrective 
actions if necessary.
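
As an illustration of the kind of metric meant here, the following is a minimal sketch in Java that averages the time from submission to first review. The `ReviewMetrics` and `PullRequest` names are hypothetical, and the timestamps are assumed to have been collected already from wherever the pull request history is stored; the sketch does not query GitHub.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

final class ReviewMetrics {
  /** Hypothetical record of the two timestamps of interest for one pull request. */
  static final class PullRequest {
    final Instant submitted;
    final Instant firstReview; // null while the pull request has no review yet

    PullRequest(Instant submitted, Instant firstReview) {
      this.submitted = submitted;
      this.firstReview = firstReview;
    }
  }

  /**
   * Mean time from submission to first review, computed over the pull
   * requests that already have a review. Returns Duration.ZERO if none do.
   */
  static Duration averageTimeToFirstReview(List<PullRequest> pullRequests) {
    long totalSeconds = 0;
    int reviewed = 0;
    for (PullRequest pr : pullRequests) {
      if (pr.firstReview != null) {
        totalSeconds += Duration.between(pr.submitted, pr.firstReview).getSeconds();
        reviewed++;
      }
    }
    return reviewed == 0 ? Duration.ZERO : Duration.ofSeconds(totalSeconds / reviewed);
  }
}
```

Comparing this average for the months before and after the CODEOWNERS file was introduced would show whether the slowdown is real or only an impression.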

Best regards,

Alberto


Re: Question about closing of all connections towards an endpoint in C++ native client

2021-02-24 Thread Alberto Gomez
Thanks Jake. I totally agree with you.

Interestingly, that logic has been recently removed from the C++ client when we 
switched from ACE_SOCK to boost::asio so what I said in my previous e-mail 
pertained to the C++ client 1.13 version and older.

Alberto

From: Jacob Barrett 
Sent: Wednesday, February 24, 2021 4:24 PM
To: dev@geode.apache.org 
Subject: Re: Question about closing of all connections towards an endpoint in 
C++ native client

The Java client does the same thing under certain conditions. Neither of the 
clients should do this though. I think this model is way too overaggressive. I 
think we should remove that logic entirely. If we think we want something that 
proactively checks the other connections to that server we could have a 
background thread go through and send a ping request to the next one in the 
queue. If it doesn’t respond then terminate that connection. Continue until a 
pong response is received.
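
The background check described above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `ConnectionSweeper` and `sweep` are hypothetical names, the ping and close operations are passed in as functions rather than taken from any real client API, and a ping is assumed to return true when a pong arrives in time.

```java
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Predicate;

final class ConnectionSweeper {
  /**
   * Pings the idle connections to one server in queue order. A connection
   * that does not answer is closed and removed; the sweep stops at the first
   * connection that answers, leaving it and the rest of the queue untouched.
   *
   * @param idle idle connections to the same server, next-to-use first
   * @param respondsToPing hypothetical ping; true means a pong was received
   * @param close hypothetical operation that terminates a connection
   * @return the number of connections that were closed
   */
  static <C> int sweep(Deque<C> idle, Predicate<C> respondsToPing, Consumer<C> close) {
    int closed = 0;
    while (!idle.isEmpty()) {
      C next = idle.peekFirst();
      if (respondsToPing.test(next)) {
        break; // the server is reachable, so keep the remaining connections
      }
      idle.pollFirst();
      close.accept(next);
      closed++;
    }
    return closed;
  }
}
```

Unlike marking the whole endpoint as "not connected", this only drops connections that have actually failed to respond.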

-Jake

> On Feb 24, 2021, at 4:36 AM, Alberto Gomez  wrote:
>
> Hi,
>
> Running some tests with the C++ native client and looking at the code, I have 
> observed that when an error in a connection towards an endpoint (timeout, IO 
> error) is detected, not only is the faulty connection closed but the endpoint 
> is also set to "not connected" status, which eventually causes all other 
> open connections towards that endpoint to be closed when used.
>
> I have not seen that behavior in the Java client, i.e., the Java client, when 
> it detects an error in a connection towards an endpoint, it closes that 
> connection but does not act on other connections towards that endpoint.
>
> Are my observations correct?
>
> If so, shouldn't the C++ native client be aligned with the Java client?
>
> Thanks,
>
> Alberto G.



Question about closing of all connections towards an endpoint in C++ native client

2021-02-24 Thread Alberto Gomez
Hi,

Running some tests with the C++ native client and looking at the code, I have 
observed that when an error in a connection towards an endpoint (timeout, IO 
error) is detected, not only is the faulty connection closed but the endpoint 
is also set to "not connected" status, which eventually causes all other open 
connections towards that endpoint to be closed when used.

I have not seen that behavior in the Java client, i.e., the Java client, when 
it detects an error in a connection towards an endpoint, it closes that 
connection but does not act on other connections towards that endpoint.

Are my observations correct?

If so, shouldn't the C++ native client be aligned with the Java client?

Thanks,

Alberto G.


Re: [DISCUSS] client/server communications and versioning

2021-02-23 Thread Alberto Gomez
+1

This proposal makes a lot of sense.

Besides, I recently sent a proposal to allow clients to communicate with 
servers in an older version in case the compatibility was not broken in the new 
version of the client ([1]). With your proposal, the aim of that RFC could also 
be achieved. Following the example you have added to the JIRA ticket, a client 
with version 1.17 would be able to communicate with servers with version 1.15 
or 1.16 given that the client server protocol for the client would be 1.15.

[1] 
https://cwiki.apache.org/confluence/display/GEODE/Add+option+to+allow+newer+Geode+clients+to+connect+to+older+Geode+servers

BR,

Alberto G.

From: Bruce Schuchardt 
Sent: Tuesday, February 23, 2021 6:38 PM
To: dev@geode.apache.org 
Subject: [DISCUSS] client/server communications and versioning

I’m considering a change in client/server communications that I would like 
feedback on.

We haven’t changed on-wire client/server communications since v1.8 yet we tie 
these communications to the current version.  The support/1.14 branch 
identifies clients as needing v1.14 for serialization/deserialization, for 
instance, even though nothing has changed in years.

If we put out a patch release, say v1.12.1, clients running that patch version 
cannot communicate with servers running v1.12.0.  They also can’t communicate 
with a server running v1.13.0 because that server doesn’t know anything about 
v1.12.1 and will reject the client.  To solve that problem we currently have to 
issue a new 1.13 release that knows about v1.12.1 and users have to roll their 
servers to the new v1.13.1.

I propose to change this so that the client’s on-wire version is decoupled from 
the “current version”.  A client can be running v1.14.0 but could use v1.8.0 as 
its protocol version for communications.
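
To make the difference concrete, here is a minimal Java sketch of the two acceptance rules. It is a simplification under stated assumptions: `ProtocolVersionSketch`, `Client` and `Server` are hypothetical names, versions are plain strings, and the server is modelled as a set of versions it knows about; none of this is the actual handshake code.

```java
import java.util.Set;

final class ProtocolVersionSketch {
  /** Hypothetical client identity: what it runs versus what it speaks on the wire. */
  static final class Client {
    final String productVersion;  // e.g. "1.12.1"
    final String protocolVersion; // e.g. "1.8.0", the last on-wire change

    Client(String productVersion, String protocolVersion) {
      this.productVersion = productVersion;
      this.protocolVersion = protocolVersion;
    }
  }

  /** Hypothetical server that only knows the versions compiled into it. */
  static final class Server {
    private final Set<String> knownVersions;

    Server(Set<String> knownVersions) {
      this.knownVersions = knownVersions;
    }

    /** Current rule: the client's product version must be known to the server. */
    boolean acceptsByProductVersion(Client client) {
      return knownVersions.contains(client.productVersion);
    }

    /** Proposed rule: only the client's on-wire protocol version must be known. */
    boolean acceptsByProtocolVersion(Client client) {
      return knownVersions.contains(client.protocolVersion);
    }
  }
}
```

With a server that knows "1.8.0" and "1.12.0", a client running 1.12.1 but speaking the 1.8.0 protocol is rejected under the first rule and accepted under the second.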

This would have an impact on contributors to the project.  If you need to 
change the client/server protocol version you will need to modify 
KnownVersion.java to specify the change, and should let everyone know about the 
change.

See https://issues.apache.org/jira/browse/GEODE-8963


Re: Question about Map indexes

2021-02-16 Thread Alberto Gomez
Hi again,

After investigating a bit more how Map indexes work, I have seen that they are 
also used to support queries that use "!=" as Jason pointed out.

I would have expected that queries using != or NOT would not make use of 
indexes as it is the common practice in databases ([1]) and also as the Geode 
documentation seems to suggest ([2]):

"Indexes are not used in expressions that contain NOT, so in a WHERE clause of 
a query, qty >= 10 could have an index on qty applied for efficiency. However, 
NOT(qty < 10) could not have the same index applied."

Could somebody please confirm whether what the documentation states above is 
true, and whether the conclusion also extends to the use of the 
!= operator?

I also think that the documentation about indexes could be improved at least in 
two areas:

  *   Information about range indexes. While there is a section for the 
deprecated Hash Indexes, there is no specific section for Range Indexes.
  *   Information about Map indexes. The information about these indexes lacks 
a bit of detail. For example, how does the index work when the entry does not 
contain the Map field for which there is an index? How does it behave when the 
Map field does not have the key in the index? How does it behave when the key 
is null or when the value is null?

Does anyone have plans to extend the information about indexes in Geode?

Thanks,

Alberto G.

[1] 
https://stackoverflow.com/questions/1759476/database-index-not-used-if-the-where-criteria-is
[2] 
https://geode.apache.org/docs/guide/19/developing/query_index/indexing_guidelines.html


From: Alberto Gomez 
Sent: Saturday, February 13, 2021 5:40 PM
To: dev@geode.apache.org 
Subject: Re: Question about Map indexes

Jason, thanks for the help.

I added a new commit to the pull request that solves the issue without 
(apparently) breaking anything.

The problem was that when adding an index entry we need to distinguish between 
the case where the Map does not contain the key from the case where the Map 
contains the key but the value for the key is null. If we use Map.get() we get 
in both cases null but we should only add the index entry in the latter case 
(when the map contains the key but the value corresponding to it is null).

I am not particularly proud of the solution because I use an arbitrary 
exception to be able to distinguish both cases. Anyway, could you please check 
if we are in the right direction?

Thanks,

Alberto



From: Jason Huynh 
Sent: Thursday, February 11, 2021 10:57 PM
To: dev@geode.apache.org 
Subject: Re: Question about Map indexes

Hi Alberto,

I haven't checked the PR yet, just read through the email.  The first thought 
that comes to mind is when someone does a != query.  The index still has to 
supply the correct answer to the query (all entries with null or undefined 
values possibly)

I'll try to think of other cases where it might matter.  There may be other 
ways to execute the query but it would probably take a bit of reworking. (I'll 
check your PR to see if this is already addressed. Sorry if it is!)

-Jason

On 2/11/21, 8:28 AM, "Alberto Gomez"  wrote:

Hi,

We have observed that creating an index on a Map field causes the creation 
of an index entry for every entry created in the region containing the Map, no 
matter if the Map field contained the key used in the index.
Nevertheless, we would expect that only entries whose Map field contain the 
key used in the index would have the corresponding index entry. With this 
behavior, the memory consumed by the index could be much higher than needed 
depending on the percentage of entries whose Map field contained the key in the 
index.

---
Example:
We have a region with entries whose key type is a String and the value type 
is an object with a field called "field1" of Map type.

We expect to run queries on the region like the following:

SELECT * from /example-region1 p WHERE p.field1['mapkey1']=$1

We create a Map index to speed up the above queries:

gfsh> create index --name=myIndex --expression="r.field1['mapkey1']" 
--region="/example-region1 r"

We do the following puts:
- Put entry with key="key1" and with value=
- Put entry with key="key2" and with value=

The observation is that Geode creates two index entries, one for each entry. For 
the first entry, the internal indexKey is "key1" and for the second one, the 
internal indexKey is null.

These are the stats shown by gfsh after doing the above puts:

gfsh>list indexes --with-stats=yes
Member Name |Member ID|   Re

Re: Question about Map indexes

2021-02-13 Thread Alberto Gomez
Jason, thanks for the help.

I added a new commit to the pull request that solves the issue without 
(apparently) breaking anything.

The problem was that when adding an index entry we need to distinguish between 
the case where the Map does not contain the key from the case where the Map 
contains the key but the value for the key is null. If we use Map.get() we get 
in both cases null but we should only add the index entry in the latter case 
(when the map contains the key but the value corresponding to it is null).
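
The two cases can be told apart with plain `java.util.Map` calls, as in the minimal sketch below. `MapIndexKeyCheck` and `shouldIndex` are hypothetical names used only for illustration; this is not the code in the pull request, which distinguishes the cases through an exception.

```java
import java.util.Map;

final class MapIndexKeyCheck {
  /**
   * Hypothetical helper: an index entry is wanted only when the map field
   * actually contains the indexed key, even if the mapped value is null.
   */
  static boolean shouldIndex(Map<String, ?> mapField, String indexedKey) {
    // get() alone cannot decide this: it returns null both for a missing key
    // and for a key that is present but mapped to null.
    return mapField != null && mapField.containsKey(indexedKey);
  }
}
```

For a map holding `mapkey1 -> null` the helper returns true, while for a map that only holds `mapkey2` it returns false, although `get("mapkey1")` returns null for both.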

I am not particularly proud of the solution because I use an arbitrary 
exception to be able to distinguish both cases. Anyway, could you please check 
if we are in the right direction?

Thanks,

Alberto



From: Jason Huynh 
Sent: Thursday, February 11, 2021 10:57 PM
To: dev@geode.apache.org 
Subject: Re: Question about Map indexes

Hi Alberto,

I haven't checked the PR yet, just read through the email.  The first thought 
that comes to mind is when someone does a != query.  The index still has to 
supply the correct answer to the query (all entries with null or undefined 
values possibly)

I'll try to think of other cases where it might matter.  There may be other 
ways to execute the query but it would probably take a bit of reworking. (I'll 
check your PR to see if this is already addressed. Sorry if it is!)

-Jason

On 2/11/21, 8:28 AM, "Alberto Gomez"  wrote:

Hi,

We have observed that creating an index on a Map field causes the creation 
of an index entry for every entry created in the region containing the Map, no 
matter if the Map field contained the key used in the index.
Nevertheless, we would expect that only entries whose Map field contain the 
key used in the index would have the corresponding index entry. With this 
behavior, the memory consumed by the index could be much higher than needed 
depending on the percentage of entries whose Map field contained the key in the 
index.

---
Example:
We have a region with entries whose key type is a String and the value type 
is an object with a field called "field1" of Map type.

We expect to run queries on the region like the following:

SELECT * from /example-region1 p WHERE p.field1['mapkey1']=$1

We create a Map index to speed up the above queries:

gfsh> create index --name=myIndex --expression="r.field1['mapkey1']" 
--region="/example-region1 r"

We do the following puts:
- Put entry with key="key1" and with value=
- Put entry with key="key2" and with value=

The observation is that Geode creates two index entries, one for each entry. For 
the first entry, the internal indexKey is "key1" and for the second one, the 
internal indexKey is null.

These are the stats shown by gfsh after doing the above puts:

gfsh>list indexes --with-stats=yes
Member Name | Member ID                           | Region Path      | Name     | Type  | Indexed Expression  | From Clause        | Valid Index | Uses | Updates | Update Time | Keys | Values
----------- | ----------------------------------- | ---------------- | -------- | ----- | ------------------- | ------------------ | ----------- | ---- | ------- | ----------- | ---- | ------
server1     | 192.168.0.26(server1:1109606):41000 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 1    | 1       | 0           | 1    | 1
server2     | 192.168.0.26(server2:1109695):41001 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 1    | 1       | 0           | 1    | 1
---

Is there any reason why Geode would create an index entry for the second 
entry given that the Map field does not contain the key in the Map index?

I have created a draft pull request changing the behavior of Geode to not 
create the index entry when the Map field does not contain the key used in the 
index. Only two Unit test cases had to be adjusted. Please see: 
https://github.com/apache/geode/pull/6028

With this change and the same scenario as the one in the example, only one 
index entry is created. The stats shown by gfsh after the change are the 
following:

gfsh>list indexes --with-stats=yes
Member Name |Member ID 

Question about Map indexes

2021-02-11 Thread Alberto Gomez
Hi,

We have observed that creating an index on a Map field causes the creation of 
an index entry for every entry created in the region containing the Map, no 
matter if the Map field contained the key used in the index.
Nevertheless, we would expect that only entries whose Map field contain the key 
used in the index would have the corresponding index entry. With this behavior, 
the memory consumed by the index could be much higher than needed depending on 
the percentage of entries whose Map field contained the key in the index.

---
Example:
We have a region with entries whose key type is a String and the value type is 
an object with a field called "field1" of Map type.

We expect to run queries on the region like the following:

SELECT * from /example-region1 p WHERE p.field1['mapkey1']=$1

We create a Map index to speed up the above queries:

gfsh> create index --name=myIndex --expression="r.field1['mapkey1']" 
--region="/example-region1 r"

We do the following puts:
- Put entry with key="key1" and with value=
- Put entry with key="key2" and with value=

The observation is that Geode creates two index entries, one for each entry. For the 
first entry, the internal indexKey is "key1" and for the second one, the 
internal indexKey is null.

These are the stats shown by gfsh after doing the above puts:

gfsh>list indexes --with-stats=yes
Member Name | Member ID                           | Region Path      | Name     | Type  | Indexed Expression  | From Clause        | Valid Index | Uses | Updates | Update Time | Keys | Values
----------- | ----------------------------------- | ---------------- | -------- | ----- | ------------------- | ------------------ | ----------- | ---- | ------- | ----------- | ---- | ------
server1     | 192.168.0.26(server1:1109606):41000 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 1    | 1       | 0           | 1    | 1
server2     | 192.168.0.26(server2:1109695):41001 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 1    | 1       | 0           | 1    | 1
---

Is there any reason why Geode would create an index entry for the second entry 
given that the Map field does not contain the key in the Map index?

I have created a draft pull request changing the behavior of Geode to not 
create the index entry when the Map field does not contain the key used in the 
index. Only two Unit test cases had to be adjusted. Please see: 
https://github.com/apache/geode/pull/6028

With this change and the same scenario as the one in the example, only one 
index entry is created. The stats shown by gfsh after the change are the 
following:

gfsh>list indexes --with-stats=yes
Member Name | Member ID                           | Region Path      | Name     | Type  | Indexed Expression  | From Clause        | Valid Index | Uses | Updates | Update Time | Keys | Values
----------- | ----------------------------------- | ---------------- | -------- | ----- | ------------------- | ------------------ | ----------- | ---- | ------- | ----------- | ---- | ------
server1     | 192.168.0.26(server1:1102192):41000 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 2    | 1       | 0           | 0    | 0
server2     | 192.168.0.26(server2:1102279):41001 | /example-region1 | mapIndex | RANGE | r.field1['mapkey1'] | /example-region1 r | true        | 2    | 1       | 0           | 1    | 1


Could someone tell if the current behavior is not correct or if I am missing 
something and with the change I am proposing something else will stop working?

Thanks in advance,

/Alberto G.


Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect to older Geode servers

2021-02-04 Thread Alberto Gomez
After thinking about the pros and cons of the RFC solution and the 
alternatives, taking into account the comments received, I have decided to move 
it to the Icebox and not implement it for the time being.

Thanks all for the valuable feedback.

Best regards,

Alberto G.

From: Bruce Schuchardt 
Sent: Tuesday, February 2, 2021 5:41 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

Oh, but I forgot about WAN changes that may have been made to the handshake to 
allow different versions in different clusters.  Jake might be right about this.

On 2/2/21, 8:31 AM, "Bruce Schuchardt"  wrote:

I think it's only the locator connections that do this.  Regular 
client->server connections using the handshake code just send the client's 
current version, which must not be newer than the server's version.

On 2/1/21, 9:53 AM, "Jacob Barrett"  wrote:

Having just spent some time yanking out some of the really really old 
version support I think a naive version knocking approach would work. During 
the client handshake the server will reject and close the connection of any 
client with a newer version number than it supports. The client could use this 
as a signal to downgrade its version and try again. This could continue until the 
server accepts the client. We would need to decide if we would expect the 
entire membership to be at the same version or if the version knocking needs to 
be on a per member basis. Obviously knocking for every connection is not ideal 
so some sort of heuristic should be maintained for the life of the client.

Interestingly enough the clients sort of did this up until the merge of 
this version cleanups. All clients first made connections using the very old 
protocol version so that the server would send its version back. Then the 
client would disconnect and reconnect using its current version. The same could 
be done today with the current protocol version, the clients could make first 
connection with v1.0.0, get the server version, close and reconnect identifying 
themselves at the same server version.
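
The version knocking idea above could look roughly like the following Java sketch. It rests on assumptions: `VersionKnocking` and `negotiate` are hypothetical names, versions are plain strings, and the `serverAccepts` function stands in for a real handshake attempt that the server either accepts or rejects by closing the connection.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class VersionKnocking {
  /**
   * Tries the versions the client can speak, newest first, and returns the
   * first one the server accepts, or empty if the server rejects all of them.
   *
   * @param newestFirst versions the client is able to use, newest first
   * @param serverAccepts stands in for one handshake attempt; false means the
   *     server rejected that version and closed the connection
   */
  static Optional<String> negotiate(List<String> newestFirst, Predicate<String> serverAccepts) {
    for (String candidate : newestFirst) {
      if (serverAccepts.test(candidate)) {
        // A real client would remember this result instead of knocking
        // again for every new connection.
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }
}
```

Whether the remembered version applies to the whole cluster or to each member separately is the open question raised above; the sketch leaves it to the caller.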

-Jake


On Jan 29, 2021, at 3:35 PM, Dan Smith wrote:

Well, I have no objection to adding a system property for this if you 
want to try it. Since those properties aren't technically part of the public 
API I don't think we need to offer full support for what happens when the 
setting breaks. I'm just thinking ahead to what will happen when the protocol 
does change. At that point setting the system property will not work, unless 
the client has the capability to negotiate and discover the server version and 
use the old protocol the way that WAN does.

Do keep in mind that failures may not be obvious if the serialization 
protocol changes and your client is pretending to be a different version. I 
think it's possible that the errors might show up only in log messages or 
corrupted values, and only if you are using whatever features are affected by a 
protocol change.

-Dan

From: Alberto Gomez 
Sent: Friday, January 29, 2021 11:40 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to 
connect to older Geode servers

Hi Dan,

Thanks a lot for your comments.

The scope of the RFC is not very ambitious. As I pointed out in it, the 
idea is not to implement the backward compatibility of clients with older 
servers. Rather, the aim is to take advantage of the fact that 
serialization or other types of changes that may break this compatibility are 
not very frequent. For those cases where there have been no incompatible 
changes, with one of the proposed System Properties, it would be possible for a 
client to communicate with an older compatible server without the need of 
implementing anything extra. And we would have the test cases in place to 
assure this. For those cases where compatibility has been broken, the client 
will not be able to communicate with the older server and we would also 
have the tests showing that this communication is not possible even if the 
proposed System Property is used.

I do not know how costly it would be to implement and maintain the 
alternative approach you suggest with the negotiation required to support full 
backward compatibility. I would leave that to a different RFC. The good thing 
is that the current RFC could serve as a first step to implement the second, if 
it is agreed that this second feature is worth putting in Geode.

Best regards,

Alberto

Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect to older Geode servers

2021-01-29 Thread Alberto Gomez
Hi Dan,

Thanks a lot for your comments.

The scope of the RFC is not very ambitious. As I pointed out in it, the idea is 
not to implement the backward compatibility of clients with older servers. 
Rather, the aim is to take advantage of the fact that serialization or 
other types of changes that may break this compatibility are not very frequent. 
For those cases where there have been no incompatible changes, with one of the 
proposed System Properties, it would be possible for a client to communicate 
with an older compatible server without the need of implementing anything 
extra. And we would have the test cases in place to assure this. For those 
cases where compatibility has been broken, the client will not be able to 
communicate with the older server and we would also have the tests 
showing that this communication is not possible even if the proposed System 
Property is used.

I do not know how costly it would be to implement and maintain the alternative 
approach you suggest with the negotiation required to support full backward 
compatibility. I would leave that to a different RFC. The good thing is that 
the current RFC could serve as a first step to implement the second, if it is 
agreed that this second feature is worth putting in Geode.

Best regards,

Alberto

From: Dan Smith 
Sent: Friday, January 29, 2021 1:56 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

I think just sending the old version will only work until we actually make any 
changes to the protocol. Once we do, serialization will break unless we also 
change the client to pretend to be that old version, including the way it 
serializes and deserializes messages. With this proposal there will be no way 
for the client to use new features with a newer server since the version number 
of the client is set with a system property.

An alternative would be for the client and the server to have a way to 
negotiate which protocol they are going to communicate over. We do this already 
for WAN. WAN senders can be a higher version than receivers, otherwise we 
couldn't upgrade an Active/Active WAN. What happens is that the WAN receiver 
will accept a newer versioned client, and it sends back its own version. The 
client reads the receiver's version and adjusts accordingly. You can see this in 
ClientSideHandshakeImpl.handshakeWithServer.

This will require a lot of testing to make sure that users won't see strange 
corruption errors related to serialization changes.

-Dan
From: Alberto Gomez 
Sent: Tuesday, January 26, 2021 6:45 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

Hi,

I have updated the proposal in the RFC by adding Patrick's suggestion (if I 
have understood it correctly).

Best regards,

Alberto
From: Alberto Gomez 
Sent: Friday, January 22, 2021 10:41 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

Thanks for your comments, Patrick.

Do you mean have the client always use in the handshake the oldest server 
version it is compatible with?

Sounds like a reasonable simplification. In that case, I would use a flag to 
activate this behavior so that the current behavior (the client sends the 
current version in the handshake) is kept when the flag is not used.

On the other hand if in the future we have clients that are partially 
compatible with an older server version, the System Property with the version 
could allow these clients to connect to that server version assuming that they 
will not use any incompatible feature.

Alberto



From: Patrick Johnson 
Sent: Friday, January 22, 2021 8:35 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

It sounds like you intend to test which versions are compatible with each other 
and maintain a list the client can use to reject the setting of force-version 
when set to an incompatible version. If that’s the case, why not just have the 
handshake look at that list and automatically connect with any versions that it 
is known to be compatible with? Then you wouldn’t even have to set the property.

> On Jan 22, 2021, at 11:05 AM, Alberto Gomez  wrote:
>
> Hi Geode devs,
>
> I have just published the following RFC in the Geode wiki: "Add option to 
> allow newer Geode clients to connect to older Geode servers"
>
> https://cwiki.apache.org/confluence/display/GEODE/Add+option+to+allow+newer+Geode+clients+to+connect+to+older+Geode+servers

Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect to older Geode servers

2021-01-26 Thread Alberto Gomez
Hi,

I have updated the proposal in the RFC by adding Patrick's suggestion (if I 
have understood it correctly).

Best regards,

Alberto

From: Alberto Gomez 
Sent: Friday, January 22, 2021 10:41 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

Thanks for your comments, Patrick.

Do you mean have the client always use in the handshake the oldest server 
version it is compatible with?

Sounds like a reasonable simplification. In that case, I would use a flag to 
activate this behavior so that the current behavior (the client sends the 
current version in the handshake) is kept when the flag is not used.

On the other hand if in the future we have clients that are partially 
compatible with an older server version, the System Property with the version 
could allow these clients to connect to that server version assuming that they 
will not use any incompatible feature.

Alberto



From: Patrick Johnson 
Sent: Friday, January 22, 2021 8:35 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

It sounds like you intend to test which versions are compatible with each other 
and maintain a list the client can use to reject the setting of force-version 
when set to an incompatible version. If that’s the case, why not just have the 
handshake look at that list and automatically connect with any versions that it 
is known to be compatible with? Then you wouldn’t even have to set the property.

> On Jan 22, 2021, at 11:05 AM, Alberto Gomez  wrote:
>
> Hi Geode devs,
>
> I have just published the following RFC in the Geode wiki: "Add option to 
> allow newer Geode clients to connect to older Geode servers"
>
> https://cwiki.apache.org/confluence/display/GEODE/Add+option+to+allow+newer+Geode+clients+to+connect+to+older+Geode+servers
>
> Could you please provide feedback by Tuesday, February 2nd, 2021?
>
> Thanks,
>
> Alberto G.
>



Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect to older Geode servers

2021-01-22 Thread Alberto Gomez
Thanks for your comments, Patrick.

Do you mean have the client always use in the handshake the oldest server 
version it is compatible with?

Sounds like a reasonable simplification. In that case, I would use a flag to 
activate this behavior so that the current behavior (the client sends the 
current version in the handshake) is kept when the flag is not used.

On the other hand if in the future we have clients that are partially 
compatible with an older server version, the System Property with the version 
could allow these clients to connect to that server version assuming that they 
will not use any incompatible feature.
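
The three behaviors discussed so far (current version, oldest compatible version behind a flag, and an explicitly forced version) could be combined as in the Java sketch below. All names are hypothetical, and the precedence of the forced version over the flag is an assumption made for the sketch, not something the RFC has settled.

```java
import java.util.Optional;

final class HandshakeVersionChooser {
  /**
   * Picks the version the client announces in the handshake.
   *
   * @param currentVersion the client's own version (today's behavior)
   * @param oldestCompatible the oldest server version the client is known to
   *     be compatible with
   * @param useOldestCompatible hypothetical flag enabling the simplification
   * @param forcedVersion hypothetical system property value, if set
   */
  static String versionToSend(String currentVersion, String oldestCompatible,
      boolean useOldestCompatible, Optional<String> forcedVersion) {
    if (forcedVersion.isPresent()) {
      return forcedVersion.get(); // assumed to win over the flag
    }
    return useOldestCompatible ? oldestCompatible : currentVersion;
  }
}
```

With neither the flag nor the property set, the client keeps sending its current version, so existing deployments are unaffected.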

Alberto



From: Patrick Johnson 
Sent: Friday, January 22, 2021 8:35 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Add option to allow newer Geode clients to connect 
to older Geode servers

It sounds like you intend to test which versions are compatible with each other 
and maintain a list the client can use to reject the setting of force-version 
when set to an incompatible version. If that’s the case, why not just have the 
handshake look at that list and automatically connect with any versions that it 
is known to be compatible with? Then you wouldn’t even have to set the property.

> On Jan 22, 2021, at 11:05 AM, Alberto Gomez  wrote:
>
> Hi Geode devs,
>
> I have just published the following RFC in the Geode wiki: "Add option to 
> allow newer Geode clients to connect to older Geode servers"
>
> https://cwiki.apache.org/confluence/display/GEODE/Add+option+to+allow+newer+Geode+clients+to+connect+to+older+Geode+servers
>
> Could you please provide feedback by Tuesday, February 2nd, 2021?
>
> Thanks,
>
> Alberto G.
>



[DISCUSS] RFC - Add option to allow newer Geode clients to connect to older Geode servers

2021-01-22 Thread Alberto Gomez
Hi Geode devs,

I have just published the following RFC in the Geode wiki: "Add option to allow 
newer Geode clients to connect to older Geode servers"

https://cwiki.apache.org/confluence/display/GEODE/Add+option+to+allow+newer+Geode+clients+to+connect+to+older+Geode+servers

Could you please provide feedback by Tuesday, February 2nd, 2021?

Thanks,

Alberto G.



Re: Question about -Dgemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION

2020-12-11 Thread Alberto Gomez
Barry, thanks a lot for the clarifications.

Best regards,

/Alberto G.

From: Barrett Oglesby 
Sent: Thursday, December 10, 2020 8:20 PM
To: dev@geode.apache.org 
Subject: Re: Question about 
-Dgemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION

Alberto,

There are a lot of applications that use this property, so I wouldn't expect it 
to be removed.

Here is some additional detail regarding this property:

The default behavior is that if a sender in one site can connect successfully to a 
receiver in another and send batches to it, the acks received for those batches 
cause them to be removed from the sender queue. It doesn't matter what happens 
on that remote site. Every event could fail; every event could succeed. If an 
ack is received, the batch is removed.

The gemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION system property 
reverses that behavior and keeps batches on the sender queue until all their 
events are applied successfully by the receiver in the remote site.

The default behavior can cause sites to be out of sync, and the system property 
behavior can cause infinite retries and OOM in the sender if the receiver in 
the remote site never successfully processes a batch.
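
A simplified model of the two queue behaviors described above, to make the trade-off concrete. It is a sketch under assumptions, not Geode code: the class, the `keepUntilApplied` switch, and the `onAck` signature are hypothetical, and the receiver-side retry loop is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a sender queue; not a Geode class. */
final class SenderQueueModel {
  private final Deque<Integer> queuedBatchIds = new ArrayDeque<>();
  // false models the default behavior, true models the system-property behavior.
  private final boolean keepUntilApplied;

  SenderQueueModel(boolean keepUntilApplied) {
    this.keepUntilApplied = keepUntilApplied;
  }

  void enqueue(int batchId) {
    queuedBatchIds.addLast(batchId);
  }

  /**
   * Handles an ack for a batch.
   *
   * @param allEventsApplied whether the remote site applied every event in the batch
   * @return true if the batch was removed from the queue
   */
  boolean onAck(int batchId, boolean allEventsApplied) {
    if (keepUntilApplied && !allEventsApplied) {
      // Batch stays queued; if the remote site never succeeds, the queue keeps growing.
      return false;
    }
    // Default: any ack removes the batch, even if events failed remotely.
    return queuedBatchIds.remove(Integer.valueOf(batchId));
  }

  int size() {
    return queuedBatchIds.size();
  }
}
```

In the default mode a failed batch disappears from the queue (sites can drift out of sync); in the other mode it stays until it succeeds (the queue can grow without bound).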

There is a draft proposal for a callback that is a middle ground between these 
two behaviors, but it hasn't been implemented at this point.

Barry
____
From: Alberto Gomez 
Sent: Thursday, December 10, 2020 7:58 AM
To: dev@geode.apache.org 
Subject: Question about -Dgemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION

Hi,

I have recently discovered the 
"gemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION" Geode System property 
that allows changing the default behavior of Gateway Senders so that when an 
exception occurs when handling an event, instead of proceeding with the rest of 
events in the batch and sending back to the sender an exception for the event 
(which is ignored), the Gateway Receiver sleeps for half a second and retries 
to apply the event until it succeeds.

As this property is not in the documentation (as far as I know) and it can only 
be activated by using the above property when starting the servers hosting 
gateway senders, I would like to know if it is safe to use, i.e. it will not be 
removed in the future and also if there are any considerations to make when 
using it, given that it is not Geode's default behavior.

I have noticed that there are test cases specifically testing this property 
(See KeepEventsOnGatewaySenderQueueDUnitTest.java).

Thanks in advance,

/Alberto G.


Question about -Dgemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION

2020-12-10 Thread Alberto Gomez
Hi,

I have recently discovered the 
"gemfire.GatewaySender.REMOVE_FROM_QUEUE_ON_EXCEPTION" Geode System property 
that allows changing the default behavior of Gateway Senders so that when an 
exception occurs when handling an event, instead of proceeding with the rest of 
events in the batch and sending back to the sender an exception for the event 
(which is ignored), the Gateway Receiver sleeps for half a second and retries 
to apply the event until it succeeds.

As this property is not in the documentation (as far as I know) and it can only 
be activated by using the above property when starting the servers hosting 
gateway senders, I would like to know if it is safe to use, i.e. it will not be 
removed in the future and also if there are any considerations to make when 
using it, given that it is not Geode's default behavior.

I have noticed that there are test cases specifically testing this property 
(See KeepEventsOnGatewaySenderQueueDUnitTest.java).

Thanks in advance,

/Alberto G.


Review for GEODE-8765: Fix NullPointerException when group-transaction-events and events in and not in transactions are sent.

2020-12-10 Thread Alberto Gomez
Hi,

Could I get some reviewers for PR: https://github.com/apache/geode/pull/5829

Thanks in advance,

/Alberto G.


Re: apache-geode-1.13.0.tgz not found in LGTM analysis

2020-11-19 Thread Alberto Gomez
Thanks for the info, Owen.

I have created a JIRA and a PR to update the .lgtm.yml file in the geode-native 
repo: https://github.com/apache/geode-native/pull/698

Any volunteer to review it?

BR,

Alberto

From: Owen Nichols 
Sent: Thursday, November 19, 2020 11:50 AM
To: dev@geode.apache.org 
Subject: Re: apache-geode-1.13.0.tgz not found in LGTM analysis

It looks like it was hardcoded[1] that way recently.  Geode 1.13.1 was just 
announced[2] so you are correct, 1.13.0 is archived and no longer on the 
mirrors.

If maintaining a hardcoded Geode version number in geode-native is necessary, 
the set_versions[3] script should be updated to keep it in sync.
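
As an illustration of deriving the download location from a single version value instead of hardcoding a full URL, here is a small sketch. The class and method names are hypothetical, and the archive layout (`https://archive.apache.org/dist/geode/<version>/`) is an assumption about where superseded releases end up, not something stated in this thread.

```java
/** Hypothetical helper; not part of geode or geode-native. */
final class GeodeDownloadUrls {
  private GeodeDownloadUrls() {}

  static String tarballName(String version) {
    return "apache-geode-" + version + ".tgz";
  }

  /** URL on a mirror, following the path layout seen in the failing wget call. */
  static String mirrorUrl(String mirrorBase, String version) {
    return mirrorBase + "/geode/" + version + "/" + tarballName(version);
  }

  /** Assumed fallback location for releases that are no longer on the mirrors. */
  static String archiveUrl(String version) {
    return "https://archive.apache.org/dist/geode/" + version + "/" + tarballName(version);
  }
}
```

With one version string as the only input, a release script only has to update that value to keep every derived URL in sync.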

[1] https://github.com/apache/geode-native/blame/develop/.lgtm.yml
[2] 
https://lists.apache.org/x/thread.html/rf937beb3783dc7f2e27a2618586d8cacd8b231793cccab863f4632e3@%3Cdev.geode.apache.org%3E
[3] 
https://github.com/apache/geode/blob/develop/dev-tools/release/set_versions.sh

-Owen

From: Alberto Gomez 
Date: Thursday, November 19, 2020 at 2:39 AM
To: dev@geode.apache.org 
Subject: apache-geode-1.13.0.tgz not found in LGTM analysis
Hi,

I am getting the following error in the LGTM analysis of some pull requests 
since yesterday (for example 
https://github.com/apache/geode-native/pull/690):

[2020-11-19 07:25:41] [build-err] + wget -O apache-geode.tgz 
http://mirror.transip.net/apache/geode/1.13.0/apache-geode-1.13.0.tgz
[2020-11-19 07:25:41] [build-err] --2020-11-19 07:25:41--  
http://mirror.transip.net/apache/geode/1.13.0/apache-geode-1.13.0.tgz
[2020-11-19 07:25:41] [build-err] Resolving mirror.transip.net 
(mirror.transip.net)... 149.210.210.109, 2a01:7c8:1337::100
[2020-11-19 07:25:41] [build-err] Connecting to mirror.transip.net 
(mirror.transip.net)|149.210.210.109|:80... connected.
[2020-11-19 07:25:41] [build-err] HTTP request sent, awaiting response... 404 
Not Found

It seems the tgz file is not available anymore.

Any idea how to fix it?

Thanks,

/Alberto G.


apache-geode-1.13.0.tgz not found in LGTM analysis

2020-11-19 Thread Alberto Gomez
Hi,

I am getting the following error in the LGTM analysis of some pull requests 
since yesterday (for example https://github.com/apache/geode-native/pull/690):

[2020-11-19 07:25:41] [build-err] + wget -O apache-geode.tgz 
http://mirror.transip.net/apache/geode/1.13.0/apache-geode-1.13.0.tgz
[2020-11-19 07:25:41] [build-err] --2020-11-19 07:25:41--  
http://mirror.transip.net/apache/geode/1.13.0/apache-geode-1.13.0.tgz
[2020-11-19 07:25:41] [build-err] Resolving mirror.transip.net 
(mirror.transip.net)... 149.210.210.109, 2a01:7c8:1337::100
[2020-11-19 07:25:41] [build-err] Connecting to mirror.transip.net 
(mirror.transip.net)|149.210.210.109|:80... connected.
[2020-11-19 07:25:41] [build-err] HTTP request sent, awaiting response... 404 
Not Found

It seems the tgz file is not available anymore.

Any idea how to fix it?

Thanks,

/Alberto G.


Review for "C++ native client Function.execute() with onServers does not throw exception if one of the servers goes down while executing the function."

2020-11-16 Thread Alberto Gomez
Hi,

Could somebody review PR https://github.com/apache/geode-native/pull/690  
(https://issues.apache.org/jira/browse/GEODE-8693?filter=-2).

Thanks,

/Alberto G.


Re: Please review and contribute: draft of Nov 2020 Apache board report

2020-11-10 Thread Alberto Gomez
Hi Karen,

According to the membership data I'd say the Committer-to-PMC ratio is closer 
to 2:1 than to 7:4.
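
The arithmetic behind this, using the membership figures from the draft (110 committers, 54 PMC members), can be checked with a few lines. The class and method names are only for illustration.

```java
/** Quick check of the Committer-to-PMC ratio from the draft report. */
final class CommitterRatio {
  private CommitterRatio() {}

  static double ratio(int committers, int pmcMembers) {
    return (double) committers / pmcMembers;
  }

  public static void main(String[] args) {
    double actual = ratio(110, 54);
    // Compare the distance to 2:1 (2.0) with the distance to 7:4 (1.75).
    System.out.println("110:54 = " + actual);
    System.out.println("distance to 2:1 = " + Math.abs(actual - 2.0));
    System.out.println("distance to 7:4 = " + Math.abs(actual - 7.0 / 4.0));
  }
}
```

110/54 is slightly above 2, so it sits much closer to 2:1 than to 7:4.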

Alberto

From: Karen Miller 
Sent: Monday, November 9, 2020 8:25 PM
To: dev@geode.apache.org 
Subject: Please review and contribute: draft of Nov 2020 Apache board report

All, our board report is due in less than 48 hours.  I've included a
first draft below.
Please help correct my mistakes and let me know of blog posts and presentations
that are not yet on the list.

I think that we might also add to our Project Activity category mentions of the
community's focus.  Would it be a good idea to add something like this?
  - We're actively working on v1.13.1, which will contain many bug fixes.

Since the report is due quite soon, please get corrections/additions
to me before
Tuesday Nov 10 (tomorrow) at 3pm Pacific time.
Thanks.
Karen Miller
I work for VMware.
This email is written in my capacity as Chair of the Apache Geode PMC.


## Description:
The mission of Apache Geode is the creation and maintenance of software related
to a data management platform that provides real-time, consistent access to
data-intensive applications throughout widely distributed cloud architectures.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Geode was founded 2016-11-15 (4 years ago)
There are currently 110 committers and 54 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Alexander Murmann on 2020-03-26.
- Sarah Abbey was added as committer on 2020-09-29

## Project Activity:
- Apache Geode v1.13.0 was released on 2020-09-10.
- 10 community members presented 8 talks at ApacheCon 2020. See
  https://www.youtube.com/playlist?list=PLU2OcwpQkYCxKxd7dVETcwEtx5AEDIp1j for
  a playlist that includes all 8 talks.

## Community Health:
- 259 issues opened in JIRA, past quarter (-15% decrease)
- 212 issues closed in JIRA, past quarter (-16% decrease)
- 463 commits in the past quarter (-19% decrease)
- 57 code contributors in the past quarter (-6% decrease)
- 324 PRs opened on GitHub, past quarter (-16% decrease)
- 325 PRs closed on GitHub, past quarter (-14% decrease)


Re: PR process and etiquette

2020-10-29 Thread Alberto Gomez
Hi there,

Here come my 2 cents.

@Udo Kohlmeyer, thanks for your proposals to make this 
community better, and also for your willingness to get feedback from people who 
are new to the community.

In my experience, one of the tricky parts working in the community is getting 
reviewers for PRs. It is a bit of a mystery what will happen once you submit a 
PR. Sometimes you get a review in a few hours. Sometimes you do not, so you ask on 
the list for reviewers; after that, sometimes you get reviewers soon, and 
sometimes you don't, and you need to insist on the list. I have sometimes asked 
a particular person for a review via e-mail, as I do not have permissions to 
assign reviewers to PRs, and sometimes have not received any answer.
I figure the response time is very dependent on how busy people are. Anyhow, it is 
the uncertainty about what is going on behind the scenes that makes things hard.
If any proposal makes the review process more predictable, I am up for it. I 
think Udo's reflections to come up with a consistent approach to the review 
process are very valuable.

In my opinion, it is a good idea to submit draft PRs while we do not have the 
green light from the CI. I have many times submitted PRs, gone to sleep just to 
realize the morning after that some test cases in the CI failed (either due to 
flaky test cases or due to my changes). Sometimes I had already gotten a review 
and I would have preferred to have it once the CI was clear. Other times I did 
not get a review and I wondered if that (or those) failures would keep 
reviewers away from my PR given that they once looked at it and it had test case 
failures.

Alberto G.

From: Udo Kohlmeyer 
Sent: Thursday, October 29, 2020 1:50 AM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

So far I would like to thank everyone for their thoughts and input.

@Dave, I would love to find a solution to the partial sign-off. I’ve been 
experimenting with the “Projects” setting. I wonder if we cannot have a 
“Documentation Check” project, that is added to every PR as a default project. 
We could have different states with the project, which would allow the docs 
folk to know what PRs are new and which still need to be reviewed for docs 
changes.

Now, I don’t know if we can restrict the merging of a PR based upon a state in 
the Project, but at the very least it will provide the ability to have an 
overview of PR with/without docs review. You can have a look at the “Quality 
Review” project I have created. Which I use to track all PRs that I would like 
to review for quality purposes. (code, structure, tests, etc)… I think Docs 
could have something similar.

@Bruce, I’m not trying to create another rule for the sake of creating a rule. 
Why do you believe that we as a community will give any submitter a stink-eye 
just because they did not submit a draft? I certainly would not. I would 
suggest that the submitter maybe submit a draft IF the PR is not in a ready 
state and needs a few more iterations to get to a ready state.

I believe it is easier and better for committers to go through a list of PRs to 
review if they know that the PR passes all of the testing checks, as a failure 
in one area might actually cause some code components to change, which might 
void an earlier review of the code. Also, I’m not suggesting that there are no 
reviews before the commit checks go green. You can easily request someone else 
to review whilst in a draft state.

Knowing which reviewers to tag for a review is more limiting. How would I 
as a new PR submitter know WHO I should tag in the PR? Over time we have built 
up a great understanding of who might be a good person to review our code. But 
for a new community member, they do not know this. For them, they submit the 
PR, and someone in the community will review it.

I would also like everyone to think back on their own approach on deciding what 
PRs to review.

Do you look at the PR and decide to wait until all commit checks are green?
Do you go through the list and find one, that you think you can review, whilst 
the commit checks are still running?
Do you only review PRs in which you have been explicitly tagged?
Do you scan the PRs for a commit in an area of “expertise”?
Do you scan the PRs for committers that you know?

Whatever approach we take, I would like us to come up with an approach, that we 
as a community follow, to have a consistent approach to the review.
A consistent way we can evaluate if the code is in a “ready” state?
A consistent way, that the community will know, that when they submit the PR it 
will be looked at.
A consistent way that I, as a committer, will know that if I spend the time to 
review the PR will not be a waste of my time, because it wasn’t ready.

I don’t think community members are repulsed by a project with structure, but I 
do know that I question a project without structure and one where it takes a 
l

Re: PR process and etiquette

2020-10-28 Thread Alberto Gomez
+1 to draft PRs.

By the way @Blake Bender, I am the one with the 
draft PR for GEODE-8318.

Alberto G.

From: Blake Bender 
Sent: Wednesday, October 28, 2020 2:28 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

+1 for draft PRs.  Native has been using these for a few months now, and 
they're quite effective.  Right now, for example, we have 6 PRs up, 3 of which 
are draft.  They also turn out to be a convenient way to share work, in certain 
circumstances.  Mario, for instance, has a draft up for GEODE-8318 that is 
strictly WIP.  By having it up as a draft PR, I get notifications when changes 
are pushed, and can run internal tooling and let him know if I find issues.

Thanks,

Blake


On 10/28/20, 1:03 AM, "Udo Kohlmeyer"  wrote:

Great information Darrel. Thank you for sharing that.

--Udo

From: Darrel Schneider 
Date: Wednesday, October 28, 2020 at 3:32 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette
+1 to your idea of using "draft" mode until things are green. Something to 
be aware of is that if your pr branch has conflicts and it is in draft mode 
then your pr tests will not run and the pr page will not tell you that 
conflicts exist. If you see that the pr tests are not actually running and it 
is in draft mode then try merging develop to your pr branch and resolve the 
conflicts.

From: Owen Nichols 
Sent: Tuesday, October 27, 2020 6:03 PM
To: dev@geode.apache.org 
Subject: Re: PR process and etiquette

+1 for using GitHub's draft status to indicate work-in-progress.

Many great suggestions here, however I generally prefer that we don't 
squash commits at any point except the final Squash and Merge to develop.  I 
find it insightful to see how the work evolved.  I also find that review 
comments may start coming in even before you are "ready" for review, and a 
squash or force-push "loses" those comments.

One thing I would like to see more of is PR summaries that explain *why* 
the change is being made, not just *what* is being changed.

Thanks Udo for looking for ways to make the community process work even 
better!

On 10/27/20, 5:41 PM, "Udo Kohlmeyer"  wrote:

Dear Apache Geode Devs,
It is really great going through all the PRs that have been submitted. As 
Josh Long is known to say: "I work for PRs".
Whilst going through some of the PRs I do see that there are many PRs 
that have multiple commits against the PR.
I know that the PR submission framework kicks off more testing than we 
do on our own local machines. It is extremely uncommon to submit a PR the first 
time and have all tests go green. Which means we invariably iterate over 
commits to make the build go green.
In this limbo time period, it is hard for any reviewer to know when the 
ticket is ready to be reviewed.
I want to propose that when submitting a PR, it is initially submitted 
as a DRAFT PR. This way, all tests can still be run and work can be done to make 
sure "green" is achieved. Once "green" status has been achieved, the draft can 
then be upgraded to a final PR by pressing the "Ready For Review" button. At 
this point all commits on the branch can then once again be squashed into a 
single commit.
Project committers will then know that the PR is in a state in which it 
can be reviewed for completeness and functionality.
In addition, it will be tremendously helpful if anyone submitting a 
PR monitors their PR for activity. If there is no activity for a few days, 
please feel free to ping the Apache Geode Dev community for review. If a review 
requests changes, please prioritize some time to address the feedback, as reviewers 
spend time reviewing code and getting an understanding of what the code is doing. 
If too much time goes by, between review and addressing the review comments, 
not only does the reviewer lose context, possibly requiring them to spend time 
again to understand what the code was/is supposed to do, but also possibly lose 
interest, as the ticket has now become cold or dropped down the list of PRs.
There are currently many PRs that are in a cold state, where the time 
between review and response has been so long, that both parties (reviewer and 
submitter) have forgotten about the PR.
In the case that the reviews will take more time to address than 
expected, please downgrade the PR to draft status again. At this point, it does 
not mean that reviewers will not be able to help anymore, as you can request a 
reviewer to help with feedback and comments, until one feels that the PR is 
back in a state of final submission.
So, what I'm really asking from the Dev Community:
If you submit a PR, it would be great if you can nudge the 
community if there is no review on the PR. If feedback is provided on a PR, 
please addr

Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-15 Thread Alberto Gomez
Nice proposal, Udo.

Here come some questions:

Is the ClassLoader isolation RFC implemented? I have not seen any references to 
it in the doc or code. To me this RFC seems like a part of the ClassLoader 
isolation RFC as, without it, the original one would not work completely. Is 
this right?

If I understand correctly, to start with there will be two implementations of 
the ClassLoaderService, the DefaultClassLoaderService (with the current 
ClassLoader functionality) and another one with the modular ClassLoader 
functionality as provided by JBoss modules. The latter one will be used when 
the --experimental-classloader flag is used in GFSH. Is this right?

Thanks,

-Alberto G.

From: Udo Kohlmeyer 
Sent: Monday, September 14, 2020 12:42 PM
To: geode 
Subject: [Discussion] - ClassLoaderService RFC proposal

Hi there Apache Geode Devs, (try 2)

Please find attached a proposal for a ClassLoaderService. Please review and 
ponder on it.

https://cwiki.apache.org/confluence/display/GEODE/Introduction+of+ClassLoaderService+into+Geode

Please make all comments in this mail thread.

—Udo


Review needed for PR "Different behavior in transactions on partitioned regions between creating the region with a parallel gateway sender vs altering the region to add the parallel gateway sender"

2020-08-26 Thread Alberto Gomez
Hi Geode devs,

I'd appreciate some reviews for PR https://github.com/apache/geode/pull/5476 
related to GEODE-8455 "Different behavior in transactions on partitioned 
regions between creating the region with a parallel gateway sender vs altering 
the region to add the parallel gateway sender".


Thanks in advance,

Alberto


Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-10 Thread Alberto Gomez
Hi Xiaojian,

No problem, I had already extended the deadline for comments to next Thursday 
(July the 16th). If more time is needed to get all the relevant comments, we 
can extend it further.

Thanks,

Alberto

From: Xiaojian Zhou 
Sent: Friday, July 10, 2020 6:32 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Hi, Alberto:

I was the original author who introduced the tmpDroppedEvents. Due to other 
work, I only got chance to read the issue on Thursday, which is your deadline. 
Can you hold on a little bit longer to Monday?

I have been thinking of history of the code changes and issues you encountered. 
I will try to find a light-weight solution with minimum impact to current code.

Regards
Xiaojian Zhou

On 7/8/20, 1:05 PM, "Eric Shu"  wrote:

I think the only case in which the memory issue occurs is when all gateway senders 
are stopped in the wan-site. Otherwise another member would take over as the 
primary queue, and no more events would be enqueued in tmpDroppedEvents on the 
member with the original primary queue. (For a parallel wan queue, I do not think 
stopping one gateway queue is a valid case to support.)

For the case where all gateway senders are stopped, there is no need to notify any 
other members in the wan site if the limit is reached. The tmpDroppedEvents queue 
is only used to remove events from the secondary queue. If no events are enqueued 
in the secondary queue, there is no need to add to tmpDroppedEvents at all. To me, 
it should only be used for a limited number of queued events.

Regards,
Eric

From: Alberto Gomez 
Sent: Wednesday, July 8, 2020 12:02 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Thanks for your comments, Eric.

Limiting the size of the queue would be a simple solution but I think it 
would pose several problems for the one configuring and operating Geode:

  *   How big should the queue be? Probably not easy to dimension. Should 
the limit be on the memory occupied by the elements or on the number of 
elements in the queue (in which case, depending on the size of the elements, 
the memory used could vary a lot)?
  *   What  to do when the limit has been reached? how do we notify that it 
was reached, what to do afterwards, how would we know what dropped events did 
not make it to the queue but should have been removed from the secondary's 
queue...

I think the solution proposed in the RFC is simple enough and also 
addresses a possible confusion with the semantics of the gateway sender stop 
command.
Stopping a gateway sender currently causes all events received while 
the sender is stopped to be dropped; but at the same time, unlimited memory may 
be consumed by the dropped events. We could put a limit on the amount of memory 
used by the queued dropped events but what would be the point in the first 
place to store them if those events will not be sent to the remote site anyway?
I would expect that after stopping a gateway sender no resources (or at 
least a minimal part) would be consumed by it. Otherwise we may as well not 
stop it or use the pause command depending on what we want to achieve.

From what I have seen, queuing dropped events has its place while the 
gateway sender is starting and while it is stopping but if it is done in a 
sender to be started manually or in a manually stopped server it could provoke 
an unexpected memory exhaustion.
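
The behavior argued for here can be sketched as a small policy: dropped events are remembered only while the sender is starting or stopping, never while it is fully stopped. This is an illustration under assumptions, not the RFC's implementation; the class, the enum, and the use of string event IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed policy; not Geode code. */
final class DroppedEventPolicy {
  /** Sender states in which events get dropped (assumed names). */
  enum SenderState { STARTING, STOPPING, STOPPED }

  // Stands in for the tmpDroppedEvents queue mentioned in this thread.
  private final List<String> tmpDroppedEvents = new ArrayList<>();

  /** Records a dropped event only in the transient states. */
  void onDroppedEvent(SenderState state, String eventId) {
    if (state == SenderState.STARTING || state == SenderState.STOPPING) {
      tmpDroppedEvents.add(eventId);
    }
    // STOPPED: nothing is stored, so memory use does not grow while the sender is down.
  }

  int pendingDroppedEvents() {
    return tmpDroppedEvents.size();
  }
}
```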

I really think the solution proposed makes the behavior of the gateway 
sender command more logical.

Best regards,

Alberto

From: Eric Shu 
Sent: Wednesday, July 8, 2020 7:32 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

It seems that I was not able to comment on the RFC in the wiki yet.

Just trying to find out if we have a simple solution for the issue you raised 
-- can we have an upper limit for the tmpDroppedEvents queue in question?

Always check the limit before adding to the queue -- so that the tmp queue 
is not unbounded?
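
A minimal sketch of the "check the limit before adding" idea, assuming a fixed maximum number of elements. The class name and the rejected-event counter are hypothetical; only `java.util.concurrent.LinkedBlockingQueue` and its non-blocking `offer` are real APIs.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded stand-in for tmpDroppedEvents; not Geode code. */
final class BoundedDroppedEvents<E> {
  private final LinkedBlockingQueue<E> events;
  private final AtomicLong rejected = new AtomicLong();

  BoundedDroppedEvents(int limit) {
    // The capacity bound makes offer() fail instead of growing without limit.
    this.events = new LinkedBlockingQueue<>(limit);
  }

  /** Adds the event if there is room; otherwise counts it as rejected. */
  boolean add(E event) {
    boolean accepted = events.offer(event);
    if (!accepted) {
      rejected.incrementAndGet();
    }
    return accepted;
  }

  int size() {
    return events.size();
  }

  long rejectedCount() {
    return rejected.get();
  }
}
```

The sketch only counts what did not fit; what to do with those events, and how to size the limit, remain open questions.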

Regards,
Eric

From: Alberto Gomez 
Sent: Monday, July 6, 2020 8:24 AM
To: geode 
Subject: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Hi,

I have published a new RFC in the Apache Geode wiki with the following 
title: "Avoid the queueing of dropped events by the primary gateway sender when 
the gateway sender is stopped".


https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fcon

Re: [Proposal] - RFC etiquette

2020-07-10 Thread Alberto Gomez
Hi Geode Devs,

First of all, Udo, thanks for your proposal. I am all up for what you are 
aiming at: "better round out each RFC. Causing less delays later in the process 
and allowing all community members to actively participate in the review 
process regardless of technical skill level."

Secondly, I think I am to blame for having given too little time to review the 
latest RFC I have published. I apologize for it. I felt the changes were too 
small, assumed that the solution was not problematic and as a result gave less 
than a week to review which I now think is too little even if the RFC content 
was small. This has probably triggered Udo's proposal so, in a way, it has not 
been such a bad thing 😉.

Regarding the concrete proposal to achieve the goal, I think the 2 week minimum 
period is very reasonable. The new use case section may help to have more 
community members actively participating but I am not sure that it will be the 
definitive measure. I feel that sometimes the lack of participation comes from 
lack of time because we're busy with other things and not so much with how the 
RFC proposal has been written. Anyhow, having an example of what this new 
section should look like would be helpful for new RFCs to be written.

Alberto


From: Udo Kohlmeyer 
Sent: Thursday, July 9, 2020 10:18 PM
To: geode 
Subject: [Proposal] - RFC etiquette

Hi there Geode Dev's

I would like to propose the following changes to the RFC process that we have 
in place at the moment.

  1.  All submitted RFC’s will provide a minimum 2 week review period. This is 
to allow the community to review the RFC in a reasonable timeframe. If we rush 
things, we will miss things. I’d rather have a little more time spent on the 
RFC review and getting the approach “correct” than rushing the RFC and then at 
a later point in time (either at PR review or worse production issue) find out 
that the approach was less than optimal.
  2.  Add a new section to the RFC. I would like to propose this section to be 
labelled “Use Cases”. In this section I would like all submitters to describe 
the use case that this RFC is to fulfill. This would include all possible 
combinations (success and failure) and expected outcomes of each.

I hope with the additions to the RFC process and template we can better round 
out each RFC. Causing less delays later in the process and allowing all 
community members to actively participate in the review process regardless of 
technical skill level.

Thoughts or comments?

—Udo


Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-09 Thread Alberto Gomez
Hi Alexander,

Yes, sure. I am extending the deadline for comments to next Thursday, July the 
16th.

Cheers,

Alberto G.

From: Alexander Murmann 
Sent: Thursday, July 9, 2020 1:42 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Hi Alberto,

The timing on this RFC feels really tight. Would you be open to extending
this to next week?

On Wed, Jul 8, 2020 at 1:04 PM Eric Shu  wrote:

> I think the only case in which the memory issue occurs is when all gateway
> senders are stopped in the WAN site. Otherwise another member would take
> over as the primary queue, and no more events would be enqueued in
> tmpDroppedEvents on the member with the original primary queue. (For a
> parallel WAN queue, I do not think stopping one gateway queue is a valid
> case to support.)
>
> For the case where all gateway senders are stopped, there is no need to
> notify any other members in the WAN site if the limit is reached.
> tmpDroppedEvents is only used to remove events from the secondary queue. If
> no events are enqueued in the secondary queue, there is no need to add to
> tmpDroppedEvents at all. To me, it should only be used to queue a limited
> number of events.
>
> Regards,
> Eric
> ____
> From: Alberto Gomez 
> Sent: Wednesday, July 8, 2020 12:02 PM
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the
> primary gateway sender when the gateway sender is stopped
>
> Thanks for your comments, Eric.
>
> Limiting the size of the queue would be a simple solution but I think it
> would pose several problems for whoever configures and operates Geode:
>
>   *   How big should the queue be? Probably not easy to dimension. Should
> the limit be on the memory occupied by the elements or on the number of
> elements in the queue (in which case, depending on the size of the
> elements, the memory used could vary a lot)?
>   *   What to do when the limit has been reached? How do we notify that
> it was reached, what do we do afterwards, and how would we know which
> dropped events did not make it to the queue but should have been removed
> from the secondary's queue?
>
> I think the solution proposed in the RFC is simple enough and also
> addresses a possible confusion with the semantics of the gateway sender
> stop command.
> Stopping a gateway sender currently causes all events received while the
> sender is stopped to be dropped; but at the same time, unlimited memory
> may be consumed by the dropped events. We could put a limit on the amount
> of memory used by the queued dropped events but what would be the point in
> the first place to store them if those events will not be sent to the
> remote site anyway?
> I would expect that after stopping a gateway sender no resources (or at
> least a minimal part) would be consumed by it. Otherwise we may as well not
> stop it or use the pause command depending on what we want to achieve.
>
> From what I have seen, queuing dropped events has its place while the
> gateway sender is starting and while it is stopping but if it is done in a
> sender to be started manually or in a manually stopped server it could
> provoke an unexpected memory exhaustion.
>
> I really think the solution proposed makes the behavior of the gateway
> sender command more logical.
>
> Best regards,
>
> Alberto
> 
> From: Eric Shu 
> Sent: Wednesday, July 8, 2020 7:32 PM
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the
> primary gateway sender when the gateway sender is stopped
>
> It seems that I was not able to comment on the RFC in the wiki yet.
>
> Just trying to find out if we have a simple solution for the issue you
> raised -- can we have an upper limit for the tmpDroppedEvents queue in
> question?
>
> Always check the limit before adding to the queue -- so that the tmp queue
> is not unbounded?
>
> Regards,
> Eric
> 
> From: Alberto Gomez 
> Sent: Monday, July 6, 2020 8:24 AM
> To: geode 
> Subject: [DISCUSS] RFC - Avoid the queueing of dropped events by the
> primary gateway sender when the gateway sender is stopped
>
> Hi,
>
> I have published a new RFC in the Apache Geode wiki with the following
> title: "Avoid the queueing of dropped events by the primary gateway sender
> when the gateway sender is stopped".
>
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FAvoid%2Bthe%2Bqueuing%2Bof%2Bdropped%2Bevents%2Bby%2Bthe%2Bprimary%2Bgateway%2Bsender%2Bwhen%2Bth

Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-09 Thread Alberto Gomez
Hi Eric,

I agree that the only case in which the memory issue may occur is when all 
gateway sender instances are stopped. That is what the solution proposed in 
the RFC targets, and it is also why the stop gateway sender command is 
intended to be updated to fix the issue.

Note that while stopping all the gateway sender instances, there may be events 
stored in the secondary senders that will be dropped by the primary sender. 
Those dropped events need to be queued while the secondaries are still up so 
that when the sender is started again, the secondary's queues would be drained 
accordingly.
If we go for the option of setting a limit on the dropped events and the limit 
is too small, some dropped events that should have been queued would not be, 
because the limit had been reached, and they would not be sent to the 
secondaries to drain their queues completely. This is the case in which I meant 
that a notification must be sent to the operator of the system, so that they 
know about a possible issue: queues holding events that would stay there 
forever. On the other hand, if the limit is too high, the memory consumed by 
the queued dropped events could cause memory exhaustion.

I think the right balance is to stop queueing dropped events when all the 
gateway sender instances are stopped.
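
The trade-off discussed above can be illustrated with a toy model. The 
following is a minimal Python sketch, not Geode code: `DroppedEventTracker`, 
its fields, and `simulate` are hypothetical names invented for this 
illustration, and the model only counts events. It compares an unbounded queue 
(the behavior described for tmpDroppedEvents), a capped queue (the limit 
suggested in this thread), and not queueing at all while the sender is stopped 
(the RFC proposal).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DroppedEventTracker:
    """Toy model of a primary sender's record of dropped events.

    All names here are hypothetical; this is not the Geode implementation.
    limit=None models an unbounded queue, limit=N a capped queue, and
    queue_while_stopped=False models the RFC proposal.
    """
    limit: Optional[int] = None
    queue_while_stopped: bool = True
    stopped: bool = False
    queued: deque = field(default_factory=deque)
    untracked: int = 0  # dropped events that hit the cap and were never recorded

    def on_dropped_event(self, event_id: int) -> None:
        if self.stopped and not self.queue_while_stopped:
            # RFC idea: a stopped sender records nothing, so memory stays flat.
            return
        if self.limit is not None and len(self.queued) >= self.limit:
            # Cap reached: this event can no longer be reported to secondaries.
            self.untracked += 1
            return
        self.queued.append(event_id)


def simulate(tracker: DroppedEventTracker, events: int = 10_000):
    """Stop the sender, feed it events, return (queued, untracked) counts."""
    tracker.stopped = True
    for event_id in range(events):
        tracker.on_dropped_event(event_id)
    return len(tracker.queued), tracker.untracked


if __name__ == "__main__":
    print("unbounded:", simulate(DroppedEventTracker()))
    print("capped   :", simulate(DroppedEventTracker(limit=1000)))
    print("rfc      :", simulate(DroppedEventTracker(queue_while_stopped=False)))
```

In this model the unbounded variant keeps every event in memory, the capped 
variant keeps 1000 and loses track of the other 9000 (the case that would need 
an operator notification), and the RFC variant keeps nothing.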

BR,

Alberto


From: Eric Shu 
Sent: Wednesday, July 8, 2020 9:25 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

I think the only case in which the memory issue occurs is when all gateway 
senders are stopped in the WAN site. Otherwise another member would take over 
as the primary queue, and no more events would be enqueued in tmpDroppedEvents 
on the member with the original primary queue. (For a parallel WAN queue, I do 
not think stopping one gateway queue is a valid case to support.)


For the case where all gateway senders are stopped, there is no need to notify 
any other members in the WAN site if the limit is reached. tmpDroppedEvents is 
only used to remove events from the secondary queue. If no events are enqueued 
in the secondary queue, there is no need to add to tmpDroppedEvents at all. To 
me, it should only be used to queue a limited number of events.

Regards,
Eric
____
From: Alberto Gomez 
Sent: Wednesday, July 8, 2020 12:02 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Thanks for your comments, Eric.

Limiting the size of the queue would be a simple solution but I think it would 
pose several problems for whoever configures and operates Geode:

  *   How big should the queue be? Probably not easy to dimension. Should the 
limit be on the memory occupied by the elements or on the number of elements in 
the queue (in which case, depending on the size of the elements, the memory 
used could vary a lot)?
  *   What to do when the limit has been reached? How do we notify that it was 
reached, what do we do afterwards, and how would we know which dropped events 
did not make it to the queue but should have been removed from the secondary's 
queue?

I think the solution proposed in the RFC is simple enough and also addresses a 
possible confusion with the semantics of the gateway sender stop command.
Stopping a gateway sender currently causes all events received while the 
sender is stopped to be dropped; but at the same time, unlimited memory may be 
consumed by the dropped events. We could put a limit on the amount of memory 
used by the queued dropped events but what would be the point in the first 
place to store them if those events will not be sent to the remote site anyway?
I would expect that after stopping a gateway sender no resources (or at least a 
minimal part) would be consumed by it. Otherwise we may as well not stop it or 
use the pause command depending on what we want to achieve.

From what I have seen, queuing dropped events has its place while the gateway 
sender is starting and while it is stopping but if it is done in a sender to 
be started manually or in a manually stopped server it could provoke an 
unexpected memory exhaustion.

I really think the solution proposed makes the behavior of the gateway sender 
command more logical.

Best regards,

Alberto

From: Eric Shu 
Sent: Wednesday, July 8, 2020 7:32 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

It seems that I was not able to comment on the RFC in the wiki yet.

Just trying to find out if we have a simple solution for the issue you raised 
-- can we have an upper limit for the tmpDroppedEvents queue in question?

Always check the limit before adding to the queue 

Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-08 Thread Alberto Gomez
Thanks for your comments, Eric.

Limiting the size of the queue would be a simple solution but I think it would 
pose several problems for whoever configures and operates Geode:

  *   How big should the queue be? Probably not easy to dimension. Should the 
limit be on the memory occupied by the elements or on the number of elements in 
the queue (in which case, depending on the size of the elements, the memory 
used could vary a lot)?
  *   What to do when the limit has been reached? How do we notify that it was 
reached, what do we do afterwards, and how would we know which dropped events 
did not make it to the queue but should have been removed from the secondary's 
queue?

I think the solution proposed in the RFC is simple enough and also addresses a 
possible confusion with the semantics of the gateway sender stop command.
Stopping a gateway sender currently causes all events received while the 
sender is stopped to be dropped; but at the same time, unlimited memory may be 
consumed by the dropped events. We could put a limit on the amount of memory 
used by the queued dropped events but what would be the point in the first 
place to store them if those events will not be sent to the remote site anyway?
I would expect that after stopping a gateway sender no resources (or at least a 
minimal part) would be consumed by it. Otherwise we may as well not stop it or 
use the pause command depending on what we want to achieve.

From what I have seen, queuing dropped events has its place while the gateway 
sender is starting and while it is stopping but if it is done in a sender to 
be started manually or in a manually stopped server it could provoke an 
unexpected memory exhaustion.

I really think the solution proposed makes the behavior of the gateway sender 
command more logical.

Best regards,

Alberto

From: Eric Shu 
Sent: Wednesday, July 8, 2020 7:32 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

It seems that I was not able to comment on the RFC in the wiki yet.

Just trying to find out if we have a simple solution for the issue you raised 
-- can we have an upper limit for the tmpDroppedEvents queue in question?

Always check the limit before adding to the queue -- so that the tmp queue is 
not unbounded?

Regards,
Eric
________
From: Alberto Gomez 
Sent: Monday, July 6, 2020 8:24 AM
To: geode 
Subject: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary 
gateway sender when the gateway sender is stopped

Hi,

I have published a new RFC in the Apache Geode wiki with the following title: 
"Avoid the queueing of dropped events by the primary gateway sender when the 
gateway sender is stopped".

https://cwiki.apache.org/confluence/display/GEODE/Avoid+the+queuing+of+dropped+events+by+the+primary+gateway+sender+when+the+gateway+sender+is+stopped

Could you please give comments by Thursday, July 9th, 2020?

Thanks in advance,

Alberto G.


[DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-06 Thread Alberto Gomez
Hi,

I have published a new RFC in the Apache Geode wiki with the following title: 
"Avoid the queueing of dropped events by the primary gateway sender when the 
gateway sender is stopped".

https://cwiki.apache.org/confluence/display/GEODE/Avoid+the+queuing+of+dropped+events+by+the+primary+gateway+sender+when+the+gateway+sender+is+stopped

Could you please give comments by Thursday, July 9th, 2020?

Thanks in advance,

Alberto G.


Re: Question about gateway sender stopped and memory consumption

2020-07-02 Thread Alberto Gomez
Thanks Juan!

I will check it.

Alberto

From: Ju@N 
Sent: Thursday, July 2, 2020 7:46 PM
To: dev@geode.apache.org 
Subject: Re: Question about gateway sender stopped and memory consumption

I recall some discussion about this in the past, there even was an "RFC"
that never got implemented:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80452478.
Best regards.

On Thu, 2 Jul 2020 at 18:41, Kirk Lund  wrote:

> I would have expected unsent events to be stored in a queue that is backed
> by a persistent region or something on disk. If that's not currently true,
> then it seems like a good direction might be to make tmpDroppedEvents use a
> durable queue of some sort that overflows to disk.
>
>
>
> On Thu, Jul 2, 2020 at 10:33 AM Alberto Gomez 
> wrote:
>
> > Hi,
> >
> > We have observed that when a gateway sender is stopped in a site, all the
> > events received while it is stopped are stored in the
> > 'AbstractGatewaySender.tmpDroppedEvents' queue of the primary sender. The
> > elements of this queue are not removed from this queue until the sender
> is
> > started back again.
> >
> > This behavior implies that if the gateway sender is stopped for a long
> > time, there is a risk of heap exhaustion in the members hosting primary
> > senders.
> >
> > Under split brain situations, if lasting long enough, there could be heap
> > exhaustion problems in servers due to the memory used by the gateway
> sender
> > queues, even if overflow to disk is used -given that part of the event is
> > always stored in memory.
> > For those situations we had thought about stopping gateway senders when
> > the memory used by the gateway sender queues reached a certain memory
> > threshold. But according to the above, stopping the gateway senders would
> > only make things worse.
> >
> > Would it make sense for the gateway sender not to store the received
> > events in tmpDroppedEvents while it is stopped?
> >
> > Any suggestion on how to approach the problem of heap exhaustion due to
> > the growth of gateway sender queues in long lasting split brain
> situations?
> >
> > Thanks in advance,
> >
> > Alberto G.
> >
> >
> >
>


--
Ju@N


Re: Question about gateway sender stopped and memory consumption

2020-07-02 Thread Alberto Gomez
Thanks for your answer, Kirk.

If we persist the unsent events in a persistent region, the memory consumed 
would not be as high, but it still would not solve our problem with 
long-lasting split brain, as the persistent region would also take some memory 
to store those events even if they were overflowed to disk.

Ideally it should be backed up in a queue that does not use any memory.

Best regards,

Alberto


From: Kirk Lund 
Sent: Thursday, July 2, 2020 7:41 PM
To: dev@geode.apache.org 
Subject: Re: Question about gateway sender stopped and memory consumption

I would have expected unsent events to be stored in a queue that is backed
by a persistent region or something on disk. If that's not currently true,
then it seems like a good direction might be to make tmpDroppedEvents use a
durable queue of some sort that overflows to disk.



On Thu, Jul 2, 2020 at 10:33 AM Alberto Gomez 
wrote:

> Hi,
>
> We have observed that when a gateway sender is stopped in a site, all the
> events received while it is stopped are stored in the
> 'AbstractGatewaySender.tmpDroppedEvents' queue of the primary sender. The
> elements of this queue are not removed from this queue until the sender is
> started back again.
>
> This behavior implies that if the gateway sender is stopped for a long
> time, there is a risk of heap exhaustion in the members hosting primary
> senders.
>
> Under split brain situations, if lasting long enough, there could be heap
> exhaustion problems in servers due to the memory used by the gateway sender
> queues, even if overflow to disk is used -given that part of the event is
> always stored in memory.
> For those situations we had thought about stopping gateway senders when
> the memory used by the gateway sender queues reached a certain memory
> threshold. But according to the above, stopping the gateway senders would
> only make things worse.
>
> Would it make sense for the gateway sender not to store the received
> events in tmpDroppedEvents while it is stopped?
>
> Any suggestion on how to approach the problem of heap exhaustion due to
> the growth of gateway sender queues in long lasting split brain situations?
>
> Thanks in advance,
>
> Alberto G.
>
>
>


Question about gateway sender stopped and memory consumption

2020-07-02 Thread Alberto Gomez
Hi,

We have observed that when a gateway sender is stopped in a site, all the 
events received while it is stopped are stored in the 
'AbstractGatewaySender.tmpDroppedEvents' queue of the primary sender. The 
elements of this queue are not removed from this queue until the sender is 
started back again.

This behavior implies that if the gateway sender is stopped for a long time, 
there is a risk of heap exhaustion in the members hosting primary senders.

Under split brain situations, if lasting long enough, there could be heap 
exhaustion problems in servers due to the memory used by the gateway sender 
queues, even if overflow to disk is used -given that part of the event is 
always stored in memory.
For those situations we had thought about stopping gateway senders when the 
memory used by the gateway sender queues reached a certain memory threshold. 
But according to the above, stopping the gateway senders would only make things 
worse.

Would it make sense for the gateway sender not to store the received events in 
tmpDroppedEvents while it is stopped?

Any suggestion on how to approach the problem of heap exhaustion due to the 
growth of gateway sender queues in long lasting split brain situations?

Thanks in advance,

Alberto G.




Re: Fate of master branch

2020-06-26 Thread Alberto Gomez
I agree also on removing the master branch.

As a relatively new member of the community it's been a source of confusion to 
me when looking at what is said in the wiki about it 
(https://cwiki.apache.org/confluence/display/GEODE/Versioning+and+Branching) 
and comparing it with the actual practice.

Alberto G.

From: Jacob Barrett 
Sent: Friday, June 26, 2020 5:26 PM
To: dev@geode.apache.org 
Subject: Re: Fate of master branch

I am 100% in favor of dropping the master branch completely. I felt like it was 
always a source of confusion. Was it the most recent release or the latest 
version number? I know we have had issues with even correctly merging the 
latest version back into it sometimes.

I really can’t see any reason for keeping it around.

-Jake



> On Jun 26, 2020, at 8:05 AM, Blake Bender  wrote:
>
> Apologies if this has been addressed already and I missed it.  In keeping 
> with other OSS projects, I believe it’s time we did something about removing 
> the insensitive term master from Geode repositories.
>
> One choice a lot of projects appear to be going with is a simple rename from 
> master → main.  In our own case, however, master isn’t really in use for 
> anything vital.  We track releases with a tag and a branch to backport fixes 
> to, and the develop branch is the “source of truth” latest-and-greatest 
> version of the code.  We could thus simply delete master with no loss I’m 
> aware of.  Any opinions?
>
> Thanks,
>
> Blake
>



Re: Successful build on windows

2020-06-25 Thread Alberto Gomez
Hi Kirk,

I build on Ubuntu 18.02 and I occasionally see the partial stack traces you 
mentioned on geode-wan:test. So it is not just a Windows thing.

I never figured out what provoked them, nor how to reproduce them 
consistently.

BR,

Alberto


From: Kirk Lund 
Sent: Thursday, June 25, 2020 11:53 PM
To: dev@geode.apache.org 
Subject: Successful build on windows

In case anyone is interested in the developer experience building with unit
tests on windows:

It succeeds (after a couple tries) but something in geode-wan:test spits
out a partial stack trace. Since all the tests passed, I don't really see a
way to find out which test generated it.

C:\Users\kirkl\dev\geode>gradlew.bat build



> Task :geode-wan:test
  at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.checkCancelled(ParallelGatewaySenderQueue.java:1780)
  at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1879)

> Task :combineReports
All test reports at C:\Users\kirkl\dev\geode\build/reports/combined

Deprecated Gradle features were used in this build, making it incompatible
with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See
https://docs.gradle.org/5.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 3m 52s
532 actionable tasks: 97 executed, 435 up-to-date


Re: Reviewers for GEODE-8231: C++ native client keeps trying to connect to down cache server hosting a partitioned region

2020-06-22 Thread Alberto Gomez
Hi,

I have no complete reviews yet. Any volunteers?

Thanks,

Alberto

From: Alberto Gomez 
Sent: Monday, June 15, 2020 1:31 PM
To: dev@geode.apache.org 
Subject: Reviewers for GEODE-8231: C++ native client keeps trying to connect to 
down cache server hosting a partitioned region

Hi,

Can someone please review my PR about 
https://issues.apache.org/jira/browse/GEODE-8231 (C++ native client keeps 
trying to connect to down cache server hosting a partitioned region)?

Here is the link to the PR: https://github.com/apache/geode-native/pull/615

Thanks,

/Alberto G.


Heap memory used by gateway sender queues way above configured limit after server restart

2020-06-18 Thread Alberto Gomez
Hi,

I have found an issue with heap memory consumed by gateway sender queues way 
above the configured limit after a server is restarted (on the restarted 
server).

The problem is described in the following ticket:

https://issues.apache.org/jira/browse/GEODE-8278

I would highly appreciate some help from the community on where to look in the 
code (or any other hint) in order to implement a solution.

Thanks in advance,

Alberto G.


Reviewers for GEODE-8231: C++ native client keeps trying to connect to down cache server hosting a partitioned region

2020-06-15 Thread Alberto Gomez
Hi,

Can someone please review my PR about 
https://issues.apache.org/jira/browse/GEODE-8231 (C++ native client keeps 
trying to connect to down cache server hosting a partitioned region)?

Here is the link to the PR: https://github.com/apache/geode-native/pull/615

Thanks,

/Alberto G.


Re: About Geode rolling downgrade

2020-06-12 Thread Alberto Gomez
Hi Naba!

Did you manage to discuss this topic with some engineers?

Cheers,

/Alberto G.

From: Nabarun Nag 
Sent: Friday, June 5, 2020 11:00 AM
To: dev@geode.apache.org 
Subject: Re: About Geode rolling downgrade


Hi Mario and Alberto,

I will sync up with a couple of engineers and get you feedback within a couple 
of days.

@Barry, Jason and I were once discussing whether your idea of WAN GII could 
achieve the downgrade: create a DS with the old version, let it do a GII from 
the newer-version cluster, and then shut down the new-version DS. Now we have a 
DS with the lower version.


Regards
Naba


From: Mario Ivanac 
Sent: Friday, June 5, 2020 1:19:42 AM
To: geode 
Subject: Re: About Geode rolling downgrade

Hi all,

just a reminder that Alberto is still waiting for feedback,
regarding his question.

BR,
Mario

From: Alberto Gomez 
Sent: May 14, 2020 14:45
To: geode 
Subject: Re: About Geode rolling downgrade

Hi,

A friendly reminder to the community about this request for feedback.

Thanks,

-Alberto G.

From: Alberto Gomez 
Sent: Thursday, May 7, 2020 10:44 AM
To: geode 
Subject: Re: About Geode rolling downgrade

Hi again,


Considering Geode does not support online rollback for the time being and since 
we need to be able to roll back even a standalone system, we were thinking of a 
procedure to downgrade a Geode cluster tolerating downtime, but without the 
need to:

  *   spin another cluster to sync from,
  *   do a restore or
  *   import data snapshot.



The procedure we came up with is:

  1.  First step - downgrade locators:

      a.  While still on the newer version, export cluster configuration.
      b.  Shut down all locators. Existing clients will continue using their 
server connections. New clients/connections are not possible.
      c.  Start new locators using the old SW version and import cluster 
configuration. They will form a new cluster. Existing client connections should 
still work, but new client connections are not yet possible (no servers 
connected to locators).

  2.  Second step - downgrade servers:

      a.  First shut down all servers in parallel. This marks the beginning of 
total downtime.
      b.  Now start all servers in parallel but still on the new software 
version. Servers connect to the cluster formed by the downgraded locators. When 
servers are up, downtime ends. New client connections are possible. The rest of 
the rollback should be fully online.
      c.  Now per server:

          i.   Shut it down, revoke its disk-stores and delete its file system.
          ii.  Start the server using the old SW version. When up, the server 
will take over cluster configuration and pick up replicated data and 
partitioned region buckets satisfying region redundancy (essentially it will 
hold exactly the same data the previous server had).
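
The ordering of the procedure above can be written down as a short checklist 
generator. This is a minimal Python sketch under the assumption that member 
names are plain strings; `downgrade_plan` and the wording of each step are 
hypothetical and do not correspond to actual gfsh commands.

```python
def downgrade_plan(locators, servers):
    """Return the offline downgrade steps, in order, as plain strings.

    Hypothetical helper for illustration only; it runs nothing.
    """
    # Step 1: downgrade locators.
    steps = ["export cluster configuration (still on new version)"]
    steps += [f"stop locator {name}" for name in locators]
    steps += [f"start locator {name} on old version" for name in locators]
    steps.append("import cluster configuration into old-version locators")

    # Step 2a/2b: restart all servers, still on the new version.
    steps += [f"stop server {name}" for name in servers]  # downtime begins
    steps += [f"start server {name} on new version" for name in servers]  # downtime ends

    # Step 2c: roll servers one at a time onto the old version.
    for name in servers:
        steps.append(f"stop server {name}")
        steps.append(f"revoke disk stores of {name} and delete its files")
        steps.append(f"start server {name} on old version")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(downgrade_plan(["loc1"], ["srv1", "srv2"]), 1):
        print(f"{number:2d}. {step}")
```

Only the window between stopping and restarting all servers (steps 2a and 2b) 
counts as downtime in this ordering; the per-server loop relies on region 
redundancy to stay online.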



The above has some important prerequisites:

  1.  Partitioned regions have redundancy and region configuration allows 
recovery as described above.
  2.  Clients version allows connection to new and old clusters - i.e. clients 
must not use newer version at the moment the procedure starts.
  3.  Geode guarantees cluster configuration exported from newer system can be 
imported into older system. In case of incompatibility I expect we could even 
manually edit the configuration to adapt it to the older system but it is a 
question how new servers will react when they connect (in step 2b).
  4.  Geode guarantees communication between peers with different SW version 
works and recovery of region data works.



Could we have opinions on this offline procedure? It seems to work well but 
probably has caveats we do not see at the moment.



What about prerequisites 3 and 4? They are valid in the upgrade case, but I am 
not sure whether they hold in this rollback case.


Best regards,


-Alberto G.


From: Anilkumar Gingade 
Sent: Thursday, April 23, 2020 12:59 AM
To: geode 
Subject: Re: About Geode rolling downgrade

That's right, a no-downtime requirement is almost always met by having
replicated cluster setups (disaster-recovery/backup site). The data is
either pushed to both systems through the data ingesters or by using a WAN
setup.
The clusters are upgraded one at a time. If there is a failure during the
upgrade, or it needs to be rolled back, one system will always be up
and running.

-Anil.





On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker  wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> In cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clust

Re: Problem in rolling upgrade since 1.12

2020-06-11 Thread Alberto Gomez
Thanks for the info, Bill.

I have found another issue in rolling upgrade since 1.12.

I have observed that when there are custom jars previously deployed, the 
locator cannot be started in the new version and the following exception 
is thrown:

Exception in thread "main" org.apache.geode.SerializationException: Could not 
create an instance of 
org.apache.geode.management.internal.configuration.domain.Configuration .

I have pushed another commit in the draft I sent before containing the new test 
case.

/Alberto G.

From: Bill Burcham 
Sent: Thursday, June 11, 2020 1:53 AM
To: dev@geode.apache.org 
Subject: Re: Problem in rolling upgrade since 1.12

Ernie made us a ticket for this issue:
https://issues.apache.org/jira/browse/GEODE-8240

On Mon, Jun 8, 2020 at 12:59 PM Alberto Gomez 
wrote:

> Hi Ernie,
>
> I have seen this problem in the support/1.13 branch and also on develop.
>
> Interestingly, the patch I sent is applied seamlessly in my local repo set
> to the develop branch.
>
> The patch modifies the
> RollingUpgradeRollServersOnPartitionedRegion_dataserializable test case so
> that it runs "list members" on an upgraded system. I run it manually with
> the following command:
>
> ./gradlew geode-core:upgradeTest
> --tests=RollingUpgradeRollServersOnPartitionedRegion_dataserializable
>
> I see it failing when upgrading from 1.12.
>
> I created a draft PR where you can see also the changes in the test case
> that manifest the problem.
>
> See: https://github.com/apache/geode/pull/5224
>
>
> Please, let me know if you need any more information.
>
> BR,
>
> Alberto
> 
> From: Ernie Burghardt 
> Sent: Monday, June 8, 2020 9:04 PM
> To: dev@geode.apache.org 
> Subject: Re: Problem in rolling upgrade since 1.12
>
> Hi Alberto,
>
> I’m looking at this, and came across a couple of blockers…
> Do you have a branch that exhibits this problem? A draft PR maybe?
> I tried to apply your patch to latest develop, but the patch doesn’t pass
> git apply’s check….
> Also these tests pass on develop, would you be able to check against the
> latest and update the diff?
> I’m very interested in reproducing the issue you have observed.
>
> Thanks,
> Ernie
>
> From: Alberto Gomez 
> Reply-To: "dev@geode.apache.org" 
> Date: Monday, June 8, 2020 at 12:31 AM
> To: "dev@geode.apache.org" 
> Subject: Re: Problem in rolling upgrade since 1.12
>
> Hi,
>
> I attach a diff for the modified test case in case you would like to use
> it to check the problem I mentioned.
>
> BR,
>
> Alberto
> 
> From: Alberto Gomez 
> Sent: Saturday, June 6, 2020 4:06 PM
> To: dev@geode.apache.org 
> Subject: Problem in rolling upgrade since 1.12
>
> Hi,
>
> I have observed that since version 1.12 rolling upgrades to future
> versions leave the first upgraded locator "as if" it was still on version
> 1.12.
>
> This is the output from "list members" before starting the upgrade from
> version 1.12:
>
> Name | Id
> ---- | ---------------------------------------------------
> vm2  | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
> vm0  | 192.168.0.37(vm0:29260):41002
> vm1  | 192.168.0.37(vm1:29296):41003
>
>
> And this is the output from "list members" after upgrading the first
> locator from 1.12 to 1.13/1.14:
>
> Name | Id
> ---- | ------------------------------------------------------------------------
> vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
> vm0  | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
> vm1  | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)
>
>
> Finally this is the output in gfsh once the rolling upgrade has been
> completed (locators and servers upgraded):
>
> Name | Id
> ---- | ------------------------------------------------------------------------
> vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
> vm0  | 192.168.0.37(vm0:2457):41002
> vm1  | 192.168.0.37(vm1:2576):41003
>
> I verified this by running manual tests and also by running the following
> upgrade test (had to stop it in the middle to connect via gfsh and get the
> gfsh outputs):
>
> RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable
>
> After the rolling upgrade, the shutdown command fails with the following
> error:
> Member 192.168.0.37(vm2:1453:locator):41001 could not be found.
> Please verify the member name or ID and try again.
>
> The only way I have found to come out of the situation is by restarting
> the locator.
> Once restarted again, the output of gfsh shows that all members are
> upgraded to the new version, i.e. the locator does not show anymore that it
> is on version GEODE 1.12.0.
>
> Does anybody have any clue why this is happening?
>
> Thanks in advance,
>
> /Alberto G.
>


Re: Problem in rolling upgrade since 1.12

2020-06-08 Thread Alberto Gomez
Hi Ernie,

I have seen this problem in the support/1.13 branch and also on develop.

Interestingly, the patch I sent is applied seamlessly in my local repo set to 
the develop branch.

The patch modifies the 
RollingUpgradeRollServersOnPartitionedRegion_dataserializable test case so 
that it runs "list members" on an upgraded system. I run it manually with the 
following command:

./gradlew geode-core:upgradeTest 
--tests=RollingUpgradeRollServersOnPartitionedRegion_dataserializable

I see it failing when upgrading from 1.12.

I created a draft PR where you can see also the changes in the test case that 
manifest the problem.

See: https://github.com/apache/geode/pull/5224


Please, let me know if you need any more information.

BR,

Alberto

From: Ernie Burghardt 
Sent: Monday, June 8, 2020 9:04 PM
To: dev@geode.apache.org 
Subject: Re: Problem in rolling upgrade since 1.12

Hi Alberto,

I’m looking at this, came across a couple blockers…
Do you have a branch that exhibits this problem? Draft PR maybe?
I tried to apply your patch to latest develop, but the patch doesn’t pass git 
apply’s check.
Also, these tests pass on develop; would you be able to check against the latest 
and update the diff?
I’m very interested in reproducing the issue you have observed.

Thanks,
Ernie

From: Alberto Gomez 
Reply-To: "dev@geode.apache.org" 
Date: Monday, June 8, 2020 at 12:31 AM
To: "dev@geode.apache.org" 
Subject: Re: Problem in rolling upgrade since 1.12

Hi,

I attach a diff for the modified test case in case you would like to use it to 
check the problem I mentioned.

BR,

Alberto
________
From: Alberto Gomez 
Sent: Saturday, June 6, 2020 4:06 PM
To: dev@geode.apache.org 
Subject: Problem in rolling upgrade since 1.12

Hi,

I have observed that since version 1.12 rolling upgrades to future versions 
leave the first upgraded locator "as if" it was still on version 1.12.

This is the output from "list members" before starting the upgrade from version 
1.12:

Name | Id
---- | ---------------------------------------------------
vm2  | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
vm0  | 192.168.0.37(vm0:29260):41002
vm1  | 192.168.0.37(vm1:29296):41003


And this is the output from "list members" after upgrading the first locator 
from 1.12 to 1.13/1.14:

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
vm1  | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)


Finally this is the output in gfsh once the rolling upgrade has been completed 
(locators and servers upgraded):

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457):41002
vm1  | 192.168.0.37(vm1:2576):41003

I verified this by running manual tests and also by running the following 
upgrade test (had to stop it in the middle to connect via gfsh and get the gfsh 
outputs):
RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable

After the rolling upgrade, the shutdown command fails with the following error:
Member 192.168.0.37(vm2:1453:locator):41001 could not be found.  Please 
verify the member name or ID and try again.

The only way I have found to come out of the situation is by restarting the 
locator.
Once restarted again, the output of gfsh shows that all members are upgraded to 
the new version, i.e. the locator does not show anymore that it is on version 
GEODE 1.12.0.

Does anybody have any clue why this is happening?

Thanks in advance,

/Alberto G.


Re: Problem in rolling upgrade since 1.12

2020-06-08 Thread Alberto Gomez
Hi,

I attach a diff for the modified test case in case you would like to use it to 
check the problem I mentioned.

BR,

Alberto

From: Alberto Gomez 
Sent: Saturday, June 6, 2020 4:06 PM
To: dev@geode.apache.org 
Subject: Problem in rolling upgrade since 1.12

Hi,

I have observed that since version 1.12 rolling upgrades to future versions 
leave the first upgraded locator "as if" it was still on version 1.12.

This is the output from "list members" before starting the upgrade from version 
1.12:

Name | Id
---- | ---------------------------------------------------
vm2  | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
vm0  | 192.168.0.37(vm0:29260):41002
vm1  | 192.168.0.37(vm1:29296):41003


And this is the output from "list members" after upgrading the first locator 
from 1.12 to 1.13/1.14:

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
vm1  | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)


Finally this is the output in gfsh once the rolling upgrade has been completed 
(locators and servers upgraded):

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457):41002
vm1  | 192.168.0.37(vm1:2576):41003

I verified this by running manual tests and also by running the following 
upgrade test (had to stop it in the middle to connect via gfsh and get the gfsh 
outputs):
RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable

After the rolling upgrade, the shutdown command fails with the following error:
Member 192.168.0.37(vm2:1453:locator):41001 could not be found.  Please 
verify the member name or ID and try again.

The only way I have found to come out of the situation is by restarting the 
locator.
Once restarted again, the output of gfsh shows that all members are upgraded to 
the new version, i.e. the locator does not show anymore that it is on version 
GEODE 1.12.0.

Does anybody have any clue why this is happening?

Thanks in advance,

/Alberto G.
diff --git a/geode-core/src/upgradeTest/java/org/apache/geode/internal/cache/rollingupgrade/RollingUpgradeDUnitTest.java b/geode-core/src/upgradeTest/java/org/apache/geode/internal/cache/rollingupgrade/RollingUpgradeDUnitTest.java
index 089b4ffd9d..a501d89a9f 100644
--- a/geode-core/src/upgradeTest/java/org/apache/geode/internal/cache/rollingupgrade/RollingUpgradeDUnitTest.java
+++ b/geode-core/src/upgradeTest/java/org/apache/geode/internal/cache/rollingupgrade/RollingUpgradeDUnitTest.java
@@ -14,6 +14,10 @@
  */
 package org.apache.geode.internal.cache.rollingupgrade;
 
+import static org.apache.geode.distributed.ConfigurationProperties.JMX_MANAGER;
+import static org.apache.geode.distributed.ConfigurationProperties.JMX_MANAGER_PORT;
+import static org.apache.geode.distributed.ConfigurationProperties.JMX_MANAGER_START;
+import static org.apache.geode.internal.AvailablePortHelper.getRandomAvailableTCPPorts;
 import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
 import static org.junit.Assert.assertTrue;
 
@@ -27,6 +31,7 @@ import java.util.List;
 import java.util.Properties;
 
 import org.apache.commons.io.FileUtils;
+import org.junit.Rule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameter;
@@ -50,7 +55,6 @@ import org.apache.geode.distributed.internal.DistributionConfig;
 import org.apache.geode.distributed.internal.InternalLocator;
 import org.apache.geode.distributed.internal.membership.InternalDistributedMember;
 import org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave;
-import org.apache.geode.internal.AvailablePortHelper;
 import org.apache.geode.internal.serialization.Version;
 import org.apache.geode.test.dunit.DistributedTestUtils;
 import org.apache.geode.test.dunit.Host;
@@ -59,6 +63,7 @@ import org.apache.geode.test.dunit.Invoke;
 import org.apache.geode.test.dunit.NetworkUtils;
 import org.apache.geode.test.dunit.VM;
 import org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase;
+import org.apache.geode.test.junit.rules.GfshCommandRule;
 import org.apache.geode.test.junit.runners.CategoryWithParameterizedRunnerFactory;
 import org.apache.geode.test.version.VersionManager;
 
@@ -78,10 +83,14 @@ import org.apache.geode.test.version.VersionManager;
  * @author jhuynh
  */
 
+
 @RunWith(Parameterized.class)
 @UseParametersRunnerFactory(CategoryWithParameterizedRunnerFactory.class)
 public abstract class RollingUpgradeDUnitTest extends JUnit4DistributedTestCase {
 
+  @Rule
+  public transient GfshCommandRule gfsh = new GfshCommandRule();
+
   @Parameters

Problem in rolling upgrade since 1.12

2020-06-06 Thread Alberto Gomez
Hi,

I have observed that since version 1.12 rolling upgrades to future versions 
leave the first upgraded locator "as if" it was still on version 1.12.

This is the output from "list members" before starting the upgrade from version 
1.12:

Name | Id
---- | ---------------------------------------------------
vm2  | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
vm0  | 192.168.0.37(vm0:29260):41002
vm1  | 192.168.0.37(vm1:29296):41003


And this is the output from "list members" after upgrading the first locator 
from 1.12 to 1.13/1.14:

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
vm1  | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)


Finally this is the output in gfsh once the rolling upgrade has been completed 
(locators and servers upgraded):

Name | Id
---- | ------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457):41002
vm1  | 192.168.0.37(vm1:2576):41003
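
As a rough way to spot this state without reading the table by eye, the sketch 
below scans "list members" output for the "(version:GEODE x.y.z)" suffix seen 
in the outputs above. It is only an illustration: the function name and the 
regular expression are assumptions derived from the three outputs in this 
message, not part of gfsh or of the Geode API.

```python
import re

# Assumption: members reported with an older version carry a
# "(version:GEODE x.y.z)" suffix in the Id column, as in the outputs above.
VERSION_SUFFIX = re.compile(r"\(version:GEODE (\d+(?:\.\d+)*)\)")


def members_reporting_old_version(list_members_output):
    """Return {member name: version} for rows whose Id has a version suffix."""
    stale = {}
    for line in list_members_output.splitlines():
        if "|" not in line:
            continue  # skip blank lines and anything that is not a table row
        name, member_id = (part.strip() for part in line.split("|", 1))
        match = VERSION_SUFFIX.search(member_id)
        if name and match:
            stale[name] = match.group(1)
    return stale


# Output captured after the full rolling upgrade (copied from this message).
after_full_upgrade = """\
Name | Id
---- | ---
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457):41002
vm1  | 192.168.0.37(vm1:2576):41003
"""

print(members_reporting_old_version(after_full_upgrade))
```

For the output above, only vm2 (the first upgraded locator) is reported, which 
matches what I see in gfsh.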

I verified this by running manual tests and also by running the following 
upgrade test (had to stop it in the middle to connect via gfsh and get the gfsh 
outputs):
RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable

After the rolling upgrade, the shutdown command fails with the following error:
Member 192.168.0.37(vm2:1453:locator):41001 could not be found.  Please 
verify the member name or ID and try again.

The only way I have found to come out of the situation is by restarting the 
locator.
Once restarted again, the output of gfsh shows that all members are upgraded to 
the new version, i.e. the locator does not show anymore that it is on version 
GEODE 1.12.0.

Does anybody have any clue why this is happening?

Thanks in advance,

/Alberto G.


Re: CI concourse checks on a PR not triggered

2020-05-25 Thread Alberto Gomez
Thanks a lot!

It is now working.

-Alberto

From: Owen Nichols 
Sent: Monday, May 25, 2020 9:03 AM
To: dev@geode.apache.org 
Subject: Re: CI concourse checks on a PR not triggered

Looks like a merge conflict: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/Build/builds/7594

Unfortunately the PR pipeline is unable to post statuses to the PR when there 
is a conflict.

Please rebase your PR branch to latest develop to clear this up.

On 5/24/20, 11:39 PM, "Alberto Gomez"  wrote:

Hi,

Since last Friday, the concourse checks for the following PR are not being 
triggered:


https://github.com/apache/geode/pull/4928

I have tried to launch them by pushing empty commits but have not been 
successful.

Could anybody give me a hand?

Thanks in advance,

-Alberto G.





CI concourse checks on a PR not triggered

2020-05-24 Thread Alberto Gomez
Hi,

Since last Friday, the concourse checks for the following PR are not being 
triggered:

https://github.com/apache/geode/pull/4928

I have tried to launch them by pushing empty commits but have not been 
successful.

Could anybody give me a hand?

Thanks in advance,

-Alberto G.



