Re: [VOTE] Move geode to the attic

2022-10-30 Thread Xiaojian Zhou
I vote to move Geode to the attic.

Regards
Xiaojian Zhou, Geode PMC member

From: Dan Smith 
Date: Friday, October 28, 2022 at 3:15 PM
To: u...@geode.apache.org , dev@geode.apache.org 

Subject: Re: [VOTE] Move geode to the attic
There is still an ongoing discussion on the private@geode list, so let's go ahead 
and extend the VOTE deadline until at least Wednesday, November 2.

As Anthony mentioned, only PMC member votes actually count. But all community 
members should feel free to cast their vote and help us reach a consensus. At 
the moment, I count zero PMC votes for or against this proposal.

Thanks,
-Dan

From: rup...@webstersystems.co.uk 
Sent: Tuesday, October 25, 2022 12:30 PM
To: u...@geode.apache.org ; dev@geode.apache.org 

Subject: RE: [VOTE] Move geode to the attic

Hello, I vote to keep an open-source version.
I volunteer for the PMC.
Please ping me.
Cheers.

-Original Message-
From: Anthony Baker 
Sent: 25 October 2022 18:38
To: dev@geode.apache.org
Cc: u...@geode.apache.org
Subject: Re: [VOTE] Move geode to the attic

I added the user@ list for visibility.

I believe the purpose of taking a VOTE here is to look for community consensus 
on this idea and perhaps find new people interested in carrying the project 
forward. The votes of the existing PMC members are the ones that count, just 
like for a release. Quoting from [1]:

"There are two expected mechanisms by which a project may enter the Attic. 
Either the managing Project Management Committee (PMC) decides it would like to 
move the project, or The Apache Software Foundation's board dissolves the PMC 
and chooses to move the project.”

And

"Projects whose PMC are unable to muster 3 votes for a release, who have no 
active committers or are unable to fulfill their reporting duties to the board 
are all good candidates for the Attic.”

I’m holding my vote for now. I would vote +1 if I don’t see sufficient ability 
for the PMC to carry the project.


Anthony

[1] 
https://attic.apache.org/


> On Oct 24, 2022, at 10:51 AM, Dan Smith  wrote:
>
> Hi folks,
>
> Last week we discussed moving Geode to the attic [1]. I'm sad to find us at 
> this point. But given that we will soon be without three active members to 
> form a functional PMC, it's time to put this to a vote. I propose we dissolve 
> the Geode PMC and move the Geode project to the attic.
>
> Please cast your vote by Monday Oct 31st.
>
> Thanks,
> -Dan
>
> [1] 
> https://lists.apache.org/thread/bx6t1cppbj7nmfjnf9gtcqjljp8bdf0y
>



Re: [VOTE] Apache Geode 1.13.3.RC1

2021-06-22 Thread Xiaojian Zhou
+1

On 6/21/21, 2:23 PM, "Nabarun Nag"  wrote:

+1 based on the following:

  *   building from source
  *   running gfsh
  *   starting a 2-site WAN cluster
  *   verifying data propagation between the 2 sites using puts and gets
  *   rolling clusters from 1.12 to the release candidate
  *   rebalancing operations during upgrades
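
For reference, a minimal sketch of the kind of gfsh session such checks involve
(names, ports, and the region are illustrative, not the actual test setup):

    gfsh> start locator --name=locator1 --port=10334
    gfsh> start server --name=server1 --locators=localhost[10334]
    gfsh> create gateway-sender --id=sender-to-site2 --remote-distributed-system-id=2
    gfsh> create gateway-receiver
    gfsh> create region --name=exampleRegion --type=PARTITION --gateway-sender-id=sender-to-site2
    gfsh> put --region=exampleRegion --key=k1 --value=v1
    gfsh> get --region=exampleRegion --key=k1
    gfsh> rebalance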

Regards,
Nabarun




From: Owen Nichols 
Sent: Monday, June 21, 2021 11:02 AM
To: dev@geode.apache.org 
Subject: [Suspected Spam] [VOTE] Apache Geode 1.13.3.RC1

Hello Geode Dev Community,

I'd like to propose an expedited 1.13 patch release (24-hour voting 
deadline instead of 72).

This is a release candidate for Apache Geode version 1.13.3.RC1.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you 
performed.

Voting deadline:
3PM PDT Tue, June 22 2021.

Please note that we are voting upon the source tag:
rel/v1.13.3.RC1

Release notes:

https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.13.3

Source and binary distributions:

https://dist.apache.org/repos/dist/dev/geode/1.13.3.RC1/

Maven staging repo:

https://repository.apache.org/content/repositories/orgapachegeode-1085

GitHub:

https://github.com/apache/geode/tree/rel/v1.13.3.RC1

https://github.com/apache/geode-examples/tree/rel/v1.13.3.RC1

https://github.com/apache/geode-native/tree/rel/v1.13.3.RC1

https://github.com/apache/geode-benchmarks/tree/rel/v1.13.3.RC1

Pipelines:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main


Re: [PROPOSAL] backport GEODE-8998 to 1.14

2021-03-04 Thread Xiaojian Zhou
+1

On 3/4/21, 9:37 AM, "Darrel Schneider"  wrote:

I'm resending this request because my previous request was labelled as junk.

I would like to backport GEODE-8998 to 1.14. It fixes an NPE
that causes the Geode cluster to be unable to do any
cluster messaging if thread monitoring is disabled.
This bug has never been released, so it would be nice to
keep it that way.


https://issues.apache.org/jira/browse/GEODE-8998




Re: [Suspected Spam] [PROPOSAL] backport GEODE-8998 to 1.14

2021-03-04 Thread Xiaojian Zhou
+1

On 3/4/21, 9:31 AM, "Mark Hanson"  wrote:

+1

On 3/3/21, 5:18 PM, "Darrel Schneider"  wrote:

I would like to backport GEODE-8998 to 1.14. It fixes an NPE
that causes the Geode cluster to be unable to do any
cluster messaging if thread monitoring is disabled.
This bug has never been released, so it would be nice to
keep it that way.


https://issues.apache.org/jira/browse/GEODE-8998




Re: [Proposal] Backport GEODE-8958 into 1.14.x, 1.13.x, 1.12.x branches

2021-03-03 Thread Xiaojian Zhou
+1

On 3/1/21, 11:30 PM, "Owen Nichols"  wrote:

That sounds a lot better than never expiring them if that does happen. I 
think this would be good to include.

On 3/1/21, 2:41 PM, "Mark Hanson"  wrote:

I would like to backport GEODE-8958 into previous release branches to 
alleviate a problem with tombstones if timestamps become corrupt for some 
reason.

Thanks,
Mark




Re: [DISCUSS] Geode 1.14

2021-01-04 Thread Xiaojian Zhou
My opinion:
(1) List out the must-fix bugs for 1.14 (I don't think there are any, but it's good 
to review).
(2) Cut the 1.14 release branch and start the stabilization period ASAP.

Gester

From: Anilkumar Gingade 
Date: Monday, January 4, 2021 at 7:23 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14
My recommendation would be:
- Identify, prioritize, and merge 1.14-related work
- Stabilize, cut the branch, and stabilize again (to test any new changes added 
during the first stabilization period)

-Anil.


On 12/18/20, 2:26 PM, "Mark Hanson"  wrote:

I support the cut on a predetermined date, but I would be OK with the 
stabilize-first approach, because I think that having a stable build is a 
prerequisite for any time-based model. Like all things, though, it is a smell 
that we have to do this at all. The other thing is that specifying a date or a 
window of time is, in my opinion, crucial to ensuring freshly baked features are 
not merged until we cut the release. The window need not be very long; a day or 
two, as an example. With the volume of defects that we need to assess and fix, 
maintaining control of develop seems important. So I would propose that we 
give notice of when we are looking to cut the branch (once we have made 
adequate determinations for the defects).

Thanks,
Mark

On 12/18/20, 12:09 PM, "Owen Nichols"  wrote:

To summarize this thread so far:
@Robert and @Jens seem to favor “cut then stabilize”
@Alexander and @John seem to favor “stabilize then cut”
No one seems to favor “cut on a predetermined date” (at least for 1.14)

@John also made a creative suggestion that maybe 1.14 doesn’t have to 
be cut from latest develop…what if we cut it from support/1.13 and then 
backport just the redis changes (in parallel with continuing to stabilize 
what’s currently on develop into a 1.15 release).

For now let’s try to proceed on the “stabilize then cut” plan.  All 
committers, please hold off on merging big refactorings or other high-risk 
changes to develop until after the branch is cut.  Let’s regroup next month and 
try to clarify exactly which GEODE Jira tickets we need to focus on to make 
sure 1.14 is our best release.

From: Owen Nichols 
Date: Tuesday, December 1, 2020 at 12:26 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14
If someone wants to propose a list of must-fix Jira tickets before we 
can cut the branch, I see that as a shift from a time-based to feature-based 
branch-cut strategy.  Might be fun to try?

Given the distributed nature of the Geode community, picking a date and 
sticking to it allows decentralized decision-making (each contributor can plan 
on their own what they can finish and/or how they can help get develop as 
stable as possible by that date).

To answer your question: the current state of develop feels “pretty 
good” to me.  Knowing that only critical fixes will be allowed onto the branch 
once cut, the question is really about features.  It sounds like there is redis 
work we’d like to ship.  Anything else nearly-done we should considering 
waiting on?

From: Alexander Murmann 
Date: Monday, November 30, 2020 at 11:57 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14
Hi all,

Thanks, Owen for reminding us all of this topic!

I wonder how we feel about the state of develop right now. If we cut 
1.14 right now, it will make it easier to stabilize and ship it. However, I see 
21 open JIRA tickets affecting 1.14.0. It might be better to have an all-hands 
effort to address as much as possible on develop and then cut 1.14. If we shift 
all attention to 1.14, develop will likely never get better. I'd love to get 
closer to an always shippable develop branch. That should vastly reduce future 
release pain and make everyday development better as well.

Thoughts?

From: Jens Deppe 
Sent: Wednesday, November 25, 2020 20:11
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14

Hi Owen,

Thanks for starting this conversation and especially for volunteering 
as Release Manager!

Since we're already a couple of quarters 'behind' in terms of 
releases, I'd prefer cutting the 1.14 branch ASAP. Leaving it until February means 
we'll have 9 months of changes to stabilize. How long might that take to 
finally get shipped? (rhetorical)

--Jens

On 11/25/20, 6:05 PM, "Owen Nichols"  wrote:

The trigger in @Alexander’s July 28 proposal to postpone 1.14 has 
been met (we shipped 1.13).
It’s time to discuss when we want to cut the 1.14 branch.  I will 
volunteer as Release Manager.

Below are all release dates since Geode adopted a time-based 
release cadence.

Minor releases:
1.13   branch cut May 4 2020, 1.13.0 

[PROPOSAL] backporting GEODE-8764 to 1.13 and 9.10

2020-12-03 Thread Xiaojian Zhou
GEODE-8764 is an enhanced version of GEODE-6930.

Lucene functions should only require DATA:READ permission on the specified 
region, with no need to gain permission on other, unrelated regions.

The fix has no risk.
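
Roughly, the intended check is sketched below, using Geode's public
ResourcePermission vocabulary (the variable names and call site are
illustrative, not the actual patch):

    import org.apache.geode.security.ResourcePermission;
    import org.apache.geode.security.ResourcePermission.Operation;
    import org.apache.geode.security.ResourcePermission.Resource;

    // Authorize DATA:READ on just the region being searched, instead of a
    // cluster-wide DATA:READ that would also cover unrelated regions.
    ResourcePermission permission =
        new ResourcePermission(Resource.DATA, Operation.READ, regionName);
    securityService.authorize(permission);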

Regards
Xiaojian Zhou


Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-12-02 Thread Xiaojian Zhou
OK, I double-checked; my memory is wrong. It was true as early as 6.0.

From: Xiaojian Zhou 
Date: Wednesday, December 2, 2020 at 3:29 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
+1
I think it’s good to change back the default to be false. It was false before.

From: Barrett Oglesby 
Date: Wednesday, December 2, 2020 at 3:14 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
I ran a bunch of tests using the long-running-test code where the servers had a 
mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with 
conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with 
conserve-sockets=true.

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- oql queries

One thing I found interesting was that the server where the operation originated 
dictated which thread was used on the remote server. If the server where the 
operation originated had conserve-sockets=false, then the remote server used an 
unshared P2P message reader to process the replication no matter what its 
conserve-sockets setting was. And if the server where the operation originated 
had conserve-sockets=true, then the remote server used a shared P2P message 
reader to process the replication no matter what its conserve-sockets setting 
was.

Here is some logging from a DistributionMessageObserver that shows that 
behavior.
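
For context, a rough sketch of what such an observer can look like, assuming
Geode's internal DistributionMessageObserver test hook (an internal API whose
exact signatures vary across versions; the class below is illustrative):

    import org.apache.geode.distributed.internal.ClusterDistributionManager;
    import org.apache.geode.distributed.internal.DistributionMessage;
    import org.apache.geode.distributed.internal.DistributionMessageObserver;

    // Logs each hook with the current thread name, so each log line shows which
    // P2P message reader (shared vs. unshared) handled the message.
    public class TestDistributionMessageObserver extends DistributionMessageObserver {

      private void log(String operation, DistributionMessage message) {
        System.out.println(Thread.currentThread().getName()
            + ": TestDistributionMessageObserver operation=" + operation
            + "; time=" + System.currentTimeMillis()
            + "; message=" + message);
      }

      @Override
      public void beforeSendMessage(ClusterDistributionManager dm, DistributionMessage message) {
        log("beforeSendMessage", message);
      }

      @Override
      public void beforeProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
        log("beforeProcessMessage", message);
      }

      @Override
      public void afterProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
        log("afterProcessMessage", message);
      }
    }

It would be installed in each server before running the client workload, e.g.
DistributionMessageObserver.setInstance(new TestDistributionMessageObserver()).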

Case 1:

The server (server1) that processes the put operation from the client is 
primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has 
conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver 
operation=beforeSendMessage; time=1606929894787; 
message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); 
recipients=[192.168.1.8(server-conserve-sockets1:58995):41002]

2. An unshared P2P message reader in server2 handles the 
UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984):41001 unshared ordered 
uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule 
msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; 
sender=192.168.1.8(server1:58984):41001; op=UPDATE; key=0; 
newValue=(10485820 bytes))
P2P message reader for 192.168.1.8(server1:58984):41001 unshared ordered 
uid=11 dom #1 local port=58405 remote port=60860: 
TestDistributionMessageObserver operation=beforeProcessMessage; 
time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984):41001; 
op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server1:58984):41001 unshared ordered 
uid=11 dom #1 local port=58405 remote port=60860: 
TestDistributionMessageObserver operation=afterProcessMessage; 
time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984):41001; 
op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

Case 2:

The server (server1) that processes the put operation from the client is 
primary and has conserve-sockets=true.
The server (server2) that handles the UpdateWithContextMessage has 
conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver 
operation=beforeSendMessage; time=1606932400283; 
message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); 
recipients=[192.168.1.8(server1:63224):41001]

2. The shared P2P message reader in server2 handles the 
UpdateWithContextMessage and sends the ReplyMessage even though 
conserve-sockets=false:

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240):41002 
shared ordered uid=4 local port=54619 remote port=61472: 
TestDistributionMessageObserver operation=beforeProcessMessage; 
time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; 
sender=192.168.1.8(server-conserve-sockets1:63240):41002; op=UPDATE; 
key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240):41002 
shared ordered uid=4 local port=54619 remote port=61472: 
TestDistributionMessageObserver operation=beforeSendMessage; 
time=1606932400296; message=ReplyMessage processorId=42 from null; 
recipients=[192.168.1.8(server-conserve-sockets1:63240):41002]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240):41002 
shared ordered uid=4 local port=54619 remote port=61472: 
TestDistributionMessageObserver operation=afterProcessMessage; 
time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48';

Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-12-02 Thread Xiaojian Zhou
+1
I think it’s good to change back the default to be false. It was false before.

From: Barrett Oglesby 
Date: Wednesday, December 2, 2020 at 3:14 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
Re: Geode - store and query JSON documents

2020-11-26 Thread Xiaojian Zhou
  public HashSet<Object> getResults(LuceneQuery<Object, Object> query, String regionName)
      throws LuceneQueryException {
    if (query == null) {
      return null;
    }

    PageableLuceneQueryResults<Object, Object> results = query.findPages();
    if (results.size() > 0) {
      System.out.println("Search found " + results.size() + " results in "
          + regionName + ", page size is " + query.getPageSize());
    }

    HashSet<Object> values = new HashSet<>();
    while (results.hasNext()) {
      results.next().stream()
          .forEach(struct -> {
            Object value = struct.getValue();
            if (value instanceof PdxInstance) {
              PdxInstance pdx = (PdxInstance) value;
              String jsonString = JSONFormatter.toJSON(pdx);
              List<PdxInstance> dataList = (List<PdxInstance>) pdx.getField("data");
              for (PdxInstance data : dataList) {
                Object colObject = data.getField("col1");
                System.out.println("col=" + colObject);
              }
              System.out.println("Found a json object:" + jsonString
                  + ":dataList=" + dataList);
              values.add(pdx);
            } else {
              System.out.println("key=" + struct.getKey() + ",data=" + value);
              values.add(value);
            }
          });
    }
    System.out.println("Search found " + values.size() + " results in " + regionName);
    return values;
  }

Regards
Xiaojian

On 11/23/20, 9:46 AM, "Xiaojian Zhou"  wrote:

Ankit:

    Anil can provide you some sample code for an OQL query on JSON.

    I will find some Lucene sample code on JSON for you.

Regards
Xiaojian

On 11/23/20, 9:27 AM, "ankit Soni"  wrote:

Hi,
I am looking for any means of querying (OQL/Lucene/API, etc.) this stored
data: first to achieve the functionality, and second to do so in a
performant way.

I shared the OQL-like syntax to convey my use case easily, based on some
references found in the docs. I am OK if a Lucene query or some other way can
fetch the results.

It will be of great help if you share sample query/code for fetching this
data.

Thanks
Ankit.


On Mon, 23 Nov 2020 at 22:43, Xiaojian Zhou  wrote:

> Anil:
>
> The syntax is OQL. But I understand they want to query the JSON object based on
> the criteria.
>
> On 11/23/20, 9:08 AM, "Anilkumar Gingade"  wrote:
>
> Gester, looking at the sample query, I believe Ankit is asking about
> an OQL query, not Lucene...
>
> -Anil.
>
    >
> On 11/23/20, 9:02 AM, "Xiaojian Zhou"  wrote:
>
> Ankit:
>
> Geode provides Lucene queries on JSON fields. Your query can be
> supported.
>
> https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html
>
> However, the above document does not provide a query example on a
> JSON object.
>
> I can give you some sample code to query on JSON.
>
> Regards
> Xiaojian Zhou
>
> On 11/22/20, 11:53 AM, "ankit Soni" 

> wrote:
>
> Hello geode-devs, please provide guidance on this.
>
> Ankit.
>

Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Ankit:

Anil can provide you some sample code for an OQL query on JSON.

I will find some Lucene sample code on JSON for you.

Regards
Xiaojian
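
In the meantime, a rough sketch of the OQL side, assuming a ClientCache handle
named cache and the region/field names from Ankit's sample document later in
this thread (the nested array path syntax here is illustrative and untested):

    import org.apache.geode.cache.query.Query;
    import org.apache.geode.cache.query.QueryService;
    import org.apache.geode.cache.query.SelectResults;

    QueryService queryService = cache.getQueryService();
    // Iterate the nested "data" collection of each PdxInstance stored via JSONFormatter.
    Query query = queryService.newQuery(
        "SELECT d.col2[0].k21, d.col1 FROM /REGION_NAME r, r.data d"
            + " WHERE d.col2[0].k21 = '22' OR d.col2[0].k21 = '33'");
    SelectResults<?> results = (SelectResults<?>) query.execute();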

On 11/23/20, 9:27 AM, "ankit Soni"  wrote:

Hi,
I am looking for any means of querying (OQL/Lucene/API, etc.) this stored
data: first to achieve the functionality, and second to do so in a
performant way.

I shared the OQL-like syntax to convey my use case easily, based on some
references found in the docs. I am OK if a Lucene query or some other way can
fetch the results.

It will be of great help if you share sample query/code for fetching this
data.

Thanks
Ankit.


On Mon, 23 Nov 2020 at 22:43, Xiaojian Zhou  wrote:

> Anil:
>
> The syntax is OQL. But I understand they want to query the JSON object based on
> the criteria.
>
> On 11/23/20, 9:08 AM, "Anilkumar Gingade"  wrote:
>
> Gester, looking at the sample query, I believe Ankit is asking about
> an OQL query, not Lucene...
>
> -Anil.
    >
>
> On 11/23/20, 9:02 AM, "Xiaojian Zhou"  wrote:
>
> Ankit:
>
> Geode provides Lucene queries on JSON fields. Your query can be
> supported.
>
> https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html
>
> However, the above document does not provide a query example on a
> JSON object.
>
    >     I can give you some sample code to query on JSON.
>
> Regards
> Xiaojian Zhou
>
> On 11/22/20, 11:53 AM, "ankit Soni" 
> wrote:
>
> Hello geode-devs, please provide guidance on this.
>
> Ankit.
>

Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-11-23 Thread Xiaojian Zhou
Passing dunit tests is not enough. It might only mean we don't have enough test 
coverage.

We need to inspect the code to see what the behavior will be when 2 servers are 
configured with different conserve-sockets settings.

On 11/20/20, 3:30 PM, "Donal Evans"  wrote:

Regarding behaviour during rolling upgrade: I created a draft PR with this 
change to test the feasibility and see what problems, if any, would be caused 
by tests assuming the default setting to be true. After fixing two DUnit tests 
that were not explicitly setting the value of conserve-sockets to true, no test 
failures were observed. I also ran a large suite of proprietary tests that 
include rolling upgrade and observed no problems there. This doesn't mean that 
there would definitely be no problems caused by this change, but I can at least 
say that none of the testing we currently have showed any problems.

From: Anthony Baker 
Sent: Friday, November 20, 2020 8:52 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to 
false

Question:  how would this work with a rolling upgrade?  If the user did not 
set this property and we changed the default I believe that we would prevent 
the upgraded member from rejoining the cluster.

Of course the user could explicitly set this property as you point out.


Anthony


> On Nov 20, 2020, at 8:49 AM, Donal Evans  wrote:
>
> While I agree that the potential impact of having the setting changed out 
from under a user may be high, the cost of addressing that change is very small. 
All users have to do is explicitly set the conserve-sockets value to true if they 
were previously using the default, and they will be back where they started 
with no change in behaviour or resource requirements. This could be as simple 
as adding a single line to a properties file, which seems like a pretty small 
inconvenience.
>
>
> 
> From: Anthony Baker 
> Sent: Thursday, November 19, 2020 5:57:33 PM
> To: dev@geode.apache.org 
> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to 
false
>
> I think there are many good reasons to flip the default value for this 
property. I do question whether requiring a user to allocate new hardware to 
support the changed resource requirements is appropriate for a minor version 
bump. In most cases I think that would come as an unwelcome surprise during the 
upgrade.
>
> Anthony
>
>> On Nov 19, 2020, at 10:42 AM, Dan Smith  wrote:
>>
>> Personally, this has caused enough grief in the past (both ways, 
actually!) that I'd say this is a major version change.
>> I agree with John. Either value of conserve-sockets can crash or hang 
your system depending on your use case.
>>
>> If this was just a matter of slowing down or speeding up performance, I 
think we could change it. But users that are impacted won't just see their 
system slow down. It will crash or hang. Potentially only with production sized 
workloads.
>>
>> With conserve-sockets=false every thread on the server creates its own 
sockets to other servers. With N servers that's N sockets per thread. With our 
default of a max of 800 threads for client connections and a 20 server cluster 
you are looking at a worst case of 800 * 20 = 16K sending sockets per server, 
with another 16K receiving sockets and 16K receiving threads. That's before 
considering function execution threads, WAN receivers, and various other 
executors we have on the server. Users with too many threads will hit their 
file descriptor or thread limits. Or they will run out of memory for thread 
stacks, socket buffers, etc.
>>
>> -Dan
>>
>




Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Anil:

The syntax is OQL. But I understand they want to query the JSON object based on 
the criteria.

On 11/23/20, 9:08 AM, "Anilkumar Gingade"  wrote:

Gester, looking at the sample query, I believe Ankit is asking about an OQL 
query, not Lucene...

-Anil.


On 11/23/20, 9:02 AM, "Xiaojian Zhou"  wrote:

Ankit:

Geode provides Lucene queries on JSON fields. Your query can be supported.

https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html

However, the above document does not provide a query example on a JSON 
object.

I can give you some sample code to query on JSON.

Regards
    Xiaojian Zhou

On 11/22/20, 11:53 AM, "ankit Soni"  wrote:

Hello geode-devs, please provide guidance on this.

Ankit.






Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Ankit:
 
Geode provides Lucene queries on JSON fields. Your query can be supported:
https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html

However, the above document does not provide a query example on a JSON object.

I can give you some sample code to query on JSON.

Regards
Xiaojian Zhou
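
A minimal sketch of what that sample can look like, assuming a field of the
JSON document (k21 below) has been indexed; whether nested fields can be
addressed this way is exactly the gap in the document noted above:

    import org.apache.geode.cache.lucene.LuceneQuery;
    import org.apache.geode.cache.lucene.LuceneService;
    import org.apache.geode.cache.lucene.LuceneServiceProvider;
    import org.apache.geode.cache.lucene.PageableLuceneQueryResults;
    import org.apache.geode.pdx.PdxInstance;

    LuceneService luceneService = LuceneServiceProvider.get(cache);
    // Server side, the index must be created before the region, e.g.:
    // luceneService.createIndexFactory().addField("k21").create("jsonIndex", "/REGION_NAME");
    LuceneQuery<String, PdxInstance> query = luceneService.createLuceneQueryFactory()
        .setPageSize(10)
        .create("jsonIndex", "REGION_NAME", "22", "k21");
    PageableLuceneQueryResults<String, PdxInstance> results = query.findPages();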

On 11/22/20, 11:53 AM, "ankit Soni"  wrote:

Hello geode-devs, please provide guidance on this.

Ankit.

On Sat, 21 Nov 2020 at 10:23, ankit Soni  wrote:

> Hello team,
>
> I am *evaluating usage of Geode (1.12) for storing JSON documents and
> querying the same*. I am able to store the JSON records successfully in
> Geode but am seeking guidance on how to query them.
> More details on code and sample json is,
>
>
> *Sample client-code*
>
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> import org.apache.geode.cache.client.ClientRegionShortcut;
> import org.apache.geode.pdx.JSONFormatter;
> import org.apache.geode.pdx.PdxInstance;
>
> public class MyTest {
>
> *//NOTE: Below is truncated JSON; a single JSON document can contain at most 
an array of col1...col30 (30 different attributes) within data. *
> public final static  String jsonDoc_2 = "{" +
> "\"data\":[{" +
> "\"col1\": {" +
> "\"k11\": \"aaa\"," +
> "\"k12\":true," +
> "\"k13\": ," +
> "\"k14\": \"2020-12-31:00:00:00\"" +
> "}," +
> "\"col2\":[{" +
> "\"k21\": \"22\"," +
> "\"k22\": true" +
> "}]" +
> "}]" +
> "}";
>
> * //NOTE: col1...col30 are a mix of JSONObject ({}) and JSONArray ([]) 
as shown above in jsonDoc_2;*
>
> public static void main(String[] args){
>
> //create client-cache
> ClientCache cache = new 
ClientCacheFactory().addPoolLocator(LOCATOR_HOST, PORT).create();
> Region<String, PdxInstance> region = cache.<String, PdxInstance>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
> .create(REGION_NAME);
>
> //store json document
> region.put("key", JSONFormatter.fromJSON(jsonDoc_2));
>
> //How to query json document like,
>
> // 1. select col2.k21, col1, col20 from /REGION_NAME where 
data.col2.k21 = '22' OR data.col2.k21 = '33'
>
> // 2. select col2.k21, col1.k11, col1 from /REGION_NAME where 
data.col1.k11 in ('aaa', 'xxx', 'yyy')
> }
> }
>
> *Server: Region-creation*
>
> gfsh> create region --name=REGION_NAME --type=PARTITION 
--redundant-copies=1 --total-num-buckets=61
>
>
> *Setup: Distributed cluster of 3 nodes
> *
>
> *My Observations/Problems*
> -  Put operation takes excessive time: region.put("key",
> JSONFormatter.fromJSON(jsonDoc_2)); fetching a single record from () a
> file and storing it in Geode takes approx. 3 secs.
>    Are there any suggestions/configuration related to the JSONFormatter API or
> otherwise to optimize this...?
>
> *Looking forward to guidance on querying this JSON for the above sample
> queries.*
>
> *Thanks*
> *Ankit*
>



Re: [DISCUSS] Adding CODEOWNERS to Apache Geode

2020-11-20 Thread Xiaojian Zhou
+1

I saw the template for splitting up the Geode code. Can someone nominate a few 
code owners in the file as examples?
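
For illustration only, entries in the proposed CODEOWNERS file could take a
shape like this (the module paths are real, but the handles are placeholders,
not actual nominations):

    # Each line maps a path pattern to the committers who must review changes there.
    geode-wan/**     @wan-owner-1 @wan-owner-2
    geode-lucene/**  @lucene-owner-1 @lucene-owner-2
    geode-docs/**    @docs-owner-1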

On 11/20/20, 7:32 AM, "Alexander Murmann"  wrote:

+1

I agree with Owen's point that this will improve the experience for new 
contributors. It also helps people new to the community have confidence that 
they got the type of review they need to merge. I might get two reviews that 
are both from great committers who can review for things like coding style, 
test coverage, etc., yet be unaware that neither of them knows the area I am 
modifying particularly well. This solves that problem: I can merge with more 
confidence once I have the review from the owner.

From: Anthony Baker 
Sent: Thursday, November 19, 2020 17:55
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Adding CODEOWNERS to Apache Geode

+1

I think we as a project will need to iterate on the code owners as well as 
the process for code owners.  But this is a model that has been adopted by a 
number of OSS projects both within and outside of Apache.  I like that it 
provides visibility to PR authors and associates motivated experts to review 
and merge changes.

Anthony


> On Nov 19, 2020, at 10:46 AM, Ernie Burghardt  
wrote:
>
> Perfect, then let's give this a try.
> +1
>
> On 11/19/20, 10:45 AM, "Robert Houghton"  wrote:
>
>Hi Ernie,
>
>DRAFT PRs do not get reviewers by default, but when the draft 
transitions to ‘ready’, then the owners are requested to review.
>
>
>From: Ernie Burghardt 
>Date: Thursday, November 19, 2020 at 9:56 AM
>To: dev@geode.apache.org 
>Subject: Re: [DISCUSS] Adding CODEOWNERS to Apache Geode
>Does GitHub allow us to limit this automated action to non-DRAFT PRs?
>
>On 11/18/20, 8:28 PM, "Owen Nichols"  wrote:
>
>+1 This will greatly improve the experience for contributors.  
Instead of an intimidating empty list of reviewers when you submit a PR (and no 
ability to add reviewers, if you’re not a committer), it will be great to 
already have at least two reviewers automagically assigned.
>
>I have a small concern that initially populating this file via a 
flurry of PRs may result in a lot of merge conflicts with anyone else that 
volunteers on the same or an adjacent line.  Also, since you _must_ be a 
committer to be a code owner, is a PR even necessary…would directly committing 
changes to the feature/introduce-codeowners branch be acceptable?  If not, who 
needs to review and who can merge the PRs against the ‘introduce’ branch?
>
>What happens if you are the only owner for an area, can you 
approve your own PR?  Even if the goal is two owners per area, does that mean 
PRs by either owner cannot be merged if the only other owner is on vacation or 
otherwise unavailable?
>
>Can we submit PRs against the ‘introduce’ branch now and they just 
won’t be merged before Nov 26, or do we all just need to be patient until this 
review period has concluded?
>
>From: Robert Houghton 
>Date: Wednesday, November 18, 2020 at 2:07 PM
>To: dev@geode.apache.org 
>Subject: [DISCUSS] Adding CODEOWNERS to Apache Geode
>Hello Devs.
>
>I would like to improve the quality of the pull-request reviews we 
see for
>critical parts of the Apache Geode project. In discussions with 
other
>committers, a (not the) big hurdle to that is getting the right 
eyes to
>look at a given PR. To that end, I propose the adoption of GitHub's
>CODEOWNERS functionality for the Apache Geode code repository.
>
>A discussion-document of this issue has been written up
>by @upthewaterspout. Thanks Dan!
>
https://cwiki.apache.org/confluence/display/GEODE/Introduce+Codeowners+file
>
>I have tested the feature with fellow Geode committers 
@upthewaterspout
>and @onichols-pivotal, and found it to meet our expectations.  
Please
>review the document, and comment or reply to this thread, by 25 
November,
>so we might start the task of nominating and applying for 
ownership.
>
>-Robert Houghton
>




Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-11-20 Thread Xiaojian Zhou
1) Conserve-sockets only impacts p2p connections. If set to false, the p2p 
connections between 2 servers can be created on request, as many as needed.
2) Currently the default setting is true (I don't remember when we changed it 
from false to true).
3) For rolling upgrade, unfortunately, if server1 is set to true and server2 is 
set to false, our server start-up does not check the mismatch automatically so 
far. We would have to add some code to prevent a server with a different setting 
from joining. And I don't know what the behavior will be in the current 
mixed-setting environment. It could be an interesting dunit test scenario.
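
For users who want to pin the current behavior regardless of which default
ships, it is a one-line setting (a sketch of gemfire.properties):

    # gemfire.properties
    # Pin the old default explicitly so a change in the shipped default is a no-op:
    conserve-sockets=true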

Regards
Gester

On 11/20/20, 8:53 AM, "Anthony Baker"  wrote:

Question:  how would this work with a rolling upgrade?  If the user did not 
set this property and we changed the default I believe that we would prevent 
the upgraded member from rejoining the cluster.

Of course the user could explicitly set this property as you point out.


Anthony






Re: [VOTE] Apache Geode 1.13.1.RC2

2020-11-20 Thread Xiaojian Zhou
+1

On 11/17/20, 12:52 PM, "Dave Barnes"  wrote:

+1

Docs review

   - Built user guides for geode and geode-native using the provided Docker
   scripts.
   - Opened the pre-built Geode javadocs in the binary distro.


Everything worked and looked as it should.

NOTE: The javadocs are branded "1.13.1" but the User Guides are still
"1.13". I regard this as correct -- there's no precedent I'm aware of for
Geode patch releases that would indicate otherwise.

On Tue, Nov 17, 2020 at 12:33 PM Nabarun Nag  wrote:

> +1
> Started gfsh, created cluster, create region, entry put and get, query
> execution.
> Build from source.
>
> Regards
> Nabarun Nag
>
> 
> From: Joris Melchior 
> Sent: Tuesday, November 17, 2020 12:25 PM
> To: dev@geode.apache.org 
> Subject: Re: [VOTE] Apache Geode 1.13.1.RC2
>
> +1
>
> Looks good to me. Did a build and gfsh test-drive.
>
> From: Dan Smith 
> Date: Monday, November 16, 2020 at 1:29 PM
> To: dev@geode.apache.org 
> Subject: Re: [VOTE] Apache Geode 1.13.1.RC2
> +1
>
> Looks good to me! I ran the geode-release-check against it, looked for
> binary artifacts, checked the pipeline.
>
> -Dan
> 
> From: Dick Cavender 
> Sent: Thursday, November 12, 2020 5:00 PM
> To: dev@geode.apache.org 
> Subject: [VOTE] Apache Geode 1.13.1.RC2
>
> Hello Geode Dev Community,
>
> This is a release candidate for Apache Geode version 1.13.1.RC2.
> Issues with creation of RC1 forced moving to RC2.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> Voting deadline:
> 3PM PST Tue, November 17 2020.
>
> Please note that we are voting upon the source tag:
> rel/v1.13.1.RC2
>
> Release notes:
>
> 
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.13.1
>
> Source and binary distributions:
>
> 
https://dist.apache.org/repos/dist/dev/geode/1.13.1.RC2/
>
> Maven staging repo:
>
> 
https://repository.apache.org/content/repositories/orgapachegeode-1071
>
> GitHub:
>
> 
https://github.com/apache/geode/tree/rel/v1.13.1.RC2
>
> 
https://github.com/apache/geode-examples/tree/rel/v1.13.1.RC2
>
> 
https://github.com/apache/geode-native/tree/rel/v1.13.1.RC2
>
> 

Re: Apache Geode 1.13.1 patch proposal

2020-11-12 Thread Xiaojian Zhou
+1

On 11/12/20, 11:54 AM, "Anilkumar Gingade"  wrote:

+1

On 11/12/20, 11:34 AM, "Owen Nichols"  wrote:

+1 Sounds good to me, thanks @Dick for stepping up!

Let's also start posting Geode release artifacts to GitHub too (as many 
other projects already do).  I've backfilled the last couple releases, check it 
out here: 
https://github.com/apache/geode/releases

On 11/12/20, 11:01 AM, "Dick Cavender"  wrote:

It's been two months since the 1.13.0 release and there have been 
28 important fixes on support/1.13 that the community would benefit from. Based 
on this, I'd like to propose a release of Apache Geode 1.13.1 based on the current 
support/1.13 branch. I'll volunteer to be the release manager for 1.13.1 so 
look forward to an RC1 soon.

-Dick






[PROPOSAL] backport GEODE-8651 to 1.13, 9.10, 9.9

2020-10-27 Thread Xiaojian Zhou
Hi, all:

The fix resolves a hang that occurred when Connection called notifyHandshakeWaiter a 
2nd time and cleared the NioFilter’s unwrapped buffer by mistake.

The 2nd call should check whether the 1st call has already finished the handshake; 
if so, it should do nothing. The fix is fully tested and carries no risk. This 
problem exists in earlier versions and should be backported.
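A minimal sketch of that guard, with hypothetical names standing in for the real
Connection/NioFilter internals:

    // Hedged sketch only; handshakeCompleted and unwrappedBuffer are
    // illustrative stand-ins, not the actual Geode fields.
    class ConnectionHandshake {
      private boolean handshakeCompleted;
      private Object unwrappedBuffer = new Object();

      synchronized void notifyHandshakeWaiter(boolean success) {
        if (handshakeCompleted) {
          return; // 2nd call: handshake done, leave the unwrapped buffer alone
        }
        handshakeCompleted = true;
        if (!success) {
          unwrappedBuffer = null; // only a failed first handshake clears state
        }
        notifyAll(); // wake threads blocked waiting on the handshake
      }
    }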

Regards

Xiaojian Zhou


[PROPOSAL] Backport GEODE-8608 to support 1.13, 1.12 branch

2020-10-14 Thread Xiaojian Zhou
Hi,

There’s a race in which StateFlush could hang when the target member is shut down. 
GEODE-8608 fixes it. This fix is a patch on top of GEODE-8385.

The fix should be backported to all previous versions that carry GEODE-8385.

We are still waiting for precheckin to finish.

Regards
Xiaojian Zhou


[PROPOSAL] Backport GEODE-6008 to support 1.12

2020-09-29 Thread Xiaojian Zhou
Hi,

GEODE-6008 changed “java.lang.IllegalStateException: NioSslEngine has been 
closed” to an IOException, which enables DirectChannel to handle it and retry the 
connection in the case that the connection is closed.
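A hedged illustration of that exception translation (the wrapper shape below is
assumed for illustration, not the actual Geode code):

    import java.io.IOException;

    final class SslCloseTranslation {
      interface NioSslEngineLike { int unwrap(); } // hypothetical stand-in

      // Translate the unchecked close error into a checked IOException so
      // DirectChannel's existing retry path can handle it.
      static int unwrapGuarded(NioSslEngineLike engine) throws IOException {
        try {
          return engine.unwrap();
        } catch (IllegalStateException e) {
          throw new IOException("NioSslEngine has been closed", e);
        }
      }
    }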

This fix is important and carries no risk to backport to support/1.12. Please vote 
for it.

Regards
Xiaojian Zhou




[PROPOSAL] Backport GEODE-8475 to 1.13

2020-09-02 Thread Xiaojian Zhou
Hi, All:

I want to backport my fix in GEODE-8475 to 1.13. It fixed a hang caused by a 
potential deadlock.

This fix is quite safe; I have verified it by running all queue-related 
regression tests.

Regards
Gester


Re: [PROPOSAL] Backport GEODE-8432 to 1.13

2020-08-20 Thread Xiaojian Zhou
It's using the region path instead of getting the region, so it should carry no risk. 

On 8/19/20, 10:25 AM, "Xiaojian Zhou"  wrote:

This problem also exists in 1.13.



[PROPOSAL] Backport GEODE-8432 to 1.13

2020-08-19 Thread Xiaojian Zhou
This problem also exists in 1.13.


Re: [VOTE] change Default branch for geode-examples to 'develop'

2020-07-31 Thread Xiaojian Zhou
-1

I often need to build geode-examples against an older geode version (more frequently 
than the current version).

One more unrelated comment: we do need to enhance our geode-examples. The 
current examples are too weak. 

On 7/30/20, 8:16 AM, "Blake Bender"  wrote:

FWIW, Geode Native works around this by not keeping a separate examples 
repo at all.  To build our examples, you *must* build your own Geode Native 
"installation," which includes the examples tree, or download the desired 
tarball/zip file from our GitHub releases.

I’m pretty much agnostic as to which way we should go for any particular 
repository, which is why I wrote "remove or rename" in the title of GEODE-8335. 
 Let's do the right thing for each repo, but not expend a ton of brain power on 
it.  I brought this up as a *small* gesture we should make in the name of 
kindness, not a large effort.

Thanks,

Blake


On 7/18/20, 2:42 AM, "Owen Nichols"  wrote:

Voting Results:
+1: 5 votes 
 0: 0 votes 
-1: 1 vote

The voting is successful by majority.  INFRA has completed the 
requested change, and git clone git@github.com:apache/geode-examples.git now 
checks out develop, making geode-examples consistent with all other geode- 
projects and clearing the way to eliminate master from all projects, if @Blake 
or anyone else would like to move forward on that.

Although it didn't seem to sway anyone else, let's still attempt to 
discuss/address @Anthony's concern that develop doesn't pair well with released 
versions of Geode.

Maybe the brew install + git clone workflow is not how we want to 
introduce an application developer to Geode.  We could suggest: if cloning from 
git, clone everything from git; if using released artifacts, use release 
artifacts for everything.  If mix-n-match is necessary, the README.md could 
explain how to check out the correct branch or tag (or specify the Geode 
version parameter in the gradle command) to match the installed version of 
Geode.

It's important and fundamental for application developers to be aware 
that client version must be <= server version, so perhaps it's beneficial to 
document that from the beginning.

Another idea might be improving the error message when the client 
version is too new?

We could also modify the release scripts to substitute the latest 
release version into the README as part of the release process, to keep the 
out-of-the-box experience as simple as copy-and-pasting a gradle command from 
the README.

@Anthony I'd be happy to pair with you on updating the README or any 
other scripts/documentation.

If anyone else has thoughts or ideas, please chime in.

On 7/14/20, 7:16 AM, "Anthony Baker"  wrote:

Consider the use case of an application developer who wants to run 
geode-examples against the latest geode release:

1) brew install apache-geode
2) git clone geode-examples
3) Get some runtime errors because geode-examples won’t connect to 
a previous geode release

At this point, you have to do some detective work to either 
download the geode-examples from the corresponding source release or switch 
over to the appropriate git tag.

I think there’s value in maintaining a default branch of 
geode-examples that tracks the latest release.

Anthony


> On Jul 9, 2020, at 9:39 PM, Owen Nichols  
wrote:
> 
> A fresh checkout of geode and all but one of the geode- 
repos checks out develop as the Default branch.
> 
> The lone exception is geode-examples.  Please vote +1 if you are 
in favor of changing its Default branch to develop for consistency with the 
other repos and other reasons as per recent discussion[1].
> 
> [1] 
https://lists.apache.org/x/thread.html/rfec15c0a7d5d6d57beed90868dbb53e3bfcaabca67589b28585556ee@%3Cdev.geode.apache.org%3E






Re: [PROPOSAL] port GEODE-8385 changes to support/1.13

2020-07-31 Thread Xiaojian Zhou
+1

On 7/29/20, 2:03 PM, "Bruce Schuchardt"  wrote:

This has been merged to support/1.13.  Thank you all

On 7/29/20, 12:47 PM, "Owen Nichols"  wrote:

+1

On 7/29/20, 9:56 AM, "Dave Barnes"  wrote:

+1
Thanks, Bruce.

On Wed, Jul 29, 2020 at 8:22 AM Jianxia Chen  
wrote:

> +1
>
> On Wed, Jul 29, 2020 at 8:04 AM Bruce Schuchardt 

> wrote:
>
> > This concerns a hang during recovery from disk.  The problem was
> > introduced in 1.13.
> >
> > 
https://issues.apache.org/jira/browse/GEODE-8385
> >
> >
>





Re: [PROPOSAL] Postpone Geode 1.14

2020-07-30 Thread Xiaojian Zhou
+1

On 7/29/20, 1:35 PM, "Mark Bretl"  wrote:

+1

Should we drop a line to user@geode, or is communicating on this
list enough once this is decided?

--Mark

On Wed, Jul 29, 2020 at 7:05 AM Joris Melchior  wrote:

> +1
>
> On 2020-07-28, 7:34 PM, "Alexander Murmann"  wrote:
>
> Hi all,
>
> As mentioned on the previous discuss thread, I propose to hold off
> cutting
> 1.14 until we have shipped 1.13.
>
> Once we have shipped 1.13, we should discuss when we want to cut the
> 1.14
> release. The actual ship date for Geode 1.13 is important information
> for
> that conversation. Thus we cannot have that conversation before then.
>
>



Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

2020-07-10 Thread Xiaojian Zhou
Hi, Alberto:

I was the original author who introduced the tmpDroppedEvents. Due to other 
work, I only got a chance to read the issue on Thursday, which is your deadline. 
Can you hold on a little longer, until Monday?

I have been thinking through the history of the code changes and the issues you 
encountered. I will try to find a lightweight solution with minimum impact on the 
current code. 

Regards
Xiaojian Zhou

On 7/8/20, 1:05 PM, "Eric Shu"  wrote:

I think the only case where the memory issue occurs is when all gateway senders 
in the wan-site are stopped. Otherwise another member would take over as the 
primary queue, and no more events would be enqueued in tmpDroppedEvents on the 
member with the original primary queue. (For a parallel wan queue, I do not think 
stopping one gateway queue is a valid case to support.)

For the case where all gateway senders are stopped, there is no need to notify 
any other members in the wan site if the limit is reached. The tmpDroppedEvents 
is only used to remove events from the secondary queue. If no events are enqueued 
in the secondary queue, there is no need to add into tmpDroppedEvents at all. To 
me, it should only be used for a limited number of queued events.

Regards,
Eric

From: Alberto Gomez 
Sent: Wednesday, July 8, 2020 12:02 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Thanks for your comments, Eric.

Limiting the size of the queue would be a simple solution, but I think it 
would pose several problems for whoever configures and operates Geode:

  *   How big should the queue be? Probably not easy to dimension. Should 
the limit be on the memory occupied by the elements or on the number of 
elements in the queue (in which case, depending on the size of the elements, 
the memory used could vary a lot)?
  *   What to do when the limit has been reached? How do we notify that it 
was reached, what do we do afterwards, and how would we know which dropped 
events did not make it to the queue but should have been removed from the 
secondary's queue?

I think the solution proposed in the RFC is simple enough and also 
addresses a possible confusion with the semantics of the gateway sender stop 
command.
Stopping a gateway sender currently means that all events received while 
the sender is stopped are dropped; but at the same time, unlimited memory may 
be consumed by the dropped events. We could put a limit on the amount of memory 
used by the queued dropped events, but what would be the point of storing them 
in the first place if those events will not be sent to the remote site anyway?
I would expect that after stopping a gateway sender, no resources (or at most 
minimal ones) would be consumed by it. Otherwise we may as well not stop it, or 
use the pause command instead, depending on what we want to achieve.

From what I have seen, queuing dropped events has its place while the 
gateway sender is starting and while it is stopping, but if it is done in a 
sender that must be started manually, or in a manually stopped sender, it could 
provoke unexpected memory exhaustion.
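A hedged sketch of the gating the RFC argues for (the state names and helper
types are illustrative, not the actual gateway sender internals):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Sketch only: keep dropped events for secondary-queue cleanup while the
    // sender is starting or stopping, never once it is fully stopped.
    class DroppedEventGate {
      enum SenderState { STARTING, RUNNING, STOPPING, STOPPED }

      private final Queue<Object> tmpDroppedEvents = new ConcurrentLinkedQueue<>();
      private volatile SenderState state = SenderState.STOPPED;

      void onDroppedEvent(Object event) {
        // When STOPPED there is no secondary queue left to reconcile, so
        // retaining the event would only grow memory without bound.
        if (state == SenderState.STARTING || state == SenderState.STOPPING) {
          tmpDroppedEvents.add(event);
        }
      }
    }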

I really think the solution proposed makes the behavior of the gateway 
sender command more logical.

Best regards,

Alberto

From: Eric Shu 
Sent: Wednesday, July 8, 2020 7:32 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

It seems that I was not able to comment on the RFC in the wiki yet.

Just trying to find out if we have a simple solution for the issue you raised 
-- can we have an upper limit for the tmpDroppedEvents queue in question?

Always check the limit before adding to the queue -- so that the tmp queue 
is not unbounded?

Regards,
Eric

From: Alberto Gomez 
Sent: Monday, July 6, 2020 8:24 AM
To: geode 
Subject: [DISCUSS] RFC - Avoid the queueing of dropped events by the 
primary gateway sender when the gateway sender is stopped

Hi,

I have published a new RFC in the Apache Geode wiki with the following 
title: "Avoid the queueing of dropped events by the primary gateway sender when 
the gateway sender is stopped".


https://cwiki.apache.org/confluence/display/GEODE/Avoid+the+queuing+of+dropped+events+by+the+primary+gateway+sender+when+the+gateway+sender+is+stopped

Could you please give comments by Thursday, July 9th, 2020?

Thanks in advance,

Alberto G.



[PROPOSAL] merge GEODE-8259 to support branches

2020-06-30 Thread Xiaojian Zhou
A customer encountered a single-hop getAll failure due to a SerializationException 
that is identified as a socket error. The solution is to retry the getAll in 
this race condition (currently we do not retry).
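A hedged sketch of the retry shape at the application level (the actual fix
lives inside the client; this only illustrates the idea):

    import java.util.Collection;
    import java.util.Map;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ServerConnectivityException;

    final class GetAllRetry {
      // Sketch: when the single-hop path surfaces the broken socket, one
      // retry goes out over a healthy connection.
      static <K, V> Map<K, V> getAllWithRetry(Region<K, V> region,
                                              Collection<K> keys) {
        try {
          return region.getAll(keys); // single-hop attempt may hit the race
        } catch (ServerConnectivityException e) {
          return region.getAll(keys); // one retry, as the fix does internally
        }
      }
    }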

The fix is tested on both develop and the support branches. The fix is conservative 
and very low risk.

So it would be nice to bring it in before the 1.13.0 release.

Regards
Xiaojian Zhou



PROPOSAL to bring GEODE-8259 to support branches

2020-06-30 Thread Xiaojian Zhou
A customer encountered a single-hop getAll failure due to a SerializationException 
that is identified as a socket error. The solution is to retry the getAll in 
this race condition (currently we do not retry).

The fix is tested on both develop and the support branches. The fix is conservative 
and very low risk.

So it would be nice to bring it in before the 1.13.0 release.

Regards
Xiaojian Zhou




Re: [PROPOSAL] Add windows jobs to PR checks

2020-06-26 Thread Xiaojian Zhou
What I am looking for is a script like the following:

./regression -Z  deploy \
  -n  \
  -o  \
  -g  \
  -u  \
  -t  \
  -k  \
  -F  \
  

  Example:
./regression deploy -n 10 -o centos7 \
-g ~/gemfire/closed/pivotalgf-assembly/build/distributions/pivotal-gemfire-regression-0.0.0.tgz \
-k ~/.ssh/id_rsa.pub -u johndoe -t storageteam myregression

Operating systems to choose from are:
* centos7
* rhel7
* ubuntu14*
* ubuntu16*
* sles12*
* sles11*
* windows

We used to have a script called "precheckin". I forgot whether it could select 
the operating system like the "regression" script above. 

On 6/25/20, 4:09 PM, "Xiaojian Zhou"  wrote:

My vote is also with the current/existing process (not running it for every PR).

We can create an on-request precheckin that runs on a Windows machine, like 
what we did for running some regression tests, for anyone who really needs to run 
it on Windows. (Actually, I'd love to have this tool.)

On 6/25/20, 1:52 PM, "Anilkumar Gingade"  wrote:

Looking at the cost and value derived; My vote is with current/existing 
process (not running for every PR).

On 6/25/20, 11:39 AM, "Mark Hanson"  wrote:

I support adding it in, but I think the time wasted is less than 
you think. I think for me the most important thing is finding an issue when it 
is put in.

I think the current way is actually faster and more efficient, 
because every PR doesn’t have to wait the 4 hours, and in reality the number 
of windows failures is lower than the number of linux failures.

Just a thought.

Thanks,
Mark


> On Jun 25, 2020, at 11:30 AM, Jianxia Chen  
wrote:
> 
> +1 to add Windows tests to the PR pipeline. It may take longer 
time to run
> (up to 4 hours). But consider the time wasted on reverting, 
fixing and
> resubmitting, if there is a failure after merging to the develop 
branch. It
> is better to add the Windows tests to the PR pipeline. We can 
reevaluate
> and optimize the pipeline if the long running time is truly a 
concern.
> 
> On Thu, Jun 25, 2020 at 9:29 AM Kirk Lund  
wrote:
> 
>> I merged some new AcceptanceTests to develop after having my PR 
go GREEN.
>> But now these tests are failing in Windows.
>> 
>> I'd like to propose that we add the Windows jobs to our PR 
checks if we
>> plan to keep testing on Windows in CI.
>> 
>> Please vote or discuss.
>> 
>> Thanks,
>> Kirk
>> 






Re: [PROPOSAL] Add windows jobs to PR checks

2020-06-25 Thread Xiaojian Zhou
My vote is also with the current/existing process (not running it for every PR).

We can create an on-request precheckin that runs on a Windows machine, like what 
we did for some regression tests, for anyone who really needs to run it on 
Windows. (Actually, I'd love to have this tool.)

On 6/25/20, 1:52 PM, "Anilkumar Gingade"  wrote:

Looking at the cost and value derived; My vote is with current/existing 
process (not running for every PR).

On 6/25/20, 11:39 AM, "Mark Hanson"  wrote:

I support adding it in, but I think the time wasted is less than you 
think. I think for me the most important thing is finding an issue when it is 
put in.

I think the current way is actually faster and more efficient, because 
every PR doesn’t have to wait the 4 hours, and in reality the number of 
windows failures is lower than the number of linux failures.

Just a thought.

Thanks,
Mark


> On Jun 25, 2020, at 11:30 AM, Jianxia Chen  wrote:
> 
> +1 to add Windows tests to the PR pipeline. It may take longer time 
to run
> (up to 4 hours). But consider the time wasted on reverting, fixing and
> resubmitting, if there is a failure after merging to the develop 
branch. It
> is better to add the Windows tests to the PR pipeline. We can 
reevaluate
> and optimize the pipeline if the long running time is truly a concern.
> 
> On Thu, Jun 25, 2020 at 9:29 AM Kirk Lund  wrote:
> 
>> I merged some new AcceptanceTests to develop after having my PR go 
GREEN.
>> But now these tests are failing in Windows.
>> 
>> I'd like to propose that we add the Windows jobs to our PR checks if 
we
>> plan to keep testing on Windows in CI.
>> 
>> Please vote or discuss.
>> 
>> Thanks,
>> Kirk
>> 





Re: [DISCUSSION] Stop using the Geode Repository for Feature/WIP Branches

2020-06-03 Thread Xiaojian Zhou
We have discussed this in the Common team. The current solution has worked 
perfectly. 

One person merges develop into feature/GEODE-7665 every week (conceptually it 
can be anyone; I did it twice). Now Naba has taken on the responsibility for 
the weekly merge. He has done great!

A fork will cause many other issues; it will still need a person to maintain it. 
I feel a fork is only suitable for work that will be finished within a week. 

Regards
Gester

On 6/2/20, 4:41 PM, "Nabarun Nag"  wrote:

    I don’t think it is right to make the open source Geode community work 
on my personal fork. 

Regards
Naba


-Original Message-
From: Mark Hanson  
Sent: Tuesday, June 2, 2020 4:35 PM
To: dev@geode.apache.org
Subject: Re: [DISCUSSION] Stop using the Geode Repository for Feature/WIP 
Branches

    While I am not 100% sure I understand your thoughts here, I am pretty sure 
I do. We have already done such work in a branch in a fork (Micrometer work). 
The only real gotcha was that there needed to be one person at least as a 
collaborator, in case of vacations and such. 

All of the things you have specified are possible within the confines of a 
fork.

Thanks,
Mark

On 6/2/20, 4:29 PM, "Nabarun Nag"  wrote:

- We are maintaining feature/GEODE-7665 which is the feature branch for 
PR clear work, on which multiple developers are working. 
- We are maintaining this in Geode repository.
- All sub-tasks of GEODE-7665 are merged into this feature branch.
- Anyone in the Geode community can work on any subtask 
    - This is a long-running and massive feature development which is 
manipulating core code in Apache Geode. Hence all work is pushed to the feature 
branch to keep develop isolated from regressions introduced by PR clear work.
- We have previously used release flags for Lucene work which we found 
to be inefficient and unnecessary extra work.

    We vote that the PR clear feature branch be maintained in the Geode 
Repository as this is a long running, massive effort involving everyone from 
the community.

When the PR clear tasks are completed, the branch will be rigorously 
tested and then squash merged into develop and the feature branch will be 
deleted.


Regards
Naba

-Original Message-
From: Jacob Barrett  
Sent: Tuesday, June 2, 2020 3:43 PM
To: dev@geode.apache.org
Subject: [DISCUSSION] Stop using the Geode Repository for Feature/WIP 
Branches

I know this has been brought up multiple times without resolution. I 
want us to resolve to ban the use of the Geode repository for work in progress, 
feature branches, or any other branches that are not release or support 
branches. There is no reason given the nature of GitHub why you can’t fork the 
repository to contribute.  

* Work done on these branches results in the ASF bots updating the 
associated JIRAs and email blasting all of us with your work. 

* People don’t clean up these branches, which leads to a mess of 
branches on everyones clones and in the UI.

* All your intermediate commits get synced to the repo, which bloats 
the repo for everyone else. Even your commits you rebase over and force push 
are left in the repo. When you delete your branch these commits are not 
removed. There is no way for us to prune unreferenced commits. Nobody else 
needs your commits outside of what was merged to a production branch.

    If anyone has a use case for working directly from the Geode repo that 
can’t work from a fork, please post it here so we can resolve it. 

Thanks,
Jake







Re: [PROPOSAL] include GEODE-8073 in Geode 1.13 support branch

2020-05-06 Thread Xiaojian Zhou
+1
This bug reproduced again in today's regression. It's better to backport to
1.13.

On Wed, May 6, 2020 at 11:42 AM Jinmei Liao  wrote:

> +1
>
> On Wed, May 6, 2020 at 11:40 AM Owen Nichols  wrote:
>
> > +1 to fix this NPE on support/1.13 and also support/1.12
> >
> > > On May 6, 2020, at 11:19 AM, Eric Shu  wrote:
> > >
> > > GEODE-8073
> >
> >
>
> --
> Cheers
>
> Jinmei
>


Re: Creation of buckets for partitioned region

2020-02-14 Thread Xiaojian Zhou
But these servers will be assigned buckets later by rebalance.

On Fri, Feb 14, 2020 at 9:25 AM Barry Oglesby  wrote:

> Mario,
>
> Yes, a query execution causes the buckets to be created.
>
> Also, onRegion function execution causes them to be created as well.
>
> There is an API to create the buckets for a region called
> PartitionRegionHelper.assignBucketsToPartitions
>
> Be careful about when that method is called, though. Any servers that are
> started after it is called will contain no buckets.
>
> Thanks,
> Barry Oglesby
>
>
>
> On Fri, Feb 14, 2020 at 7:48 AM Udo Kohlmeyer  wrote:
>
> > Hi there Mario,
> >
> > I can confirm the first observation. Buckets are created lazily. Upon
> > data create, buckets are created as required.
> >
> > --Udo
> >
> > On 2/14/20 12:16 AM, Mario Ivanac wrote:
> > > Hi geode dev,
> > >
> > > we have observed following behavior, at creation of partitioned
> regions.
> > >
> > > After partitioned region is created, initialization of bucket will take
> > place:
> > >
> > >
> > >*   only at point when first data is inserted in region (bucket will
> > be incrementally created for every added entry, till [max buckets]),
> > >*   or "select *" query is performed against that partitioned region
> > (in this case all buckets [max buckets] are created at once).
> > >
> > > Can you confirm that this is expected behavior?
> > >
> > > Thanks,
> > > Mario
> > >
> >
>
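For reference, a small hedged example of the pre-assignment API mentioned above
(the region name is illustrative):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.partition.PartitionRegionHelper;

    class EagerBuckets {
      // Force eager bucket creation instead of waiting for the first puts or
      // a "select *" query. Call this only after all intended servers have
      // started; servers joining later get buckets via rebalance.
      static void assignBucketsEagerly(Cache cache) {
        Region<?, ?> region = cache.getRegion("sample"); // hypothetical region
        PartitionRegionHelper.assignBucketsToPartitions(region);
      }
    }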


Re: RFC - Logging to Standard Out

2020-01-08 Thread Xiaojian Zhou
+1

On Wed, Jan 8, 2020 at 1:56 PM Jason Huynh  wrote:

> +1
>
> On Wed, Jan 8, 2020 at 1:21 PM Dan Smith  wrote:
>
> > +1. Looks good!
> >
> > -Dan
> >
> > On Wed, Jan 8, 2020 at 12:56 PM Blake Bender  wrote:
> >
> > > +1 - this is also a todo item for the native client, I think.  NC has a
> > bug
> > > in logging which is in my top 3 for "most irritating," as well, which
> is
> > > that logging actually starts *before* the logging system is
> initialized,
> > so
> > > even if you *do* configure a log file, if something happens in NC prior
> > to
> > > that it gets logged to stdout.  Similarly, logging can continue in NC
> > > *after* the logger is shut down, and any logging after that also goes
> to
> > > stdout.  I'm a big fan of making everything consistent, and this seems
> as
> > > good a way as any.
> > >
> > > Just FWIW, using the character '-' anywhere in a log file name for NC
> > will
> > > currently cause a segfault, so this will force us to fix that problem
> as
> > > well.
> > >
> > >
> > > On Wed, Jan 8, 2020 at 12:39 PM Jacob Barrett 
> > wrote:
> > >
> > > > Please see RFC for Logging to Standard Out.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/GEODE/Logging+to+Standard+Out
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/GEODE/Logging+to+Standard+Out
> > > > >
> > > >
> > > > Please comment by 1/21/2020.
> > > >
> > > > Thanks,
> > > > Jake
> > > >
> > > >
> > >
> >
>


RFC is about to finish collecting feedback: Support for clear operation on partitioned region

2019-12-31 Thread Xiaojian Zhou
Hi,

We have published the RFC "Support for clear operation on partitioned
region" for about 2 weeks. Thanks to the community for giving a lot of
valuable feedback.

https://cwiki.apache.org/confluence/display/GEODE/Support+for+clear+operation+on+partitioned+region

We have updated the RFC accordingly and answered all the questions.

The period of collecting feedback will finish by the end of this week.
Should you have questions or concerns, please add them to the RFC within this
week.

Thank you.

Regards
Xiaojian Zhou (Gester)


[DISCUSS] Support for clear operation on partitioned region

2019-12-18 Thread Xiaojian Zhou
Hi,

We wrote up a proposal for clear operation on Partitioned Region.
Please review and comment on the below proposal.

https://cwiki.apache.org/confluence/display/GEODE/Support+for+clear+operation+on+partitioned+region

Customers have been looking for this feature for a long time. So far we
only support clear on Distributed Regions, but most customers are using
Partitioned Regions nowadays.
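For context, a hedged sketch of the API surface involved; clear() comes from
Region's java.util.Map heritage, and on a partitioned region the call has
historically thrown UnsupportedOperationException, which is what the RFC
addresses:

    import org.apache.geode.cache.Region;

    class ClearExample {
      static void wipe(Region<String, String> region) {
        // Works today on a REPLICATE region; the RFC proposes making the
        // same call work on a PARTITION region.
        region.clear();
      }
    }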

Please put your comments in RFC directly instead of on the email thread.
Thank you.

Regards
Xiaojian Zhou (Gester)


Re: defunct branches

2019-11-27 Thread Xiaojian Zhou
Yes, I cannot find GEODE-3967 either.

%103 ~/git12/geode > git br -r | grep 3967

%103 ~/git12/geode >

On Thu, Apr 18, 2019 at 9:28 AM Patrick Rhomberg 
wrote:

> To elaborate on what Dan said:
>
> What has happened is that your local record of the remote references has
> 400+ remote branch references.  Some time ago, I raised the same concern
> that you have here, and we got that number down to a couple dozen.  But
> your local references are still there.
>
> git fetch origin will update your local references to your remotes,
> and the --prune
> option on that command will remove any local references that are no longer
> on the remote.
>
> For what it's worth, we do seem to have a handful of branches that are
> still getting pushed to origin for the sake of PRs, etc.  While remembering
> to delete your branches after the PR gets merged is well and good, I'd take
> it one step farther and encourage everyone to not sully the common space
> with their work-in-progress or PR branches.  That's exactly what a fork is
> for.  And then we don't get into this case of accruing extra references on
> origin in the first place.
>
> Imagination is Change.
> ~Patrick
>
> On Thu, Apr 18, 2019 at 8:46 AM Dan Smith  wrote:
>
> > You just need to do git remote prune origin. Git doesn't remove remote
> > branches from your local copy automatically.
> >
> > -Dan
> >
> > On Thu, Apr 18, 2019 at 8:08 AM Bruce Schuchardt  >
> > wrote:
> >
> > > Sorry to spam everyone.  "git branch -r" seems to be a local thing.  I
> > > made a fresh clone of the apache repo and now only see the branches
> > > mentioned in the UI.
> > >
> > > On 4/17/19 5:06 PM, Jason Huynh wrote:
> > > > Hi Bruce,
> > > >
> > > > I am unable to see the same branches on geode repo.  I do see these
> > > > branches on my personal fork but that's because I haven't updated my
> > own
> > > > personal fork in some time...
> > > >
> > > > Is there a chance that your origin is pointing to your personal fork
> > and
> > > > not the Apache Geode Repo?
> > > >
> > > > I am also unable to see these branches through the ui:
> > > > https://github.com/apache/geode/branches/all
> > > >
> > > >
> > > >
> > > > On Wed, Apr 17, 2019 at 4:17 PM Bruce Schuchardt <
> > bschucha...@pivotal.io
> > > >
> > > > wrote:
> > > >
> > > >> We have nearly 400 branches in the repo right now.  Most of them are
> > for
> > > >> efforts that have been merged to develop long ago.  Don't forget to
> > > >> delete your branches when you're done with them.
> > > >>
> > > >>
> > > >>
> > >
> >
>


Re: [DISCUSS/VOTE] Proposal to bring GEODE-7465 to release/1.11.0

2019-11-26 Thread Xiaojian Zhou
+1

On Tue, Nov 26, 2019 at 12:48 PM Joris Melchior 
wrote:

> +1
>
> On Tue, Nov 26, 2019 at 2:41 PM Jason Huynh  wrote:
>
> > +1
> >
> > On Tue, Nov 26, 2019 at 11:34 AM Anilkumar Gingade 
> > wrote:
> >
> > > +1
> > >
> > > On Tue, Nov 26, 2019 at 11:32 AM Udo Kohlmeyer  wrote:
> > >
> > > > This is a no-brainer
> > > >
> > > > *+1*
> > > >
> > > > On 11/26/19 11:27 AM, Owen Nichols wrote:
> > > > > I would like to propose bringing “GEODE-7465: Set eventProcessor to
> > > null
> > > > in serial AEQ when it is stopped” into the 1.11 release
> (necessitating
> > an
> > > > RC4).
> > > > >
> > > > > Without the fix, a sequence of ordinary gfsh commands will leave
> the
> > > WAN
> > > > gateway in an unrecoverable hung state:
> > > > > stop gateway-sender
> > > > > start gateway-sender
> > > > > The only recourse is to restart the server.
> > > > >
> > > > > This fix is critical because the distributed system fails to sync
> > data
> > > > between WAN sites as the user would expect.
> > > > > This issue did exist in previous releases, but recent enhancements
> to
> > > > WAN/AEQ such as AEQ-pause are increasing user interaction with
> > > WAN-related
> > > > gfsh commands.
> > > > >
> > > > > The fix is simple, low risk, tested, and has been on develop for 5
> > > days:
> > > > >
> > > >
> > >
> >
> https://github.com/apache/geode/commit/e148cef9cb63eba283cf86bc490eb280023567ce
> > > >
> > >
> >
>
>
> --
> *Joris Melchior *
> CF Engineering
> Pivotal Toronto
> 416 877 5427
>
> “Programs must be written for people to read, and only incidentally for
> machines to execute.” – *Hal Abelson*
> 
>


Re: WAN Get-Initial-Image

2019-11-25 Thread Xiaojian Zhou
Did you run "create async-event-queue"?

On Mon, Nov 25, 2019 at 9:23 AM anjana_nair  wrote:

> Hi,
>
> We are trying to solve cloud replication using
> asyncEventListeners. However,
> the sample AsyncEventListener is not getting fired when I try a put. Could
> you please look into this. Commands tried out are below.
>
> I am starting to write an AsyncEventListener to store messages to
> Cassandra. However, the listener is not getting fired. Following are the commands
> tried from command line.
>
> 1.gfsh>deploy --jars=test-2019.11.17.jar
>
> Deploying files: test-2019.11.17.jar
> Total file size is: 0.00MB
>
> Continue?  (Y/n): y
> Member  |Deployed JAR | Deployed JAR Location
> --- | --- |
> 
> server1 | test-2019.11.17.jar |
> H:\GemFire_Server\server1\test-2019.11.17.v1.jar
>
> 2.gfsh>create region --name=sample1 --type=REPLICATE
> --async-event-queue-id=sampleq
> Member  | Status
> --- | --
> server1 | Region "/sample1" created on "server1"
>
> 3.gfsh>create region --name=sample1 --type=REPLICATE
> --async-event-queue-id=sample
> Member  | Status
> --- | --
> server1 | Region "/sample1" created on "server1"
>
> gfsh>put --key=('123') --value=('ABC') --region=sample1
> Result  : true
> Key Class   : java.lang.String
> Key : ('123')
> Value Class : java.lang.String
> Old Value   : 
>
>
> gfsh>put --key=('123') --value=('ABC1') --region=sample1
> Result  : true
> Key Class   : java.lang.String
> Key : ('123')
> Value Class : java.lang.String
> Old Value   : ('ABC')
>
>
> However I see that Listener is not fired.
>
> What could be wrong ? My Listener is very simple with some simple print
> statements.
>
>
>
>
> --
> Sent from:
> http://apache-geode-incubating-developers-forum.70738.x6.nabble.com/
>
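The missing step above is creating the queue itself before the region references
it, e.g. with "create async-event-queue --id=sampleq --listener=<fully-qualified
listener class>". A minimal listener sketch (the class name is illustrative):

    import java.util.List;
    import org.apache.geode.cache.asyncqueue.AsyncEvent;
    import org.apache.geode.cache.asyncqueue.AsyncEventListener;

    public class PrintingListener implements AsyncEventListener {
      @Override
      public boolean processEvents(List<AsyncEvent> events) {
        for (AsyncEvent event : events) {
          System.out.println("key=" + event.getKey()
              + " value=" + event.getDeserializedValue());
        }
        return true; // true acknowledges the batch so it leaves the queue
      }

      @Override
      public void close() {
        // nothing to release in this sketch
      }
    }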


Re: Odg: gateway sender queue

2019-11-14 Thread Xiaojian Zhou
The --cleanQueues option is a similar idea to Barry's "DeadLetter" spike. I
remember that we decided not to do it.


On Wed, Nov 13, 2019 at 11:41 PM Mario Ivanac  wrote:

> Hi,
>
> just to remind you on last question:
>
> what is your opinion on adding additional option in gfsh command  "start
> gateway sender"
> to control clearing of existing queues --cleanQueues.
>
> This option will indicate whether, when the gateway sender is started, we
> should discard/clean the existing queue or use the existing queue.
> By default it will discard/clean the existing queue.
>
> Best Regards,
> Mario
> 
> Šalje: Mario Ivanac 
> Poslano: 8. studenog 2019. 13:00
> Prima: dev@geode.apache.org 
> Predmet: Odg: gateway sender queue
>
> Hi all,
>
> one more clarification regarding 3rd question:
>
> "*   Could we add extra option in gfsh command  "start gateway sender"
>  that allows to control queues reset (for instance --cleanQueues)"
>
> This option will indicate whether, when the gateway sender is started, we
> should discard/clean the existing queue or use the existing queue.
> By default it will discard/clean the existing queue.
>
> Best Regards,
> Mario
> 
> Šalje: Mario Ivanac 
> Poslano: 7. studenog 2019. 9:01
> Prima: Dan Smith ; dev@geode.apache.org <
> dev@geode.apache.org>
> Predmet: Odg: gateway sender queue
>
> Hi,
>
> thanks for answers.
>
> Some more details regarding 1st question.
>
> Is this behavior same (for serial and parallel gateway sender) in case
> queue is persistent?
> Meaning, should queue (persistent) be purged if we restart gateway sender?
>
>
> Thanks,
> Mario
>
> 
> Šalje: Dan Smith 
> Poslano: 5. studenog 2019. 18:52
> Prima: dev@geode.apache.org 
> Predmet: Re: gateway sender queue
>
> Some replies, inline:
>
  *   During testing we have observed different behavior in parallel and
> serial gateway senders. In case we manually stop, then start gateway
> senders, for parallel gateway senders the queue is purged, but for serial
> gateway senders this is not the case. Is this normal behavior or a bug?
> >
>
> Hmm, I also think stop is supposed to clear the queue. I think if you are
> seeing that it doesn't clear the queue, that might be a bug.
>
>
>
> >   *   What happens with the queues when whole cluster is stopped and
> later
> > started (In our tests with persistent queues, the events are kept)?
> >
>
> Persistent queues will keep all of the events when you restart.
>
>
> >   *   Could we add extra option in gfsh command  "start gateway sender"
> > that allows to control queues reset (for instance --cleanQueues)?
> >
>
> If stop does clear the queue, would this be needed? It might still be
> reasonable - I've heard folks request a way to clear running queues as
> well.
>
> -Dan
>


Re: Lucene upgrade

2019-11-07 Thread Xiaojian Zhou
Oh, I misunderstood option-1 and option-2. What I vote for is Jason's option-1.

On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:

> Gester, I don't think we need to write in the old format, we just need the
> new format not to be written while old members can potentially read the
> lucene files.  Option 1 can be very similar to Dan's snippet of code.
>
> I think Option 2 is going to leave a lot of people unhappy when they get
> stuck with what Mario is experiencing right now and all we can say is "you
> should have read the doc". Not to say Option 2 isn't valid and it's
> definitely the least amount of work to do, I still vote option 1.
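A hedged sketch of what option 1 could look like (the version-check helper is
hypothetical, not an actual Geode API):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Sketch only: defer writes to the Lucene file region while any member
    // still runs the old Geode/Lucene version.
    class GuardedIndexWriter {
      interface ClusterVersions { boolean allMembersAtCurrentVersion(); } // hypothetical

      private final ClusterVersions versions;
      private final Queue<Object> pending = new ConcurrentLinkedQueue<>();

      GuardedIndexWriter(ClusterVersions versions) { this.versions = versions; }

      void write(Object doc) {
        if (versions.allMembersAtCurrentVersion()) {
          flushToIndex(doc); // safe: no old member reads the new format anymore
        } else {
          pending.add(doc);  // defer until the rolling upgrade completes
        }
      }

      private void flushToIndex(Object doc) { /* hand off to the Lucene IndexWriter */ }
    }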
>
> On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou  wrote:
>
> > Usually re-creating the region and index is expensive, and customers are
> > reluctant to do it, as I recall.
> >
> > We do have offline reindex scripts or steps (written by Barry?). If that
> > could be an option, they can try that offline tool.
> >
> > I saw from Mario's email that he said: "I didn't find a way to write lucene
> > in the older format. They only support
> > reading old-format indexes with a newer version by using lucene-backward-
> > codec."
> >
> > That's why I think option-1 is not feasible.
> >
> > Option-2 will cause the queue to fill. But usually customers will hold
> > off, silence, or reduce their business throughput while doing a rolling
> > upgrade. I wonder if that is a reasonable assumption.
> >
> > Overall, after comparing all 3 options, I still think option-2 is the
> > best bet.
> >
> > Regards
> > Gester
> >
> >
> > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett 
> wrote:
> >
> > >
> > >
> > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh  wrote:
> > > >
> > > > Jake - there is a side effect to this in that the user would have to
> > > > reimport all their data into the user defined region too.  Client
> apps
> > > > would also have to know which of the regions to put into.. also, I
> may
> > be
> > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > support
> > > > whoever implements the changes :-P
> > >
> > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> thought.
> > >
> > > -Jake
> > >
> > >
> >
>


Re: Lucene upgrade

2019-11-06 Thread Xiaojian Zhou
Usually re-creating the region and index is expensive, and customers are
reluctant to do it, as I recall.

We do have offline reindex scripts or steps (written by Barry?). If that
could be an option, they can try that offline tool.

I saw from Mario's email that he said: "I didn't find a way to write lucene
in the older format. They only support reading old-format indexes with a
newer version by using lucene-backward-codec."

That's why I think option-1 is not feasible.

Option-2 will cause the queue to fill. But usually customers will hold off,
silence, or reduce their business throughput while doing a rolling upgrade.
I wonder if that is a reasonable assumption.

Overall, after comparing all 3 options, I still think option-2 is the best
bet.

Regards
Gester


On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett  wrote:

>
>
> > On Nov 6, 2019, at 3:36 PM, Jason Huynh  wrote:
> >
> > Jake - there is a side effect to this in that the user would have to
> > reimport all their data into the user defined region too.  Client apps
> > would also have to know which of the regions to put into.. also, I may be
> > misunderstanding this suggestion, completely.  In either case, I'll
> support
> > whoever implements the changes :-P
>
> Ah… there isn’t a way to re-index the existing data. Eh… just a thought.
>
> -Jake
>
>


Re: Lucene upgrade

2019-11-06 Thread Xiaojian Zhou
He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There are
some challenges. One is that the codec changed, which means the index format
also changed.

That's why we did not implement it.

If he resolves the coding challenges, then a rolling upgrade will probably
need option-2 to work around it.

Regards
Gester


On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett  wrote:

> What about “versioning” the region that backs the indexes? Old servers
> with old license would continue to read/write to old region. New servers
> would start re-indexing with the new version. Given the async nature of the
> indexing would the mismatch in indexing for some period of time have an
> impact?
>
> Not an ideal solution but it’s something.
>
> In my previous life we just deleted the indexes and rebuilt them on
> upgrade but that was specific to our application.
>
> -Jake
>
>
> > On Nov 6, 2019, at 11:18 AM, Jason Huynh  wrote:
> >
> > Hi Mario,
> >
> > I think there are a few ways to accomplish what Dan was suggesting...Dan
> or
> > other's, please chime in with more options/solutions.
> >
> > 1.) We add some product code/lucene listener to detect whether we have
> old
> > versions of geode and if so, do not write to lucene on the newly updated
> > node until all versions are up to date.
> >
> > 2.)  We document it and provide instructions (and a way) to pause lucene
> > indexing before someone attempts to do a rolling upgrade.
> >
> > I'd prefer option 1 or some other robust solution, because I think
> option 2
> > has many possible issues.
> >
> >
> > -Jason
> >
> >
> >> On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo  wrote:
> >>
> >> Hi Dan,
> >>
> >> thanks for suggestions.
> >> I didn't find a way to write lucene in the older format. They only support
> >> reading old-format indexes with a newer version by using lucene-backward-
> >> codec.
> >>
> >> Regarding freezing writes to the lucene index: that means we need to
> >> start locators and servers, create the lucene index on the server, roll
> >> it to current, and then do puts. In this case the tests passed. Is it ok?
> >>
> >>
> >> BR,
> >> Mario
> >>
> >>
> >>> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> >>> I think the issue probably has to do with doing a rolling upgrade
> >>> from an
> >>> old version of geode (with an old version of lucene) to the new
> >>> version of
> >>> geode.
> >>>
> >>> Geode's lucene integration works by writing the lucene index to a
> >>> colocated
> >>> region. So lucene index data that was generated on one server can be
> >>> replicated or rebalanced to other servers.
> >>>
> >>> I think what may be happening is that data written by a geode member
> >>> with a
> >>> newer version is being read by a geode member with an old version.
> >>> Because
> >>> this is a rolling upgrade test, members with multiple versions will
> >>> be
> >>> running as part of the same cluster.
> >>>
> >>> I think to really fix this rolling upgrade issue we would need to
> >>> somehow
> >>> configure the new version of lucene to write data in the old format,
> >>> at
> >>> least until the rolling upgrade is complete. I'm not sure if that is
> >>> possible with lucene or not - but perhaps? Another option might be to
> >>> freeze writes to the lucene index during the rolling upgrade process.
> >>> Lucene indexes are asynchronous, so this wouldn't necessarily require
> >>> blocking all puts. But it would require queueing up a lot of updates.
> >>>
> >>> -Dan
> >>>
> >>> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo 
> >>> wrote:
> >>>
>  Hi geode dev,
> 
>  I'm working on upgrade lucene to a newer version. (
>  https://issues.apache.org/jira/browse/GEODE-7309)
> 
>  I followed instruction from
> 
> >>
> https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
>  Also add some other changes that is needed for lucene 8.2.0.
> 
>  I found some problems with tests:
>  * geode-
>    lucene/src/test/java/org/apache/geode/cache/lucene/internal/dist
>  ribu
>    ted/DistributedScoringJUnitTest.java:
> 
> 
>  *
>  geode-
>  lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>  gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.j
>  ava:
>  *
>  geode-
>  lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>  gradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRoll
>  ed.java:
>  *
>  ./geode-
>  lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>  gradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegio
>  n.java:
>  *
>  ./geode-
>  lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>  gradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPart
>  itionRegion.java:
> 
>   -> failed due to
>  Caused by: org.apache.lucene.index.IndexFormatTooOldException:
>  Format
>  

Re: [vote/discuss]Override stressNewTest for Pull Request #4250?

2019-10-31 Thread Xiaojian Zhou
It finished after 4 hours 51 minutes. It looks like we do need to increase
the timeout for stressNewTest.

On Thu, Oct 31, 2019 at 4:45 PM Darrel Schneider 
wrote:

> +1
>
> On Thu, Oct 31, 2019 at 4:16 PM Jinmei Liao  wrote:
>
> > +1
> >
> > On Thu, Oct 31, 2019, 3:30 PM Xiaojian Zhou  wrote:
> >
> > > I'm curious to see the new stressNew test result too.
> > >
> > > On Thu, Oct 31, 2019 at 3:26 PM Owen Nichols 
> > wrote:
> > >
> > > > I’ve retriggered StressNew <
> > > >
> > >
> >
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/4758
> > > >
> > > > with a temporarily-increased timeout of 12 hours so we can see how
> long
> > > it
> > > > would actually take, to have some data point whether to propose a
> > > permanent
> > > > timeout increase or whether breaking up into multiple PRs should
> be
> > > the
> > > > standard way to get around this.
> > > >
> > > > > On Oct 31, 2019, at 2:52 PM, Donal Evans 
> wrote:
> > > > >
> > > > > +1 to allowing this PR to be merged, although I'd lean strongly
> > toward
> > > > > facilitating this by temporarily increasing the timeout on the job
> to
> > > > allow
> > > > > it to actually pass rather than a manual override of the
> > StressNewTest.
> > > > >
> > > > > The fact that it's passed over 7000 times without failing is pretty
> > > > strong
> > > > > evidence that it's not a flaky test, which is what StressNewTest is
> > > > > supposed to catch, so there doesn't seem to be any risk associated
> > with
> > > > > circumventing it in this case, but if there's a feasible solution
> > that
> > > > > doesn't involve "cheating" or ignoring the test job, then that
> would
> > be
> > > > > preferable.
> > > > >
> > > > > - Donal
> > > > >
> > > > > On Thu, Oct 31, 2019 at 2:04 PM Jason Huynh 
> > wrote:
> > > > >
> > > > >> Greetings,
> > > > >>
> > > > >> We have a pull request (https://github.com/apache/geode/pull/4250
> )
> > > > that is
> > > > >> running into a problem with stressNewTest.  Mostly the tests that
> > are
> > > > being
> > > > >> run are RollingUpgrade tests that take quite a bit of time to run
> > the
> > > > full
> > > > >> suite.  Because these tests are added/modified, the stressNewTest
> > > > doesn't
> > > > >> have enough time to complete the run because it runs them N(50)
> > number
> > > > of
> > > > >> times.
> > > > >>
> > > > >> However what has completed is 7400 tests and none of them have
> > failed:
> > > > >>
> > > > >>
> > > >
> > >
> >
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4250/test-results/repeatTest/1572546653/
> > > > >>
> > > > >> We would like to get this fix in before branching the next
> release,
> > > but
> > > > are
> > > > >> unable to due to stressNewTest gating the merge button.  I know we
> > > have
> > > > >> another thread about overrides etc, and maybe this is a data
> point,
> > > but
> > > > >> this isn't meant to discuss that.
> > > > >>
> > > > >> Would everyone be able to agree to allow someone to manually
> > override
> > > > and
> > > > >> merge this commit in (title of PR and reviews pending)?
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: [vote/discuss]Override stressNewTest for Pull Request #4250?

2019-10-31 Thread Xiaojian Zhou
I'm curious to see the new stressNew test result too.

On Thu, Oct 31, 2019 at 3:26 PM Owen Nichols  wrote:

> I’ve retriggered StressNew <
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/4758>
> with a temporarily-increased timeout of 12 hours so we can see how long it
> would actually take, to have some data point whether to propose a permanent
> timeout increase or whether breaking up into multiple PRs should be the
> standard way to get around this.
>
> > On Oct 31, 2019, at 2:52 PM, Donal Evans  wrote:
> >
> > +1 to allowing this PR to be merged, although I'd lean strongly toward
> > facilitating this by temporarily increasing the timeout on the job to
> allow
> > it to actually pass rather than a manual override of the StressNewTest.
> >
> > The fact that it's passed over 7000 times without failing is pretty
> strong
> > evidence that it's not a flaky test, which is what StressNewTest is
> > supposed to catch, so there doesn't seem to be any risk associated with
> > circumventing it in this case, but if there's a feasible solution that
> > doesn't involve "cheating" or ignoring the test job, then that would be
> > preferable.
> >
> > - Donal
> >
> > On Thu, Oct 31, 2019 at 2:04 PM Jason Huynh  wrote:
> >
> >> Greetings,
> >>
> >> We have a pull request (https://github.com/apache/geode/pull/4250)
> that is
> >> running into a problem with stressNewTest.  Mostly the tests that are
> being
> >> run are RollingUpgrade tests that take quite a bit of time to run the
> full
> >> suite.  Because these tests are added/modified, the stressNewTest
> doesn't
> >> have enough time to complete the run because it runs them N(50) number
> of
> >> times.
> >>
> >> However what has completed is 7400 tests and none of them have failed:
> >>
> >>
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4250/test-results/repeatTest/1572546653/
> >>
> >> We would like to get this fix in before branching the next release, but
> are
> >> unable to due to stressNewTest gating the merge button.  I know we have
> >> another thread about overrides etc, and maybe this is a data point, but
> >> this isn't meant to discuss that.
> >>
> >> Would everyone be able to agree to allow someone to manually override
> and
> >> merge this commit in (title of PR and reviews pending)?
> >>
>
>


Re: [DISCUSS] log4j errors/warnings

2019-10-22 Thread Xiaojian Zhou
In CI, I keep hitting "> Task :geode-assembly:defaultCacheConfig

09:13:37 
<https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/DistributedTestOpenJDK11/builds/4572#L5d9757af:547>
ERROR StatusLogger Log4j2 could not find a logging implementation.
Please add log4j-core to the classpath. Using SimpleLogger to log to
the console..."


On Tue, Oct 22, 2019 at 11:31 AM John Blum  wrote:

> There are other ways of controlling the Log4j2 Status Logger other than
> adding test dependencies.
>
>
> For instance, you can:
>
> 1. Set the JVM System property
> org.apache.logging.log4j.simplelog.StatusLogger.level to "OFF".
>
> 2. Theoretically, when Log4j2 finds a log4j2 or log4j2-test Properties,
> YAML, JSON or XML file on the classpath, it should honor the following
> configuration setting, e.g. in log4j2.xml:
>
> <Configuration status="OFF">
>
> This is described in the Log4j documentation at
> https://logging.apache.org/log4j/2.x/manual/configuration.html in
> section "*Status
> Messages*".
>
> Also see the section "*Automatic Configuration*" for more details on how
> Log4j2 resolves configuration metadata (e.g. log4j2.xml).
>
> 3. There are also programmatical ways to control status logging by
> acquiring the StatusLogger and removing all StatusListeners prior to the
> Log4j2 logging system being initialized, or alternatively setting a no-op
> StatusListener implementation, which you would need to implement yourself
> since, seemingly, *Log4j2* does not provide an implementation unlike
> *Logback*. (e.g. [1])
>
> StatusLogger.getLogger().getListeners().forEach(StatusLogger.getLogger
> ()::removeListener);
>
>
> Quickly experimenting, the only approach I got working in my *Spring Boot*
> application using Apache Geode was #1.  I suspect there were other things
> running interference, but I did not investigate further.
>
> Anyway, I would err on the side of caution and use 1 of the approaches
> above rather than simply throwing in another dependency, testRuntime or
> otherwise.  It is too easy for that to be inadvertently and incorrectly
> changed by some maintainer later on.
>
> $0.02
>
> -j
>
>
> [1]
>
> https://github.com/spring-projects/spring-boot-data-geode/blob/master/spring-geode-docs/src/main/resources/logback.xml#L4
>
>
> On Tue, Oct 22, 2019 at 9:57 AM Xiaojian Zhou  wrote:
>
> > I hit this problem in a PR. I am just curious why it did not happen before.
> >
> >
> > On Tue, Oct 22, 2019 at 9:44 AM Kirk Lund  wrote:
> >
> > > I'm ok with adding log4j-core to the testRuntime for all unit test
> > targets
> > > to prevent the ERROR message. Any other input?
> > >
> > > On Fri, Oct 18, 2019 at 3:10 PM John Blum  wrote:
> > >
> > > > Be careful to only add logging dependencies as testRuntime
> > dependencies.
> > > > Do not add any logger implementation/provider (e.g. log4j-core, or
> > > > otherwise) in either the compile-time or runtime scope.
> > > >
> > > > This also means that when users are using and running Apache Geode
> > > > applications (regardless of context), they will need to explicitly
> > choose
> > > > and declare a logging implementation, otherwise they will see the
> same
> > > > ERROR message logged.  For example, when using Spring Boot, users
> > > > would declare a runtime dependency on
> > > > org.springframework.boot:spring-boot-starter-logging.  This uses
> > Logback
> > > as
> > > > the logging provider and adapts Log4j with SLF4J using the bridge.
> > > >
> > > > To make matters worse, unfortunately, this message is logged by the
> > > logging
> > > > facade as an error when it should rather be logged as WARN instead,
> or
> > > > arguably less.
> > > >
> > > > Technically, you should also be able to quiet down the "internal"
> > Logging
> > > > facade messaging using a no-op status listener, e.g. ...
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/spring-projects/spring-boot-data-geode/blob/master/spring-geode-tests/smoke-tests/spring-initializer/src/test/resources/logback.xml#L4
> > > >
> > > > I'm not sure what that is for Log4j2 (but there should be an
> equivalent).
> > > >
> > > >
> > > >
> > > > On Fri, Oct 18, 2019 at 1:26 PM Bruce Schuchardt <
> > bschucha...@pivotal.io
> > > >
> > > > wrote:
> > > >
> > > > > Not long ago

Re: [DISCUSS] log4j errors/warnings

2019-10-22 Thread Xiaojian Zhou
I hit this problem in a PR. I am just curious why it did not happen before.


On Tue, Oct 22, 2019 at 9:44 AM Kirk Lund  wrote:

> I'm ok with adding log4j-core to the testRuntime for all unit test targets
> to prevent the ERROR message. Any other input?
>
> On Fri, Oct 18, 2019 at 3:10 PM John Blum  wrote:
>
> > Be careful to only add logging dependencies as testRuntime dependencies.
> > Do not add any logger implementation/provider (e.g. log4j-core, or
> > otherwise) in either the compile-time or runtime scope.
> >
> > This also means that when users are using and running Apache Geode
> > applications (regardless of context), they will need to explicitly choose
> > and declare a logging implementation, otherwise they will see the same
> > ERROR message logged.  For example, when using Spring Boot, users
> > would declare a runtime dependency on
> > org.springframework.boot:spring-boot-starter-logging.  This uses Logback
> as
> > the logging provider and adapts Log4j with SLF4J using the bridge.
> >
> > To make matters worse, unfortunately, this message is logged by the
> logging
> > facade as an error when it should rather be logged as WARN instead, or
> > arguably less.
> >
> > Technically, you should also be able to quiet down the "internal" Logging
> > facade messaging using a no-op status listener, e.g. ...
> >
> >
> >
> https://github.com/spring-projects/spring-boot-data-geode/blob/master/spring-geode-tests/smoke-tests/spring-initializer/src/test/resources/logback.xml#L4
> >
> > I'm not sure what that is for Log4j2 (but there should be an equivalent).
> >
> >
> >
> > On Fri, Oct 18, 2019 at 1:26 PM Bruce Schuchardt  >
> > wrote:
> >
> > > Not long ago changes were made to the sub-projects that introduced a
> lot
> > > of build noise.  In gradle builds we see a lot of this:
> > >
> > > ERROR StatusLogger Log4j2 could not find a logging implementation.
> Please
> > > add log4j-core to the classpath. Using SimpleLogger to log to the
> > console...
> > >
> > > and in IntelliJ unit test runs we get this:
> > >
> > > ERROR StatusLogger No Log4j 2 configuration file found. Using default
> > > configuration (logging only errors to the console), or user
> > > programmatically provided configurations. Set system property
> > > 'log4j2.debug' to show Log4j 2 internal initialization logging.
> > Seehttps://
> > > logging.apache.org/log4j/2.x/manual/configuration.html  for
> instructions
> > > on how to configure Log4j 2
> > >
> > > That's really annoying and it looks like Geode is broken.  To fix this
> > > it was suggested that "we would have to add log4j-core to the classpath
> > > of unit tests to get log4j-api to stop complaining".
> > >
> > > I think this should be done.  Any objections?
> > >
> > >
> > >
> >
> > --
> > -John
> > john.blum10101 (skype)
> >
>


VOTE: I need to add onto auth list for apachegeode-ci.info

2019-10-08 Thread Xiaojian Zhou
I cannot log in now for some reason. I need your vote to turn on the
permissions.


Regards
Xiaojian Zhou


Re: [VOTE] Adding a lucene specific fix to release/1.10.0

2019-09-19 Thread Xiaojian Zhou
Owen:
Here are the answers:

- Is this fixing an issue of Data loss? Performance degradation?
Backward-compatibility issue? Availability impacts?  Resource exhaustion
(threads, disk, cpu, memory, sockets, etc)?

Without the fix, inherited fields of a user-defined object cannot be indexed.
For example, I have a Customer class which contains phoneBook, and a subclass
LocalCustomer that inherits from Customer; then I cannot index on phoneBook
for LocalCustomer values (see the sketch after these answers).

- Did this issue exist in the previous release?
Yes.

- What is the impact of not fixing it?
Customers will see it, and they already have.

- What are the risks of introducing this change so close to shipping?
No risk. It's a standalone fix that does not impact anything else. And it
will have to be backported in the future anyway if we do not do it now.

- How extensively has the fix been tested on develop?
We introduced several dunit and junit tests.

- How “sensitive” is the area of code it touches?
Not sensitive.

- What new tests have been added?
New dunit tests and junit tests.

Regards
Gester
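
To make the scenario concrete, here is a minimal sketch. The class and
field names come from the example above; the index creation uses Geode's
public Lucene API, while the cache variable and region name are
illustrative assumptions:

    import java.util.Map;
    import org.apache.geode.cache.lucene.LuceneService;
    import org.apache.geode.cache.lucene.LuceneServiceProvider;

    public class Customer {
      protected Map<String, String> phoneBook;  // field inherited by subclasses
    }

    public class LocalCustomer extends Customer {
    }

    // Before GEODE-7208, "phoneBook" was not indexed when the region value
    // was a LocalCustomer, because inherited attributes were not picked up.
    LuceneService luceneService = LuceneServiceProvider.get(cache);
    luceneService.createIndexFactory()
        .addField("phoneBook")
        .create("customerIndex", "customers");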

On Thu, Sep 19, 2019 at 11:21 AM Owen Nichols  wrote:

> > On Sep 19, 2019, at 11:15 AM, Xiaojian Zhou  wrote:
> >
> > Owen:
> >
> > The reason is: it's already cherry-picked to 1.9.
>
>
> Can you kindly point me to the specific SHA where this was fixed in 1.9?
> I am not able to find it...
>
> >
> > Gester
> >
> > On Thu, Sep 19, 2019 at 11:13 AM Owen Nichols 
> wrote:
> >
> >> It looks like this has already passed the vote, but I don’t see an
> >> explanation anywhere in this thread for what makes this a "critical
> fix".
> >>
> >> As I recall release/1.10.0 was branched at the beginning of August, so
> it
> >> seems appropriate to apply a very high level of scrutiny to any
> continuing
> >> proposals to further delay the release of 1.10.0.
> >>
> >> - Is this fixing an issue of Data loss? Performance degradation?
> >> Backward-compatibility issue? Availability impacts?  Resource exhaustion
> >> (threads, disk, cpu, memory, sockets, etc)?
> >> - Did this issue exist in the previous release?
> >> - What is the impact of not fixing it?
> >> - What are the risks of introducing this change so close to shipping?
> >> - How extensively has the fix been tested on develop?
> >> - How “sensitive” is the area of code it touches?
> >> - What new tests have been added?
> >>
> >>
> >>> On Sep 19, 2019, at 11:08 AM, Anilkumar Gingade 
> >> wrote:
> >>>
> >>> +1
> >>>
> >>> On Thu, Sep 19, 2019 at 11:02 AM Eric Shu  wrote:
> >>>
> >>>> +1
> >>>>
> >>>>
> >>>> On Thu, Sep 19, 2019 at 10:59 AM Benjamin Ross 
> >> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> On Thu, Sep 19, 2019 at 10:50 AM Nabarun Nag 
> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> On Thu, Sep 19, 2019 at 10:49 AM Xiaojian Zhou 
> >>>> wrote:
> >>>>>>
> >>>>>>> I want to merge GEODE-7208, which is lucene specific fix
> >>>>>>>
> >>>>>>> The fix will enable indexing on inherited attributes in user
> object.
> >>>>>>>
> >>>>>>> revision 4ec87419d456748a7d853e979c90ad4e301b2405
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Gester
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>


Re: [VOTE] Adding a lucene specific fix to release/1.10.0

2019-09-19 Thread Xiaojian Zhou
Owen:

The reason is: it's already cherry-picked to 1.9.

Gester

On Thu, Sep 19, 2019 at 11:13 AM Owen Nichols  wrote:

> It looks like this has already passed the vote, but I don’t see an
> explanation anywhere in this thread for what makes this a "critical fix".
>
> As I recall release/1.10.0 was branched at the beginning of August, so it
> seems appropriate to apply a very high level of scrutiny to any continuing
> proposals to further delay the release of 1.10.0.
>
> - Is this fixing an issue of Data loss? Performance degradation?
> Backward-compatibility issue? Availability impacts?  Resource exhaustion
> (threads, disk, cpu, memory, sockets, etc)?
> - Did this issue exist in the previous release?
> - What is the impact of not fixing it?
> - What are the risks of introducing this change so close to shipping?
> - How extensively has the fix been tested on develop?
> - How “sensitive” is the area of code it touches?
> - What new tests have been added?
>
>
> > On Sep 19, 2019, at 11:08 AM, Anilkumar Gingade 
> wrote:
> >
> > +1
> >
> > On Thu, Sep 19, 2019 at 11:02 AM Eric Shu  wrote:
> >
> >> +1
> >>
> >>
> >> On Thu, Sep 19, 2019 at 10:59 AM Benjamin Ross 
> wrote:
> >>
> >>> +1
> >>>
> >>> On Thu, Sep 19, 2019 at 10:50 AM Nabarun Nag  wrote:
> >>>
> >>>> +1
> >>>>
> >>>> On Thu, Sep 19, 2019 at 10:49 AM Xiaojian Zhou 
> >> wrote:
> >>>>
> >>>>> I want to merge GEODE-7208, which is lucene specific fix
> >>>>>
> >>>>> The fix will enable indexing on inherited attributes in user object.
> >>>>>
> >>>>> revision 4ec87419d456748a7d853e979c90ad4e301b2405
> >>>>>
> >>>>> Regards
> >>>>> Gester
> >>>>>
> >>>>
> >>>
> >>
>
>


[VOTE] Adding a lucene specific fix to release/1.10.0

2019-09-19 Thread Xiaojian Zhou
I want to merge GEODE-7208, which is a lucene-specific fix.

The fix will enable indexing on inherited attributes in user objects.

revision 4ec87419d456748a7d853e979c90ad4e301b2405

Regards
Gester


Re: [VOTE] Adding new AEQ feature to release/1.10.0

2019-09-13 Thread Xiaojian Zhou
+1

On Fri, Sep 13, 2019 at 3:23 PM Nabarun Nag  wrote:

> Hi Geode Community ,
>
> [GEODE-7121]
>
> I would like to include the new feature of creating AEQs with a paused
> event processor to the release 1.10 branch. This also includes the feature
> to resume the AEQ at a later point in time.
> This feature includes addition of new/modified APIs and gfsh commands.
>
> [All details about this feature has been discussed in a previous discuss
> thread]
>
> These are the commits that needs to be in release 1.10.0 branch.
> f6e11084daa30791f7bbf9a8187f6d1bc9c4b91a
> 615d3399d24810126a6d57b5163f7afcd06366f7
> 1440a95e266e671679a623f93865c5e7e683244f
> 42e07dc9054794657acb40c292f3af74b79a1ea6
> e1f200e2f9e77e986d250fde3848dc004b26a7c2
> 5f70160fba08a06c7e1fc48c7099e63dd1a0502b
> 0645446ec626bc351a2c881e4df6a4ae2e75fbfc
> 575c6bac115112df1e84455b052566c75764b0be
> 3d9627ff16443f4aa513a67bcc284e68953aff8a
> ea22e72916f8e34455800d347690e483727f9bf5
> 8d26d595f5fb94ff703116eb91bb747e9ba7f536
>
> Will create a PR ASAP.
>
> Regards
> Nabarun Nag
>
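
A minimal sketch of how the paused-AEQ feature described above might be
used. The pause/resume method names are assumptions based on the
description in this thread, not confirmed against the merged API, and
listener is assumed to be an existing AsyncEventListener:

    // Create the AEQ with its event processor paused...
    AsyncEventQueueFactory factory = cache.createAsyncEventQueueFactory();
    factory.pauseEventDispatching();  // assumed API
    AsyncEventQueue queue = factory.create("myQueue", listener);

    // ...and resume event dispatching at a later point in time.
    queue.resumeEventDispatching();   // assumed API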


Re: [DISCUSS] require reviews before merging a PR

2019-05-31 Thread Xiaojian Zhou
I think my recent experience with Eric Shu's PR #3623 could be a good
example.

For this specific bug, which carries a lot of context, it was hard for
Eric to find 3 people to review. Bruce and I are the 2 right people who
know the history and context.

Eric came to me many times and we had a lot of discussion. I logged the
major design in GEM-2425 as a comment in order not to forget it (I think
it's a good idea to log the major design in either the GEM or GEODE
ticket). When he finally sent me a PR, I challenged him on how he had
implemented the design. He walked me through the use cases and corner
cases. I noticed he used some tricky logic in the implementation, so I
added a new comment in GEM-2425 to explain the tricks.

Although I felt it was good enough to approve, I purposely waited for
Bruce to give another review. Not surprisingly, Bruce raised the same
concerns about the tricks. I then appended the detailed design and
implementation to the PR to see if it would convince Bruce. I saw Bruce
ask Eric to add comments in the code or javadoc for the tricks.

The steps above can be an alternative to review meetings, since reviewers
are usually busy with their own assignments. If you participated in 2 PR
review meetings a day, there would be nothing to report at the next day's
standup.

If a PR explicitly requests multiple reviewers, then all of them should
approve. I think we can add this as a rule.

Regards
Gester


On Fri, May 31, 2019 at 2:10 PM Jacob Barrett  wrote:

> I’ll be posting a PR for it later next week so y’all can review.
>
> > On May 31, 2019, at 2:02 PM, Helena Bales  wrote:
> >
> > I'm happy to provide feedback on a CONTRIBUTING.md, but I don't want to
> > take the lead on this particular doc right now.
> >
> >> On Fri, May 31, 2019 at 1:48 PM Jacob Barrett 
> wrote:
> >>
> >> It is probably worthwhile to codify our “policy” so that it’s not
> confused
> >> later. Simply adding something about lazy consensus model to the
> >> CONTRIBUTING.md (which I realize we are missing, already working on
> that)
> >> might be useful.
> >>
> >> I could take a stab at the wording based on my earlier reply about this
> if
> >> no one else wants to.
> >>
> >> -jake
> >>
> >>
> >>> On May 31, 2019, at 12:44 PM, Owen Nichols 
> wrote:
> >>>
> >>> I have learned that other than the required quarterly report to the
> >> board, just about everything else about being an Apache project is just
> >> guidelines, not hard requirements.  I was confused because we do adhere
> >> rigorously to every other voting guideline on
> >> https://www.apache.org/foundation/voting.html; now I understand that is
> >> by choice and not because Apache “requires” it.
> >>>
> >>> Thank you for all the responses on this thread.  It seems like the
> >> consensus is that we’ve struck an appropriate balance already (and in
> >> particular regard to reviews, that we can trust committers to seek an
> >> appropriate amount of review based on the nature and scope of a PR).
> >>>
> >>> I will not seek a vote on enforcing a requirement of 1 (or more)
> reviews
> >> before a PR can be merged, since some valid scenarios were raised where
> 0
> >> reviews prior to merge could be appropriate.
> >>>
>  On May 31, 2019, at 9:01 AM, Jacob Barrett 
> wrote:
> 
> 
> > On May 31, 2019, at 8:52 AM, Owen Nichols 
> wrote:
> >
> > Apache requires 3 reviews for code changes. Docs and typos likely
> >> would not
> > fall under that heading.
> 
>  Where is this listed  as a requirement? The link you sent before
> >> offered guidance on common policies within the organization.
> 
> >>>
> >>
>


Re: [DISCUSS] Propose new committer and PMC member - Peter Tran

2019-05-20 Thread Xiaojian Zhou
+1

On Mon, May 20, 2019 at 11:50 AM Mike Stolz  wrote:

> This has the heading [DISCUSS] instead of [VOTE]
>
> I'm +1 anyway.
>
> On Mon, May 20, 2019 at 2:15 PM Jinmei Liao  wrote:
>
> > I'd like to discuss the proposal to add Peter Tran as a new Geode
> > committer and PMC member. Peter has been working with Manageability
> > team for a while now and has contributed to the team immensely. Before
> > that he was a key contributor to PCC team.
> >
> >
> > Please cast your vote. Voting ends one week from today (Monday, May 27th,
> > 2019).
> >
> > [   ] +1  Approve
> > [   ] +0  No opinion
> > [   ] -1   Disapprove (and reason why)
> >
> >
> > Pull requests and reviews:
> >
> > https://github.com/apache/geode/pulls?q=author%3Apetahhh
> >
> >
> >
> https://github.com/apache/geode/pulls?utf8=%E2%9C%93=commenter%3Apetahhh%20
> >
> >
> > Mailing list:
> >
> > https://geode.markmail.org/search/?q=from%3A%22ptran%22
> >
> >
> > JIRA:
> >
> >
> >
> https://issues.apache.org/jira/browse/GEODE-1910?jql=project%20%3D%20GEODE%20AND%20(reporter%20%3D%20%22peter%20tran%22%20or%20assignee%20%3D%20%22peter%20tran%22)%20ORDER%20BY%20summary%20DESC
> >
> >
> > Thanks!
> >
> > --
> > Cheers
> >
> > Jinmei
> >
>


Re: Very red CI -> Hold merges, please

2019-02-07 Thread Xiaojian Zhou
WANRollingUpgradeNewSenderProcessOldEvent is not related to GEODE-3967.
I wonder why the search pointed us to GEODE-3967.

Regards
Gester

On Thu, Feb 7, 2019 at 8:34 PM Owen Nichols  wrote:

> Pipeline is back to green now.  Thank you to everyone who stepped up to
> get things back on track.
>
> If you had PR checks fail this week, please re-trigger them (by making an
> empty commit).
>
> > On Feb 7, 2019, at 4:20 PM, Alexander Murmann 
> wrote:
> >
> > Bruce, would it make sense to for now revert the suspect change to the
> > test? At that point we should be back to full green and we all can
> without
> > a doubt go back to our usual flow of merging to develop.
> >
> > Thoughts?
> >
> > On Thu, Feb 7, 2019 at 2:37 PM Kirk Lund  wrote:
> >
> >> Hmm, and that was another false search hit in Jira! Searching for
> >> WANRollingUpgradeNewSenderProcessOldEvent in Jira brings up GEODE-3967
> >> which apparently does NOT involve that test. So, maybe we found another
> >> flaky test.
> >>
> >> Jira search seems to not work very well.
> >>
> >> On Thu, Feb 7, 2019 at 2:24 PM Kirk Lund  wrote:
> >>
> >>> The UpgradeTest failures on your latest commit for this PR are
> >>> WANRollingUpgradeNewSenderProcessOldEvent which seems to be a
> >> reoccurrence
> >>> of [GEODE-3967](https://issues.apache.org/jira/browse/GEODE-3967). I
> >>> recommend having Gester take a look at that these failures. He marked
> >>> [GEODE-3967](https://issues.apache.org/jira/browse/GEODE-3967) as
> >>> resolved on Jan 9th.
> >>>
> >>> On Thu, Feb 7, 2019 at 12:37 PM Jens Deppe  wrote:
> >>>
>  No worries. I think I have a better fix now. At least the builds are
>  moving
>  again.
> 
>  On Thu, Feb 7, 2019 at 12:11 PM Kirk Lund  wrote:
> 
> > Sorry, go ahead and revert the commit and reopen the PR.
> >
> > On Thu, Feb 7, 2019 at 11:36 AM Jens Deppe 
> wrote:
> >
> >> I was still working on a fix...
> >>
> >> On Thu, Feb 7, 2019 at 11:31 AM Kirk Lund  wrote:
> >>
> >>> I merged it in.
> >>>
> >>> On Thu, Feb 7, 2019 at 11:28 AM Kirk Lund 
> >> wrote:
> >>>
>  I think we should go ahead and merge in
>  https://github.com/apache/geode/pull/3172 since it resolves the
>  GfshConsoleModeUnitTest UnitTest failures.
> 
>  On Thu, Feb 7, 2019 at 9:57 AM Nabarun Nag 
>  wrote:
> 
> > FYI, I have just merged a ci timeout fix to increase the
> >> timeout
>  for
> > geode-benchmarks to 4h. This does not influence any geode
>  modules.
> >
> > Regards
> > Naba
> >
> > On Thu, Feb 7, 2019 at 9:32 AM Alexander Murmann <
> > amurm...@apache.org
> >>>
> > wrote:
> >
> >> Hi folks,
> >>
> >> Our CI is very red since ~24 hours
> >> <
> >>
> >
> >>>
> >>
> >
> 
> >>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UnitTestOpenJDK11/builds/372
> >>> .
> >> It looks like a substantial new issue was introduced.
> >>
> >> Can we hold off on merging new changes to the develop branch
>  till
> >> this
> >> issue is resolved?
> >>
> >> Thank you all!
> >>
> >
> 
> >>>
> >>
> >
> 
> >>>
> >>
>
>


Re: 2 minute gateway startup time due to GEODE-5591

2018-09-05 Thread Xiaojian Zhou
OK, after discussion with Jason and Ryan, PR #2425 is ready. It contains
fixes for 3 issues, including skipping the 2-minute timeout.

On Wed, Sep 5, 2018 at 11:03 AM, Udo Kohlmeyer  wrote:

> +1
>
>
>
> On 9/5/18 10:35, Anthony Baker wrote:
>
>> Before this improvement is re-merged I’d like to see:
>>
>> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2
>> min when there’s a port conflict)
>> 2) A test that demonstrates how the current logic is insufficient
>>
>> Anthony
>>
>>
>> On Sep 5, 2018, at 10:20 AM, Nabarun Nag  wrote:
>>>
>>> GEODE-5591 has been reverted in develop
>>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
>>>
>>> Regards
>>> Nabarun Nag
>>>
>>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon 
>>> wrote:
>>>
>>> +1 for reverting in both places.

 I see that there is already an isGatewayReceiver flag in the
 AcceptorImpl
 constructor.  It's not ideal, but could we use this flag to prevent the
 2
 minute retry logic for happening if this flag is true?

 Ryan

 On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
 lhughesgodf...@pivotal.io> wrote:

 +1 for reverting in both places.
>
> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith  wrote:
>
> +1 for reverting in both places. The current fix is not better, that's
>>
> why
>
>> we are reverting it on the release branch!
>>
>> -Dan
>>
>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett 
>>
> wrote:
>
>> I’m not ok with reverting in develop. Revert in 1.7 and modify in
>>>
>> develop.
>>
>>> We shouldn’t go backwards in develop. The current fix is better than
>>>
>> the
>
>> bug it fixes.
>>>
>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag  wrote:

 If everyone is okay with it, I will revert that change in develop

>>> and

> then
>>>
 cherry pick it to release/1.7.0 branch.
 Please do comment.

 Regards
 Nabarun Nag


 On Wed, Sep 5, 2018 at 9:30 AM Dan Smith 
>
 wrote:

> +1 to yank it and rework the fix.
>
> Gester's change helps, but it just means that you will sometimes
>
 randomly
>>>
 have a 2 minute delay starting up a gateway receiver. I don't
>
 think

> that is
>>>
 a great user experience either.
>
> -Dan
>
> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
>
 bschucha...@pivotal.io>
>>>
 wrote:
>
> Let's yank it
>>
>>
>>
>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>
>>> If it's to get the release out, I'm fine with reverting. I don't
>>>
>> like
>>
>>> it,
>
>> but I'm not willing to die on that hill. :)
>>>
>>> -S.
>>>
>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith 
>>>
>> wrote:
>
>> Spitting this into a separate thread.
>>>
 I see the issue. The two minute timeout is the constructor for
 AcceptorImpl, where it retries to bind for 2 minutes.

 That behavior makes sense for CacheServer.start.

 But it doesn't make sense for the new logic in

>>> GatewayReceiver.start()
>>>
 from
 GEODE-5591. That code is trying to use CacheServer.start to

>>> scan

> for
>>
>>> an
>>>
 available port, trying each port in a range. That free port

>>> finding
>
>> logic
>
>> really doesn't want to have two minutes of retries for each

>>> port.

> It
>>
>>> seems
 like we need to rework the fix for GEODE-5591.

 Does it make sense to hold up the release to rework this fix,

>>> or

> should
>>>
 we
 just revert it? Have we switched concourse over to using alpine

>>> linux,
>>>
 which I think was the original motivation for this fix?

 -Dan

 On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith 

>>> wrote:
>>
>>> Why is it waiting at all in this case? Where is this 2 minute

>>> timeout
>>
>>> coming from?
>
> -Dan
>
> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
>
> sai.boorlaga...@gmail.com

 wrote:
>
>> So the issue is that it takes longer to start than previous
>>
> releases?
>>>
 Also, is this wait time only when using Gfsh to create

Re: 2 minute gateway startup time due to GEODE-5591

2018-09-05 Thread Xiaojian Zhou
The previous fix did not improve anything on the 2-minute timeout.

On Wed, Sep 5, 2018 at 10:52 AM, Anthony Baker  wrote:

> Gester,
>
> Clearly the prior implementation had some problems, but except in
> pathological cases it provided the behavior users expected.  That’s why I
> think we need a characterization test(s) to show exactly what we want the
> behavior to be.  Merging in changes that make the user experience worse in
> the more common scenarios isn’t a good tradeoff IMO.  I see this work as
> integral to GEODE-5591 and shouldn’t be deferred to a separate ticket.
>
> Anthony
>
>
> > On Sep 5, 2018, at 10:43 AM, Xiaojian Zhou  wrote:
> >
> > The fix intends to resolve 2 issues:
> > 1) change the exception handling (for a linux version).
> > 2) prevent the random port picking from looping forever. In the old code,
> > for example, if the range only contains one port, random will always pick
> > the same port and it will loop forever. The fix will stop after all
> > available ports in the range have been tried. There's a test:
> >
> > test_ValidateGatewayReceiverAttributes_WrongBindAddress
> >
> >
> > The 2-minute wait is still possible. The fix did not resolve it (when
> > random() happens to return the same port for different receivers in the
> > same member), but it did not make things worse either.
> >
> >
> > There's a discussion on whether we can reduce the 2-minute timeout to a
> > few seconds. That is definitely another ticket.
> >
> > Regards
> >
> > Gester
> >
> >
> > On Wed, Sep 5, 2018 at 10:35 AM, Anthony Baker 
> wrote:
> >
> >> Before this improvement is re-merged I’d like to see:
> >>
> >> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2
> min
> >> when there’s a port conflict)
> >> 2) A test that demonstrates how the current logic is insufficient
> >>
> >> Anthony
> >>
> >>
> >>> On Sep 5, 2018, at 10:20 AM, Nabarun Nag  wrote:
> >>>
> >>> GEODE-5591 has been reverted in develop
> >>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
> >>>
> >>> Regards
> >>> Nabarun Nag
> >>>
> >>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon 
> >> wrote:
> >>>
> >>>> +1 for reverting in both places.
> >>>>
> >>>> I see that there is already an isGatewayReceiver flag in the
> >> AcceptorImpl
> >>>> constructor.  It's not ideal, but could we use this flag to prevent
> the
> >> 2
> >>>> minute retry logic for happening if this flag is true?
> >>>>
> >>>> Ryan
> >>>>
> >>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
> >>>> lhughesgodf...@pivotal.io> wrote:
> >>>>
> >>>>> +1 for reverting in both places.
> >>>>>
> >>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith  wrote:
> >>>>>
> >>>>>> +1 for reverting in both places. The current fix is not better,
> that's
> >>>>> why
> >>>>>> we are reverting it on the release branch!
> >>>>>>
> >>>>>> -Dan
> >>>>>>
> >>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett 
> >>>>> wrote:
> >>>>>>
> >>>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in
> >>>>>> develop.
> >>>>>>> We shouldn’t go backwards in develop. The current fix is better
> than
> >>>>> the
> >>>>>>> bug it fixes.
> >>>>>>>
> >>>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag  wrote:
> >>>>>>>>
> >>>>>>>> If everyone is okay with it, I will revert that change in develop
> >>>> and
> >>>>>>> then
> >>>>>>>> cherry pick it to release/1.7.0 branch.
> >>>>>>>> Please do comment.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Nabarun Nag
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith 
> >>>> wrote:
> >>>>>>>>>
> >>>>>>>>> +1 to yank it and rework the fix.
> >>>

Re: 2 minute gateway startup time due to GEODE-5591

2018-09-05 Thread Xiaojian Zhou
The fix intends to resolve 2 issues:
1) change the exception handling (for a linux version).
2) prevent the random port picking from looping forever. In the old code,
for example, if the range only contains one port, random will always pick
the same port and it will loop forever. The fix will stop after all
available ports in the range have been tried. There's a test:

test_ValidateGatewayReceiverAttributes_WrongBindAddress


The 2-minute wait is still possible. The fix did not resolve it (when
random() happens to return the same port for different receivers in the
same member), but it did not make things worse either.


There's a discussion on whether we can reduce the 2-minute timeout to a
few seconds. That is definitely another ticket.

Regards

Gester


On Wed, Sep 5, 2018 at 10:35 AM, Anthony Baker  wrote:

> Before this improvement is re-merged I’d like to see:
>
> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2 min
> when there’s a port conflict)
> 2) A test that demonstrates how the current logic is insufficient
>
> Anthony
>
>
> > On Sep 5, 2018, at 10:20 AM, Nabarun Nag  wrote:
> >
> > GEODE-5591 has been reverted in develop
> > ref: 901da27f227a8ce2b7d6b681619782a1accd9330
> >
> > Regards
> > Nabarun Nag
> >
> > On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon 
> wrote:
> >
> >> +1 for reverting in both places.
> >>
> >> I see that there is already an isGatewayReceiver flag in the
> AcceptorImpl
> >> constructor.  It's not ideal, but could we use this flag to prevent the
> 2
> >> minute retry logic for happening if this flag is true?
> >>
> >> Ryan
> >>
> >> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
> >> lhughesgodf...@pivotal.io> wrote:
> >>
> >>> +1 for reverting in both places.
> >>>
> >>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith  wrote:
> >>>
>  +1 for reverting in both places. The current fix is not better, that's
> >>> why
>  we are reverting it on the release branch!
> 
>  -Dan
> 
>  On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett 
> >>> wrote:
> 
> > I’m not ok with reverting in develop. Revert in 1.7 and modify in
>  develop.
> > We shouldn’t go backwards in develop. The current fix is better than
> >>> the
> > bug it fixes.
> >
> >> On Sep 5, 2018, at 9:40 AM, Nabarun Nag  wrote:
> >>
> >> If everyone is okay with it, I will revert that change in develop
> >> and
> > then
> >> cherry pick it to release/1.7.0 branch.
> >> Please do comment.
> >>
> >> Regards
> >> Nabarun Nag
> >>
> >>
> >>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith 
> >> wrote:
> >>>
> >>> +1 to yank it and rework the fix.
> >>>
> >>> Gester's change helps, but it just means that you will sometimes
> > randomly
> >>> have a 2 minute delay starting up a gateway receiver. I don't
> >> think
> > that is
> >>> a great user experience either.
> >>>
> >>> -Dan
> >>>
> >>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
> > bschucha...@pivotal.io>
> >>> wrote:
> >>>
>  Let's yank it
> 
> 
> 
> > On 9/4/18 5:04 PM, Sean Goller wrote:
> >
> > If it's to get the release out, I'm fine with reverting. I don't
>  like
> >>> it,
> > but I'm not willing to die on that hill. :)
> >
> > -S.
> >
> > On Tue, Sep 4, 2018 at 4:38 PM Dan Smith 
> >>> wrote:
> >
> > Spitting this into a separate thread.
> >>
> >> I see the issue. The two minute timeout is the constructor for
> >> AcceptorImpl, where it retries to bind for 2 minutes.
> >>
> >> That behavior makes sense for CacheServer.start.
> >>
> >> But it doesn't make sense for the new logic in
> > GatewayReceiver.start()
> >> from
> >> GEODE-5591. That code is trying to use CacheServer.start to
> >> scan
>  for
> > an
> >> available port, trying each port in a range. That free port
> >>> finding
> >>> logic
> >> really doesn't want to have two minutes of retries for each
> >> port.
>  It
> >> seems
> >> like we need to rework the fix for GEODE-5591.
> >>
> >> Does it make sense to hold up the release to rework this fix,
> >> or
> > should
> >> we
> >> just revert it? Have we switched concourse over to using alpine
> > linux,
> >> which I think was the original motivation for this fix?
> >>
> >> -Dan
> >>
> >> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith 
>  wrote:
> >>
> >> Why is it waiting at all in this case? Where is this 2 minute
>  timeout
> >>> coming from?
> >>>
> >>> -Dan
> >>>
> >>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> >>>
> >> sai.boorlaga...@gmail.com
> >>
> >>> wrote:
>  So the 

Re: 2 minute gateway startup time due to GEODE-5591

2018-09-05 Thread Xiaojian Zhou
Well, I found it's already reverted.

But I think we don't have to.

After discussing with Jason, I worked out a new fix. It keeps GEODE-5591's
original intention for the exception handling and improves how the port is
assigned.

The port is now checked for availability, so it will also resolve the
2-minute timeout issue for the retry (or at least will not make things
worse).

On Wed, Sep 5, 2018 at 10:14 AM, Ryan McMahon  wrote:

> +1 for reverting in both places.
>
> I see that there is already an isGatewayReceiver flag in the AcceptorImpl
> constructor.  It's not ideal, but could we use this flag to prevent the 2
> minute retry logic for happening if this flag is true?
>
> Ryan
>
> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
> lhughesgodf...@pivotal.io> wrote:
>
> > +1 for reverting in both places.
> >
> > On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith  wrote:
> >
> > > +1 for reverting in both places. The current fix is not better, that's
> > why
> > > we are reverting it on the release branch!
> > >
> > > -Dan
> > >
> > > On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett 
> > wrote:
> > >
> > > > I’m not ok with reverting in develop. Revert in 1.7 and modify in
> > > develop.
> > > > We shouldn’t go backwards in develop. The current fix is better than
> > the
> > > > bug it fixes.
> > > >
> > > > > On Sep 5, 2018, at 9:40 AM, Nabarun Nag  wrote:
> > > > >
> > > > > If everyone is okay with it, I will revert that change in develop
> and
> > > > then
> > > > > cherry pick it to release/1.7.0 branch.
> > > > > Please do comment.
> > > > >
> > > > > Regards
> > > > > Nabarun Nag
> > > > >
> > > > >
> > > > >> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith 
> wrote:
> > > > >>
> > > > >> +1 to yank it and rework the fix.
> > > > >>
> > > > >> Gester's change helps, but it just means that you will sometimes
> > > > randomly
> > > > >> have a 2 minute delay starting up a gateway receiver. I don't
> think
> > > > that is
> > > > >> a great user experience either.
> > > > >>
> > > > >> -Dan
> > > > >>
> > > > >> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
> > > > bschucha...@pivotal.io>
> > > > >> wrote:
> > > > >>
> > > > >>> Let's yank it
> > > > >>>
> > > > >>>
> > > > >>>
> > > >  On 9/4/18 5:04 PM, Sean Goller wrote:
> > > > 
> > > >  If it's to get the release out, I'm fine with reverting. I don't
> > > like
> > > > >> it,
> > > >  but I'm not willing to die on that hill. :)
> > > > 
> > > >  -S.
> > > > 
> > > >  On Tue, Sep 4, 2018 at 4:38 PM Dan Smith 
> > wrote:
> > > > 
> > > >  Spitting this into a separate thread.
> > > > >
> > > > > I see the issue. The two minute timeout is the constructor for
> > > > > AcceptorImpl, where it retries to bind for 2 minutes.
> > > > >
> > > > > That behavior makes sense for CacheServer.start.
> > > > >
> > > > > But it doesn't make sense for the new logic in
> > > > GatewayReceiver.start()
> > > > > from
> > > > > GEODE-5591. That code is trying to use CacheServer.start to
> scan
> > > for
> > > > an
> > > > > available port, trying each port in a range. That free port
> > finding
> > > > >> logic
> > > > > really doesn't want to have two minutes of retries for each
> port.
> > > It
> > > > > seems
> > > > > like we need to rework the fix for GEODE-5591.
> > > > >
> > > > > Does it make sense to hold up the release to rework this fix,
> or
> > > > should
> > > > > we
> > > > > just revert it? Have we switched concourse over to using alpine
> > > > linux,
> > > > > which I think was the original motivation for this fix?
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith 
> > > wrote:
> > > > >
> > > > > Why is it waiting at all in this case? Where is this 2 minute
> > > timeout
> > > > >> coming from?
> > > > >>
> > > > >> -Dan
> > > > >>
> > > > >> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> > > > >>
> > > > > sai.boorlaga...@gmail.com
> > > > >
> > > > >> wrote:
> > > > >>> So the issue is that it takes longer to start than previous
> > > > releases?
> > > > >>> Also, is this wait time only when using Gfsh to create
> > > > >>> gateway-receiver?
> > > > >>>
> > > > >>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag 
> > > > wrote:
> > > > >>>
> > > > >>> Currently we have a minor issue in the release branch as
> > pointed
> > > > out
> > > > 
> > > > >>> by
> > > > >
> > > > >> Barry O.
> > > >  We will wait till a resolution is figured out for this
> issue.
> > > > 
> > > >  Steps:
> > > >  1. create locator
> > > >  2. start server --name=server1 --server-port=40404
> > > >  3. start server --name=server2 --server-port=40405
> > > >  4. create gateway-receiver --member=server1
> > > >  5. create gateway-receiver --member=server2 `This gets stuck
> > > for 2
> > 

Re: 2 minute gateway startup time due to GEODE-5591

2018-09-04 Thread Xiaojian Zhou
Yes. With the current fix, each gateway receiver (and in hydra tests there
are a lot of them) competes for port 5500. Only one member will win; all
other members will time out after 2 minutes. Then they keep competing for
port 5501. Again, only one member will win.

In that case, if there are 5 receivers, it will take 10 minutes to start
all the receivers.

So I enhanced the current fix (see the diff attached) to let each receiver
pick a random port to start from; if one fails, only that receiver tries
currPort++. If it reaches endPort, it continues from startPort until it
reaches its random starting port again (see the sketch below).

Enhancing the 2-minute timeout is definitely another issue.

Regards
Gester
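
A rough sketch of that scan. The tryBind helper and the variable names are
hypothetical, only for illustration; the real change is in the attached
diff:

    int scanForPort(int startPort, int endPort, Random random) {
      int range = endPort - startPort + 1;
      int firstTry = startPort + random.nextInt(range);  // each receiver starts at its own random port
      int port = firstTry;
      do {
        if (tryBind(port)) {  // hypothetical helper: try to create the server socket
          return port;        // success -- this receiver keeps the port
        }
        port = (port == endPort) ? startPort : port + 1;  // wrap around the range
      } while (port != firstTry);
      throw new GatewayReceiverException("No available free port found in the given range");
    }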

On Tue, Sep 4, 2018 at 4:38 PM, Dan Smith  wrote:

> Spitting this into a separate thread.
>
> I see the issue. The two minute timeout is the constructor for
> AcceptorImpl, where it retries to bind for 2 minutes.
>
> That behavior makes sense for CacheServer.start.
>
> But it doesn't make sense for the new logic in GatewayReceiver.start() from
> GEODE-5591. That code is trying to use CacheServer.start to scan for an
> available port, trying each port in a range. That free port finding logic
> really doesn't want to have two minutes of retries for each port. It seems
> like we need to rework the fix for GEODE-5591.
>
> Does it make sense to hold up the release to rework this fix, or should we
> just revert it? Have we switched concourse over to using alpine linux,
> which I think was the original motivation for this fix?
>
> -Dan
>
> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith  wrote:
>
> > Why is it waiting at all in this case? Where is this 2 minute timeout
> > coming from?
> >
> > -Dan
> >
> > On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> sai.boorlaga...@gmail.com
> > > wrote:
> >
> >> So the issue is that it takes longer to start than previous releases?
> >> Also, is this wait time only when using Gfsh to create gateway-receiver?
> >>
> >> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:
> >>
> >> > Currently we have a minor issue in the release branch as pointed out
> by
> >> > Barry O.
> >> > We will wait till a resolution is figured out for this issue.
> >> >
> >> > Steps:
> >> > 1. create locator
> >> > 2. start server --name=server1 --server-port=40404
> >> > 3. start server --name=server2 --server-port=40405
> >> > 4. create gateway-receiver --member=server1
> >> > 5. create gateway-receiver --member=server2 `This gets stuck for 2
> >> minutes`
> >> >
> >> > Is the 2 minute wait time acceptable? Should we document it? When we
> >> revert
> >> > GEODE-5591, this issue does not happen.
> >> >
> >> > Regards
> >> > Nabarun Nag
> >> >
> >>
> >
>
diff --git a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/WANTestBase.java b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/WANTestBase.java
index a09194209..e13e7ec78 100644
--- a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/WANTestBase.java
+++ b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/WANTestBase.java
@@ -2020,7 +2020,7 @@ public class WANTestBase extends DistributedTestCase {
     GatewayReceiver receiver = fact.create();
     assertThatThrownBy(receiver::start)
         .isInstanceOf(GatewayReceiverException.class)
-        .hasMessageContaining("No available free port found in the given range");
+        .hasMessageContaining("Failed to create server socket on");
   }
 
   public static int createReceiverWithSSL(int locPort) {
diff --git a/geode-wan/src/integrationTest/java/org/apache/geode/internal/cache/wan/misc/WANConfigurationJUnitTest.java b/geode-wan/src/integrationTest/java/org/apache/geode/internal/cache/wan/misc/WANConfigurationJUnitTest.java
index 038b759ae..ccd9503e6 100644
--- a/geode-wan/src/integrationTest/java/org/apache/geode/internal/cache/wan/misc/WANConfigurationJUnitTest.java
+++ b/geode-wan/src/integrationTest/java/org/apache/geode/internal/cache/wan/misc/WANConfigurationJUnitTest.java
@@ -448,7 +448,8 @@ public class WANConfigurationJUnitTest {
 
 
     GatewayReceiver receiver = fact.create();
-    assertThatThrownBy(() -> receiver.start()).isInstanceOf(GatewayReceiverException.class);
+    assertThatThrownBy(() -> receiver.start()).isInstanceOf(GatewayReceiverException.class)
+        .hasMessageContaining("Failed to create server socket on");
   }
 
   @Test
diff --git a/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java b/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java
index cd2702991..786b354a4 100644
--- a/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java
+++ b/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java
@@ -26,6 +26,7 @@ import org.apache.geode.cache.wan.GatewayReceiver;
 import org.apache.geode.cache.wan.GatewayTransportFilter;
 import 

Re: 1.6.0 release

2018-04-19 Thread Xiaojian Zhou
I have cherry-picked GEODE-5056 into 9.5 and 1.6.0

On Thu, Apr 19, 2018 at 9:09 AM, Bruce Schuchardt 
wrote:

> Thanks Mike - I've cherry-picked the fix onto the release/1.6.0 branch
>
>
>
> On 4/18/18 5:11 PM, Michael Stolz wrote:
>
>> Yes please. I'm holding the build til this gets in. Please notify me here
>> when it's ready
>>
>> --
>> Mike Stolz
>> Principal Engineer - Gemfire Product Manager
>> Mobile: 631-835-4771
>>
>> On Apr 18, 2018 8:05 PM, "Bruce Schuchardt" 
>> wrote:
>>
>> A couple of people have reported running into GEODE-5085 <
>>> https://issues.apache.org/jira/browse/GEODE-5085>, which prevents a
>>> server from rejoining the cluster if it gets kicked out and a
>>> SecurityManager has been configured.
>>>
>>> Is this something we could get into the 1.6 release?  The fix is a single
>>> commit and it's been through precheckin testing a few times now.
>>>
>>>
>>>
>>>
>


Re: [RESULT][VOTE] Apache Geode release - 1.4.0 RC2

2018-02-01 Thread Xiaojian Zhou
+1

On Thu, Feb 1, 2018 at 2:19 PM, Swapnil Bawaskar 
wrote:

> This vote passes with five +1 votes, no 0 or -1 votes.
>
> Summary:
> Dan Smith   +1
> Anthony Baker+1
> Sai Boorlagadda +1
> Jinmei Liao +1
> Dick Cavendar+1
>
> vote thread: http://markmail.org/thread/vomlng4jqcy5hfm3
>
>
> On Wed, Jan 31, 2018 at 8:49 AM Dick Cavender 
> wrote:
>
> > +1
> >
> > - Download src and binary dists
> > - Confirmed all dists extract
> > - Confirmed src builds
> > - Confirmed version and gfsh commands
> > - Reviewed LICENSE and NOTICE contents
> > - Reviewed maven distribution but didn't consume
> >
> > -Dick
> >
> >
> > On Mon, Jan 29, 2018 at 4:08 PM, Swapnil Bawaskar 
> > wrote:
> >
> > > After fixing the security concerns in the first release candidate, this
> > is
> > > the second release candidate for Apache Geode, version 1.4.0.
> > > Thanks to all the community members for their contributions to this
> > > release!
> > >
> > > *** Please download, test and vote by Thursday, Feb 1, 1400 hrs
> > > US Pacific. ***
> > >
> > > It fixes 277 issues. release notes can be found at:
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > projectId=12318420=12341842
> > >
> > > Note that we are voting upon the source tags: rel/v1.4.0.RC2
> > > https://github.com/apache/geode/tree/rel/v1.4.0.RC2
> > > https://github.com/apache/geode-examples/tree/rel/v1.4.0.RC2
> > >
> > > Commit ID:
> > > 2a70679608120042fa7cbee67f4dd21a085d9588 (geode)
> > > ced35f88173b04ac8e104b9cae10cc38289675fa (geode-examples)
> > >
> > > Source and binary files:
> > > https://dist.apache.org/repos/dist/dev/geode/1.4.0.RC2
> > >
> > > Maven staging repo:
> > > https://repository.apache.org/content/repositories/orgapachegeode-1037
> > >
> > >
> > > Geode's KEYS file containing PGP keys we use to sign the release:
> > > https://github.com/apache/geode/blob/develop/KEYS
> > >
> > > Release Signed with Key: pub 4096R/18F902DB 2016-04-07
> > > Fingerprint: E1B1 ABE3 4753 E7BA 8097 4285 8F8F 2BCC 18F9 02DB
> > >
> >
>


Re: [DISCUSS] Benchmarks module package structure

2018-01-07 Thread Xiaojian Zhou
The package might always be a problem. Even if you put the cq benchmark
code under geode-cq, near its source code, it might still have to access
code under other packages, such as geode-core (see the illustration
below).

So I think putting benchmark test code under a benchmark package is OK.
Your option 2) is good.

Regards
Gester
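
For concreteness, the two layouts being weighed look roughly like this
(illustrative paths, not actual files):

    // Benchmark subpackages spread across modules, next to the code measured:
    geode-cq/src/.../org/apache/geode/cache/query/cq/benchmark/CqBenchmark.java

    // A single benchmark root package in the geode-benchmarks module:
    geode-benchmarks/src/.../org/apache/geode/benchmark/CqBenchmark.java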

On Fri, Jan 5, 2018 at 11:57 AM, Nick Reich  wrote:

> Team,
>
> I am in the progress of adding new benchmarks to the (currently sparse)
> geode-benchmarks module. The lack of current examples in the module leads
> me to wonder what the correct organization of benchmarks in the module is
> (and if this is the right location).
>
> The existing benchmarks are all in org.apache.geode.cache.benchmark.
> Following this pattern would (presumably) result in benchmark subpackages
> in each package that has benchmarks. Making the root package
> org.apache.geode.benchmark would remove this proliferation of sub packages.
> However, both these approaches have the issue that package level
> methods/classes cannot be accessed from benchmarks as they will never share
> a package with the production code.
>
> 1) Should benchmarks then not be put in special benchmark packages?
>
> 2) Should our benchmarks not invoke package level methods/classes in the
> case that we should use benchmark packages? Or should such benchmarks not
> reside in the benchmarks module?
>
> 3) Is geode-benchmarks where we intend all benchmarks, only certain classes
> of benchmarks (all using jmh for example), or would we prefer embedding
> them in the modules where the code being benchmarked resides?
>
> Thanks for your input.
>


Re: PRs should always include tests

2017-12-29 Thread Xiaojian Zhou
What about when the code change is already covered by existing tests?

Not reducing test coverage seems like a more reasonable standard.

On Fri, Dec 29, 2017 at 2:07 PM, Udo Kohlmeyer  wrote:

> +1
>
>
>
> On 12/29/17 12:05, Kirk Lund wrote:
>
>> I think we all need to be very consistent in requiring tests with all PRs.
>> This goes for committer as well as non-committer contributions.
>>
>> A test would both confirm the existence of the bug in the first place and
>> then confirm the fix. Without such a test, any developer could come along
>> later, modify the code in question and break it without ever realizing it.
>> A test would protect the behavior that was fixed or introduced.
>>
>> Also if we are not consistent in requiring tests for all contributions,
>> then contributors will learn to pick and choose which reviewers to listen
>> to and which ones to ignore.
>>
>> I for one do not want to waste my time reviewing and requesting changes
>> only to be ignored and have said PR be committed without the (justified)
>> changes I've requested.
>>
>>
>


Re: [DISCUSS] Addition of isValid API to Index interface

2017-09-11 Thread Xiaojian Zhou
There's no way to roll back a put/putAll, unless in a TX.
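
For context, a minimal sketch of how a caller might consult the flag this
thread proposes. isValid is the proposed addition, not an existing API;
the surrounding query-service calls are Geode's public API, and cache and
region are assumed to exist:

    for (Index index : cache.getQueryService().getIndexes(region)) {
      if (!index.isValid()) {  // proposed API from this thread
        // Skip or rebuild: a failed async maintenance run invalidated the index.
      }
    }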

On Sun, Sep 10, 2017 at 4:21 PM, Jason Huynh  wrote:

> 1.)  Does anyone know of a way to do a rollback where the put is already
> reflected in the region?  If that is the desired behavior, then perhaps we
> will have to live with the current (leaving the region and indexes in a bad
> state, wan and other callbacks that occur after index maintenance will not
> occur for the one operation but the put has made it into the region) until
> someone can figure out how to roll a put back and revert the update to all
> the indexes.
>
> How should this affect putAll, if at all?
>
> Any callbacks that occur before index update have already been called
> (cache writers?). I am not sure how those should be affected by a
> rollback...
>
> 2.)  So the index behavior changes if they are marked for sync/async.  In
> sync the index would reject the put, but in async they would just be marked
> as invalid.
>
>
>
>
> On Sat, Sep 9, 2017 at 6:48 AM John Blum  wrote:
>
> > +1 to both of Anil's points.
> >
> > On Fri, Sep 8, 2017 at 3:04 PM, Anilkumar Gingade 
> > wrote:
> >
> > > Indexes are critical for querying; most of the databases doesn't allow
> > > insert/update if there is any failure with index maintenance...
> > >
> > > As Geode OQL supports two ways (sync and async) to maintain the
> indexes,
> > we
> > > need be careful about the error handling in both cases...
> > >
> > > My take is:
> > > 1. For synchronous index maintenance:
> > > If there is any failure in updating any index (security/auth or logical
> > > error) on the region; throw an exception and rollback the cache
> update/op
> > > (index management id done under region.entry lock - we should be able
> to
> > > revert the op). If index or cache is left in bad state, then its a bug
> > that
> > > needs to be addressed.
> > >
> > > Most of the time, If there is any logical error in index, it will be
> > > detected as soon as index is created (on existing data) or when first
> > > update is done to the cache.
> > >
> > > 2. For Asynchronous index maintenance:
> > > As this is async (assuming) user has good understanding of the risk
> > > involved with async, any error with index maintenance, the index should
> > be
> > > invalidated...
> > >
> > >  About the security/auth, the user permission with region read/write
> > needs
> > > to be applied for index updates, there should not be different
> permission
> > > on index.
> > >
> > > -Anil.
> > >
> > >
> > >
> > > On Fri, Sep 8, 2017 at 2:01 PM, Nabarun Nag  wrote:
> > >
> > > > Hi Mike,
> > > >
> > > > Please do find our answers below:
> > > > *Question:* What if there were multiple indices that were in flight
> and
> > > > only the third
> > > > one errors out, will they all be marked invalid?
> > > >
> > > > *Answer:* Only the third will be marked invalid and only the third
> one
> > > will
> > > > not be used for query execution.
> > > >
> > > > *Question/Statement:* If anything goes wrong with the put it should
> > > > probably still throw back to
> > > > the caller. Silent invalidation of the index is probably not
> desirable.
> > > >
> > > > *Answer: *
> > > > In our current design this the flow of execution of a put operation:
> > > > entry put into region -> update index -> other wan related
> executions /
> > > > callbacks etc.
> > > >
> > > > If an exception happens while updating the index, the cache gets
> into a
> > > bad
> > > > state, and we may end up getting different results depending on the
> > index
> > > > we are using. As the failure happens half way in a put operation, the
> > > > regions / cache are now in a bad state.
> > > > --
> > > > We are thinking that if index is created  over a method invocation in
> > an
> > > > empty region and then we do puts, but method invocation is not
> allowed
> > as
> > > > per security policies. The puts will now be successful but the index
> > will
> > > > be rendered invalid. Previously the puts will fail with exception and
> > put
> > > > the entire cache in a bad state.
> > > >
> > > >
> > > >
> > > > Regards
> > > > Nabarun
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Sep 8, 2017 at 10:43 AM Michael Stolz 
> > wrote:
> > > >
> > > > > Just to help me understand, the index is corrupted in a way beyond
> > just
> > > > the
> > > > > field that errors out?
> > > > > What if there were multiple indices that were in flight and only
> the
> > > > third
> > > > > one errors out, will they all be marked invalid?
> > > > > If anything goes wrong with the put it should probably still throw
> > back
> > > > to
> > > > > the caller. Silent invalidation of the index is probably not
> > desirable.
> > > > >
> > > > > --
> > > > > Mike Stolz
> > > > > Principal Engineer, GemFire Product Manager
> > > > > Mobile: +1-631-835-4771 <(631)%20835-4771> <(631)%20835-4771>
> > > > >
> 

Review Request 62180: refactor away GemfireCacheImpl.getInstance from lucene function

2017-09-07 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62180/
---

Review request for geode, Barry Oglesby and Dan Smith.


Bugs: GEODE-3557
https://issues.apache.org/jira/browse/GEODE-3557


Repository: geode


Description
---

use dm.getCache() instead
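
A sketch of the refactor pattern (illustrative before/after lines; the
actual change is in the message class listed under Diffs, and dm is the
DistributionManager handed to the message when it is processed):

    // Before: reach for the static singleton from inside the message.
    InternalCache cache = GemFireCacheImpl.getInstance();

    // After: ask the DistributionManager that delivered the message.
    InternalCache cache = dm.getCache();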


Diffs
-

  
geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/DestroyLuceneIndexMessage.java
 9eada5b20 


Diff: https://reviews.apache.org/r/62180/diff/1/


Testing
---


Thanks,

xiaojian zhou



Re: Missing Gitbox activation email

2017-09-06 Thread Xiaojian Zhou
I did not get any email. But it seems all set for me.


On Wed, Sep 6, 2017 at 4:33 PM, Nabarun Nag  wrote:

> *I think the email takes some time to arrive."An organisational invite will
> be sent to you via email shortly thereafter (within 30 minutes)."*
>
> On Wed, Sep 6, 2017 at 4:21 PM Dan Smith  wrote:
>
> > If you are stuck on 3rd step (MFA Status) make you have added your github
> > username on id.apache.org *and* that you have accepted the invitation to
> > join the apache group on github.
> >
> > You should see an apache feather listed underneath your organizations on
> > github.
> >
> > -Dan
> >
> > On Wed, Sep 6, 2017 at 4:08 PM, Jacob Barrett 
> wrote:
> >
> > > I’d hit up Infra tomorrow if it isn’t working by then.
> > >
> > > > On Sep 6, 2017, at 4:06 PM, Jared Stewart 
> wrote:
> > > >
> > > > I’m stuck on the same step.  I tried clearing out my GitHub username
> at
> > > id.apache.org  and then re-adding it in the
> hopes
> > > of re-triggering the email, but it still hasn’t arrived.
> > > >
> > > > - Jared
> > > >> On Sep 6, 2017, at 4:04 PM, Udo Kohlmeyer  wrote:
> > > >>
> > > >> Hey there,
> > > >>
> > > >> I've gone through all the steps to activate my github user with
> > gitbox.
> > > >>
> > > >> Looking at gitbox.apache.org, I was completed step 2 of 3. I'm
> > > currently waiting for the email that I'm supposed to receive, BUT it
> has
> > > never arrived. I have checked my Spam folder and it was not in there
> > either.
> > > >>
> > > >> Does anyone know how to have to email sent *again*?
> > > >>
> > > >> --Udo
> > > >
> > >
> >
>


Re: Geode PR pile up

2017-07-21 Thread Xiaojian Zhou
I'm merging PR 648 now

On Fri, Jul 21, 2017 at 3:32 PM, Jacob Barrett  wrote:

> All,
>
> I followed up on the issue regarding abandoned PRs. The only way to close
> them if the user has walked away is to do an empty commit. Thanks to Mark
> for finding this ticket with explanation
> https://issues.apache.org/jira/browse/INFRA-13690 and instructions on how
> to do it.
>
> I would say any PR that goes unanswered for a reasonable period should get
> closed out. Please make attempts to get the user to close them if we can
> though.
>
> -Jake
>
> On Thu, Jul 20, 2017 at 10:01 AM Udo Kohlmeyer 
> wrote:
>
> > Hi there fellow Geode devs.
> >
> > I've just been cleaning up some PR's, and I'm seeing that we 33 odd PR's
> > open, 20 of them older than 30 days.
> >
> > Not beat on the pedantic drum, but it would be really nice to have the
> > PR's either applied, addressed or closed(rejected).
> >
> > Does anyone have any ideas how we can keep our PR backlog small and
> > manageable vs a backlog that keeps on growing.
> >
> > --Udo
> >
> >
>


Review Request 59926: waitUntilFlush should check if its brq's tempQueue is not empty

2017-06-08 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59926/
---

Review request for geode, Barry Oglesby and Dan Smith.


Bugs: GEODE-3055
https://issues.apache.org/jira/browse/GEODE-3055


Repository: geode


Description
---

There's a time window in which the data region's bucket is ready but the
shadow key's bucket is not, so the event is added into the tempQueue during
that window.

If we run waitUntilFlush during that window, it does not check the
tempQueue, since its BRQ does not exist yet. That causes a data mismatch
(i.e. we find the key in the data region, but not in the index).

We should pass in the data region's bucket list and let it wait until these
tempQueues are empty.
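
A rough sketch of the intended wait, with hypothetical helper and variable
names (the real coordinator classes are listed under Diffs):

    // Wait on the temp queues of every bucket backing the data region, not
    // just the BRQs that already exist.
    boolean waitUntilFlushed(Set<Integer> dataRegionBuckets, long timeout, TimeUnit unit)
        throws InterruptedException {
      long deadline = System.nanoTime() + unit.toNanos(timeout);
      for (int bucketId : dataRegionBuckets) {
        while (!isTempQueueEmpty(bucketId)) {  // hypothetical check
          if (System.nanoTime() > deadline) {
            return false;  // timed out before the queue drained
          }
          Thread.sleep(10);
        }
      }
      return true;
    }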


Diffs
-

  
geode-core/src/main/java/org/apache/geode/cache/asyncqueue/internal/AsyncEventQueueImpl.java
 bf7e87445 
  
geode-core/src/main/java/org/apache/geode/internal/cache/wan/AbstractGatewaySender.java
 c38d5475a 
  
geode-core/src/main/java/org/apache/geode/internal/cache/wan/parallel/WaitUntilParallelGatewaySenderFlushedCoordinator.java
 42ce68cab 
  
geode-core/src/test/java/org/apache/geode/internal/cache/wan/parallel/WaitUntilParallelGatewaySenderFlushedCoordinatorJUnitTest.java
 5e12ed5ab 
  
geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/distributed/WaitUntilFlushedFunction.java
 e11384c59 
  
geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/WaitUntilFlushedFunctionJUnitTest.java
 f92a296f7 


Diff: https://reviews.apache.org/r/59926/diff/1/


Testing
---


Thanks,

xiaojian zhou



[jira] [Resolved] (GEODE-1775) CI failure: ParallelWANPropagationClientServerDUnitTest.testParallelPropagationWithClientServer

2017-05-11 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-1775.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> CI failure: 
> ParallelWANPropagationClientServerDUnitTest.testParallelPropagationWithClientServer
> ---
>
> Key: GEODE-1775
> URL: https://issues.apache.org/jira/browse/GEODE-1775
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Grace Meilen
>Assignee: Dan Smith
>  Labels: ci, flaky
> Fix For: 1.2.0
>
>
> {noformat}
> :geode-wan:distributedTest
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest
>  > testParallelPropagationWithClientServer FAILED
> com.gemstone.gemfire.test.dunit.RMIException: While invoking 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest$$Lambda$19/1746236140.run
>  in VM 4 running on Host 9ff79c8190b7 with 8 VMs
> at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389)
> at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355)
> at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:293)
> at 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest.testParallelPropagationWithClientServer(ParallelWANPropagationClientServerDUnitTest.java:59)
> Caused by:
> com.gemstone.gemfire.cache.NoSubscriptionServersAvailableException: 
> com.gemstone.gemfire.cache.NoSubscriptionServersAvailableException: Primary 
> discovery failed.
> at 
> com.gemstone.gemfire.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:198)
> at 
> com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:550)
> at 
> com.gemstone.gemfire.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:763)
> at 
> com.gemstone.gemfire.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:63)
> at 
> com.gemstone.gemfire.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:376)
> at 
> com.gemstone.gemfire.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3968)
> at 
> com.gemstone.gemfire.internal.cache.LocalRegion.registerInterest(LocalRegion.java:4058)
> at 
> com.gemstone.gemfire.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3873)
> at 
> com.gemstone.gemfire.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3867)
> at 
> com.gemstone.gemfire.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3863)
> at 
> com.gemstone.gemfire.internal.cache.wan.WANTestBase.createClientWithLocator(WANTestBase.java:2154)
> at 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest.lambda$testParallelPropagationWithClientServer$cb73cba9$3(ParallelWANPropagationClientServerDUnitTest.java:59)
> Caused by:
> 
> com.gemstone.gemfire.cache.NoSubscriptionServersAvailableException: Primary 
> discovery failed.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Review Request 59040: when advisor cannot found target nodes for bucket id, should double check if the member is offline

2017-05-08 Thread Xiaojian Zhou
Yes, PartitionOfflineException is the expected behavior.

On Mon, May 8, 2017 at 11:32 AM, Barry Oglesby <bogle...@pivotal.io> wrote:

>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59040/#review174209
> ---
>
>
>
>
> geode-core/src/main/java/org/apache/geode/internal/cache/execute/
> FunctionExecutionNodePruner.java
> Lines 63 (patched)
> <https://reviews.apache.org/r/59040/#comment247328>
>
> What happens in the persistent case? Does it throw a
> PartitionOfflineException?
>
>
> - Barry Oglesby
>
>
> On May 7, 2017, 5:47 p.m., xiaojian zhou wrote:
> >
> > ---
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/59040/
> > ---
> >
> > (Updated May 7, 2017, 5:47 p.m.)
> >
> >
> > Review request for geode, Barry Oglesby and Dan Smith.
> >
> >
> > Bugs: GEODE-2824
> > https://issues.apache.org/jira/browse/GEODE-2824
> >
> >
> > Repository: geode
> >
> >
> > Description
> > ---
> >
> > This is a race condition. When a member is offline (in the
> > redundant-copies=0 case), an earlier check will find that. But if the
> > code passes the check, it will enter a retry loop asking the advisor for
> > the target node. Eventually the advisor will return an empty list of
> > members, and the code will throw the "No target node found" exception.
> >
> > The fix is: when the empty list is returned, double-check whether the
> > target node is offline.
> >
> >
> > Diffs
> > -
> >
> >   geode-core/src/main/java/org/apache/geode/internal/cache/execute/
> FunctionExecutionNodePruner.java 18700a75d
> >
> >
> > Diff: https://reviews.apache.org/r/59040/diff/1/
> >
> >
> > Testing
> > ---
> >
> >
> > Thanks,
> >
> > xiaojian zhou
> >
> >
>
>


[jira] [Updated] (GEODE-2824) FunctionException: No target node found when executing hasNext on Lucene result

2017-05-08 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-2824:
-
Fix Version/s: 1.2.0

> FunctionException: No target node found when executing hasNext on Lucene 
> result
> ---
>
> Key: GEODE-2824
> URL: https://issues.apache.org/jira/browse/GEODE-2824
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Jason Huynh
>    Assignee: xiaojian zhou
> Fix For: 1.2.0
>
>
> The stack trace below is thrown during a race condition when a node is 
> closing and calling hasNext on a Lucene result.
> It looks like there was a CacheClosedException, but this execution was unable 
> to find a target node to retry on.  This execution then threw a 
> FunctionException.
> We have code to unwrap CacheClosedExceptions from function exceptions; 
> however, this was just an ordinary function exception.  The underlying cause 
> is that the cache is closing at this time.
> We should probably wrap all function exceptions with either a 
> LuceneQueryException or equivalent, as a user would probably not expect a 
> FunctionException when calling Lucene methods.
> The stack trace:
> {noformat}
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeOnMultipleNodes(PartitionedRegion.java:3459)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeFunction(PartitionedRegion.java:3367)
> at 
> org.apache.geode.internal.cache.execute.PartitionedRegionFunctionExecutor.executeFunction(PartitionedRegionFunctionExecutor.java:228)
> at 
> org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:376)
> at 
> org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:178)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getValues(PageableLuceneQueryResultsImpl.java:112)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getHitEntries(PageableLuceneQueryResultsImpl.java:91)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.advancePage(PageableLuceneQueryResultsImpl.java:139)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.hasNext(PageableLuceneQueryResultsImpl.java:148)
> {noformat}
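A minimal sketch of the suggested wrapping (the call site and the
LuceneQueryException constructor used here are assumptions, not the actual
change):

{noformat}
// Hypothetical sketch: rethrow an ordinary FunctionException raised while
// paging Lucene results as a Lucene-level exception. Names are illustrative.
try {
  results = collector.getResult();
} catch (FunctionException e) {
  Throwable cause = e.getCause();
  if (cause instanceof CacheClosedException) {
    throw (CacheClosedException) cause; // CacheClosedExceptions are unwrapped today
  }
  // proposed: wrap everything else so Lucene callers never see a raw
  // FunctionException
  throw new LuceneQueryException("Lucene query failed", e);
}
{noformat}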



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (GEODE-2824) FunctionException: No target node found when executing hasNext on Lucene result

2017-05-08 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-2824.
--
Resolution: Fixed

> FunctionException: No target node found when executing hasNext on Lucene 
> result
> ---
>
> Key: GEODE-2824
> URL: https://issues.apache.org/jira/browse/GEODE-2824
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Jason Huynh
>    Assignee: xiaojian zhou
>
> The stack trace below is thrown during a race condition when a node is 
> closing and calling hasNext on a Lucene result.
> It looks like there was a CacheClosedException, but this execution was unable 
> to find a target node to retry on.  This execution then threw a 
> FunctionException.
> We have code to unwrap CacheClosedExceptions from function exceptions; 
> however, this was just an ordinary function exception.  The underlying cause 
> is that the cache is closing at this time.
> We should probably wrap all function exceptions with either a 
> LuceneQueryException or equivalent, as a user would probably not expect a 
> FunctionException when calling Lucene methods.
> The stack trace:
> {noformat}
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeOnMultipleNodes(PartitionedRegion.java:3459)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeFunction(PartitionedRegion.java:3367)
> at 
> org.apache.geode.internal.cache.execute.PartitionedRegionFunctionExecutor.executeFunction(PartitionedRegionFunctionExecutor.java:228)
> at 
> org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:376)
> at 
> org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:178)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getValues(PageableLuceneQueryResultsImpl.java:112)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getHitEntries(PageableLuceneQueryResultsImpl.java:91)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.advancePage(PageableLuceneQueryResultsImpl.java:139)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.hasNext(PageableLuceneQueryResultsImpl.java:148)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Review Request 59040: when advisor cannot find target nodes for bucket id, should double check if the member is offline

2017-05-07 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59040/
---

Review request for geode, Barry Oglesby and Dan Smith.


Bugs: GEODE-2824
https://issues.apache.org/jira/browse/GEODE-2824


Repository: geode


Description
---

This is a race condition. When a member is offline (in the redundantCopies=0 
case), an earlier check will find that. But if the check passes, the code will 
enter a retry loop asking the advisor for the target node. Eventually the 
advisor returns an empty list of members, and the code throws the 
"No target node found" exception. 

The fix: when the empty list is returned, double check whether the target node 
is offline.
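
A minimal sketch of the double check described above (all names below are
illustrative, not the actual FunctionExecutionNodePruner code; see the diff for
the real change):

{noformat}
// Hypothetical sketch only -- simplified names.
InternalDistributedMember pickTargetNode(int bucketId) {
  List<InternalDistributedMember> candidates = adviseTargetNodes(bucketId);
  if (candidates.isEmpty()) {
    // double check: with redundantCopies=0 the only copy may be offline
    if (isBucketOffline(bucketId)) {
      throw new PartitionOfflineException(offlineMembers(bucketId),
          "bucket " + bucketId + " is offline");
    }
    throw new FunctionException("No target node found for bucket " + bucketId);
  }
  return candidates.get(0);
}
{noformat}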


Diffs
-

  
geode-core/src/main/java/org/apache/geode/internal/cache/execute/FunctionExecutionNodePruner.java 18700a75d 


Diff: https://reviews.apache.org/r/59040/diff/1/


Testing
---


Thanks,

xiaojian zhou



[jira] [Resolved] (GEODE-1734) Lucene search for a single entry is returning multiple results

2017-05-04 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-1734.
--
Resolution: Fixed

Fixed in GEODE-2241 revision 0182a1bb744d25fe490d142dfed7d9a6f20b2713

> Lucene search for a single entry is returning multiple results
> --
>
> Key: GEODE-1734
> URL: https://issues.apache.org/jira/browse/GEODE-1734
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: William Markito Oliveira
>    Assignee: xiaojian zhou
>
> Searching for a unique entry is returning multiple results, although the key 
> is the same.  It should return a single result.
> {code}
> gfsh>lucene search --name=customerRegionAll 
> --queryStrings="firstName:Jdfmlevjenzwgd" --region=/customer 
> --defaultField=displayName
> key                                  | value                                                                             | score
> ------------------------------------ | --------------------------------------------------------------------------------- | ---------
> 70dbdb7f-648e-415e-880d-15631f87a523 | PDX[16777220,org.example.domain.model.CustomerEntity]{active=false, addresses=.. | 12.798602
> 70dbdb7f-648e-415e-880d-15631f87a523 | PDX[16777220,org.example.domain.model.CustomerEntity]{active=false, addresses=.. | 12.798602
> 70dbdb7f-648e-415e-880d-15631f87a523 | PDX[16777220,org.example.domain.model.CustomerEntity]{active=false, addresses=.. | 12.798602
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (GEODE-2848) While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue

2017-05-03 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-2848:
-
Fix Version/s: 1.2.0

> While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in 
> remote members before stopping the AsyncEventQueue
> -
>
> Key: GEODE-2848
> URL: https://issues.apache.org/jira/browse/GEODE-2848
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Barry Oglesby
> Fix For: 1.2.0
>
>
> This causes a NullPointerException in BatchRemovalThread getAllRecipients 
> like:
> {noformat}
> [fine 2017/04/24 14:27:29.163 PDT gemfire4_r02-s28_3222  
> tid=0x6b] BatchRemovalThread: ignoring exception
> java.lang.NullPointerException
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.getAllRecipients(ParallelGatewaySenderQueue.java:1776)
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1722)
> {noformat}
> This message is currently only logged at fine level and doesn't cause any 
> real issues.
> The simple fix is to check for null in getAllRecipients like:
> {noformat}
> PartitionedRegion pReg = ((PartitionedRegion) (cache.getRegion((String) pr)));
> if (pReg != null) {
>   recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());
> }
> {noformat}
> Another more complex fix is to change the destroyIndex sequence.
> The current destroyIndex sequence is:
> # stops and destroys the AEQ in the initiator (including the underlying PR)
> # closes the repository manager in the initiator
> # stops and destroys the AEQ in remote members (not including the underlying 
> PR)
> # closes the repository manager in the remote members
> # destroys the fileAndChunk region in the initiator
> Between steps 1 and 3, the region will be null in the remote members, so the 
> NPE can occur.
> A better sequence would be:
> # stops the AEQ in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the initiator
> # closes the repository manager in the remote members
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the fileAndChunk region in the initiator
> That would be 3 messages between the members.
> I think that can be combined into one remote message like:
> # stops the AEQ in the initiator
> # closes the repository manager in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the remote members
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the fileAndChunk region in the initiator
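
A rough sketch of that combined ordering (every name here is hypothetical; it
only encodes the step order proposed above):

{noformat}
// Illustrative sketch of the combined destroyIndex ordering.
void destroyIndex() {
  stopAeqLocally();                 // 1. stop the AEQ in the initiator
  closeRepositoryManagerLocally();  // 2. close the repository manager locally
  sendDestroyIndexMessage();        // 3-5. one remote message: stop the AEQ,
                                    //      close the repository manager, and
                                    //      destroy the AEQ (without the PR)
  destroyAeqLocally();              // 6. destroy the AEQ + underlying PR locally
  destroyFileAndChunkRegion();      // 7. destroy the fileAndChunk region locally
}
{noformat}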



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (GEODE-2848) While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue

2017-05-03 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-2848.
--
Resolution: Fixed

Fixed in revision d4ece31fa23bbe74c8be0a82ff4b9d143bad79b3

> While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in 
> remote members before stopping the AsyncEventQueue
> -
>
> Key: GEODE-2848
> URL: https://issues.apache.org/jira/browse/GEODE-2848
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Barry Oglesby
>
> This causes a NullPointerException in BatchRemovalThread getAllRecipients 
> like:
> {noformat}
> [fine 2017/04/24 14:27:29.163 PDT gemfire4_r02-s28_3222  
> tid=0x6b] BatchRemovalThread: ignoring exception
> java.lang.NullPointerException
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.getAllRecipients(ParallelGatewaySenderQueue.java:1776)
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1722)
> {noformat}
> This message is currently only logged at fine level and doesn't cause any 
> real issues.
> The simple fix is to check for null in getAllRecipients like:
> {noformat}
> PartitionedRegion pReg = ((PartitionedRegion) (cache.getRegion((String) pr)));
> if (pReg != null) {
>   recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());
> }
> {noformat}
> Another more complex fix is to change the destroyIndex sequence.
> The current destroyIndex sequence is:
> # stops and destroys the AEQ in the initiator (including the underlying PR)
> # closes the repository manager in the initiator
> # stops and destroys the AEQ in remote members (not including the underlying 
> PR)
> # closes the repository manager in the remote members
> # destroys the fileAndChunk region in the initiator
> Between steps 1 and 3, the region will be null in the remote members, so the 
> NPE can occur.
> A better sequence would be:
> # stops the AEQ in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the initiator
> # closes the repository manager in the remote members
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the fileAndChunk region in the initiator
> That would be 3 messages between the members.
> I think that can be combined into one remote message like:
> # stops the AEQ in the initiator
> # closes the repository manager in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the remote members
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the fileAndChunk region in the initiator



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (GEODE-2824) FunctionException: No target node found when executing hasNext on Lucene result

2017-05-02 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou reassigned GEODE-2824:


Assignee: xiaojian zhou

> FunctionException: No target node found when executing hasNext on Lucene 
> result
> ---
>
> Key: GEODE-2824
> URL: https://issues.apache.org/jira/browse/GEODE-2824
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Jason Huynh
>    Assignee: xiaojian zhou
>
> The stack trace below is thrown during a race condition when a node is 
> closing and calling hasNext on a Lucene result.
> It looks like there was a CacheClosedException, but this execution was unable 
> to find a target node to retry on.  This execution then threw a 
> FunctionException.
> We have code to unwrap CacheClosedExceptions from function exceptions; 
> however, this was just an ordinary function exception.  The underlying cause 
> is that the cache is closing at this time.
> We should probably wrap all function exceptions with either a 
> LuceneQueryException or equivalent, as a user would probably not expect a 
> FunctionException when calling Lucene methods.
> The stack trace:
> {noformat}
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeOnMultipleNodes(PartitionedRegion.java:3459)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.executeFunction(PartitionedRegion.java:3367)
> at 
> org.apache.geode.internal.cache.execute.PartitionedRegionFunctionExecutor.executeFunction(PartitionedRegionFunctionExecutor.java:228)
> at 
> org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:376)
> at 
> org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:178)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getValues(PageableLuceneQueryResultsImpl.java:112)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.getHitEntries(PageableLuceneQueryResultsImpl.java:91)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.advancePage(PageableLuceneQueryResultsImpl.java:139)
> at 
> org.apache.geode.cache.lucene.internal.PageableLuceneQueryResultsImpl.hasNext(PageableLuceneQueryResultsImpl.java:148)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Review Request 58853: GEODE-2847: Get correct version tags for retried bulk operations

2017-05-01 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58853/#review173555
---


Fix it, then Ship it!




fix and ship it.


geode-core/src/main/java/org/apache/geode/internal/cache/EventTracker.java
Line 545 (original), 546 (patched)
<https://reviews.apache.org/r/58853/#comment246542>

It's better to initialize it to null. 

Fix and commit.



geode-core/src/main/java/org/apache/geode/internal/cache/EventTracker.java
Line 561 (original), 560 (patched)
<https://reviews.apache.org/r/58853/#comment246545>

You don't need to remove it here, since the removal happens in the "finally" block (see the sketch below).
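
A sketch of the pattern (recordedBulkOpVersionTags follows the review; the
rest is illustrative):

{noformat}
// Sketch only: when the removal already happens in "finally",
// an explicit remove inside the try body is redundant.
recordedBulkOpVersionTags.put(eventId, versionTags);
try {
  applyBulkOperation();   // illustrative
} finally {
  recordedBulkOpVersionTags.remove(eventId);  // single cleanup point
}
{noformat}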


- xiaojian zhou


On May 2, 2017, 1 a.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58853/
> ---
> 
> (Updated May 2, 2017, 1 a.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, and Lynn 
> Gallinat.
> 
> 
> Bugs: GEODE-2847
> https://issues.apache.org/jira/browse/GEODE-2847
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Get correct version tags from recordedBulkOpVersionTags in eventTracker.
> Do not remove the recordedBulkOpVersionTags prematurely.
> Add the unit test which would fail without the fixes.
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/EventTracker.java 
> 2ddfdc4 
>   geode-core/src/main/java/org/apache/geode/internal/cache/LocalRegion.java 
> 8c061b0 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/partitioned/PutAllPRMessage.java
>  27f5aa0 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/partitioned/RemoveAllPRMessage.java
>  f4f6299 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/ClientProxyMembershipID.java
>  2cbf63b 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/AbstractDistributedRegionJUnitTest.java
>  ba2f794 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/DistributedRegionJUnitTest.java
>  7525f35 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/EventTrackerTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58853/diff/2/
> 
> 
> Testing
> ---
> 
> precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>



[jira] [Commented] (GEODE-2848) While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue

2017-05-01 Thread xiaojian zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991583#comment-15991583
 ] 

xiaojian zhou commented on GEODE-2848:
--

I think it is not worth introducing the complexity of a new message or 
re-arranging the message processing sequence. 

But the regionToDispatchedKeysMap will be cleared and temp will be lost, so the 
secondary at the remote site will not receive the ParallelQueueRemovalMessage. 

There's a conservative, simple fix (sketched below):
In getAllRecipients(), detect that the region is gone and return an empty set. 
When recipients.isEmpty() is found, call regionToDispatchedKeysMap.putAll(temp).
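
The sketch (pReg, recipients, and temp follow the snippets already quoted on
this issue; the rest is illustrative):

{noformat}
// In getAllRecipients(): treat a destroyed region as "no recipients".
PartitionedRegion pReg = (PartitionedRegion) cache.getRegion((String) pr);
if (pReg == null) {
  return Collections.emptySet();  // region is gone
}
recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());

// In the BatchRemovalThread caller: don't lose the batched keys.
if (recipients.isEmpty()) {
  regionToDispatchedKeysMap.putAll(temp);  // keep them for a later pass
}
{noformat}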

> While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in 
> remote members before stopping the AsyncEventQueue
> -
>
> Key: GEODE-2848
> URL: https://issues.apache.org/jira/browse/GEODE-2848
> Project: Geode
>  Issue Type: Bug
>  Components: lucene
>Reporter: Barry Oglesby
>
> This causes a NullPointerException in BatchRemovalThread getAllRecipients 
> like:
> {noformat}
> [fine 2017/04/24 14:27:29.163 PDT gemfire4_r02-s28_3222  
> tid=0x6b] BatchRemovalThread: ignoring exception
> java.lang.NullPointerException
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.getAllRecipients(ParallelGatewaySenderQueue.java:1776)
>   at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1722)
> {noformat}
> This message is currently only logged at fine level and doesn't cause any 
> real issues.
> The simple fix is to check for null in getAllRecipients like:
> {noformat}
> PartitionedRegion pReg = ((PartitionedRegion) (cache.getRegion((String) pr)));
> if (pReg != null) {
>   recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());
> }
> {noformat}
> Another more complex fix is to change the destroyIndex sequence.
> The current destroyIndex sequence is:
> # stops and destroys the AEQ in the initiator (including the underlying PR)
> # closes the repository manager in the initiator
> # stops and destroys the AEQ in remote members (not including the underlying 
> PR)
> # closes the repository manager in the remote members
> # destroys the fileAndChunk region in the initiator
> Between steps 1 and 3, the region will be null in the remote members, so the 
> NPE can occur.
> A better sequence would be:
> # stops the AEQ in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the initiator
> # closes the repository manager in the remote members
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the fileAndChunk region in the initiator
> That would be 3 messages between the members.
> I think that can be combined into one remote message like:
> # stops the AEQ in the initiator
> # closes the repository manager in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the remote members
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the fileAndChunk region in the initiator



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (GEODE-1988) CI failure: RegisterInterestKeysPRDUnitTest fails intermittently

2017-05-01 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou reopened GEODE-1988:
--

It was reproduced in CI FlakyTest runs #569 and #564:

org.apache.geode.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest
 > testParallelPropagationWithClientServer FAILED
org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest$$Lambda$32/630572366.run
 in VM 7 running on Host 56dada81-012e-4ebc-6c30-8480d4e17975 with 8 VMs
at org.apache.geode.test.dunit.VM.invoke(VM.java:377)
at org.apache.geode.test.dunit.VM.invoke(VM.java:347)
at org.apache.geode.test.dunit.VM.invoke(VM.java:292)
at 
org.apache.geode.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest.testParallelPropagationWithClientServer(ParallelWANPropagationClientServerDUnitTest.java:56)

Caused by:
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary 
discovery failed.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:191)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:570)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:805)
at 
org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58)
at 
org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:362)
at 
org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3895)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3974)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3791)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3787)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3783)
at 
org.apache.geode.internal.cache.wan.WANTestBase.createClientWithLocator(WANTestBase.java:2126)
at 
org.apache.geode.internal.cache.wan.parallel.ParallelWANPropagationClientServerDUnitTest.lambda$testParallelPropagationWithClientServer$998d73b4$1(ParallelWANPropagationClientServerDUnitTest.java:56)

Caused by:
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
Primary discovery failed.


> CI failure: RegisterInterestKeysPRDUnitTest fails intermittently
> 
>
> Key: GEODE-1988
> URL: https://issues.apache.org/jira/browse/GEODE-1988
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Darrel Schneider
>  Labels: ci
>
> :geode-core:distributedTest
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysPRDUnitTest 
> > testRegisterCreatesInvalidEntry FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest$$Lambda$18/601024495.run
>  in VM 3 running on Host 583dcf0e97d9 with 4 VMs
> Caused by:
> java.lang.AssertionError: failed while registering interest
> Caused by:
> org.apache.geode.cache.NoSubscriptionServersAvailableException: 
> org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary 
> discovery failed.
> Caused by:
> 
> org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary 
> discovery failed.
> 7578 tests completed, 1 failed, 588 skipped



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Review Request 58853: GEODE-2847: Get correct version tags for retried bulk operations

2017-04-28 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58853/#review173386
---


Ship it!




Ship It!

- xiaojian zhou


On April 28, 2017, 8:17 p.m., Eric Shu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58853/
> ---
> 
> (Updated April 28, 2017, 8:17 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Darrel Schneider, and Lynn 
> Gallinat.
> 
> 
> Bugs: GEODE-2847
> https://issues.apache.org/jira/browse/GEODE-2847
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Get correct version tags from recordedBulkOpVersionTags in eventTracker.
> Do not remove the recordedBulkOpVersionTags prematurely.
> Add the unit test which would fail without the fixes.
> 
> 
> Diffs
> -
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/EventTracker.java 
> 2ddfdc4 
>   geode-core/src/main/java/org/apache/geode/internal/cache/LocalRegion.java 
> 8c061b0 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/partitioned/PutAllPRMessage.java
>  27f5aa0 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/partitioned/RemoveAllPRMessage.java
>  f4f6299 
>   
> geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/ClientProxyMembershipID.java
>  2cbf63b 
>   
> geode-core/src/test/java/org/apache/geode/internal/cache/EventTrackerTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58853/diff/1/
> 
> 
> Testing
> ---
> 
> precheckin.
> 
> 
> Thanks,
> 
> Eric Shu
> 
>



[jira] [Resolved] (GEODE-2806) when batch is dispatched, if the bucket is not primary, we should still destroy the event from queue

2017-04-21 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-2806.
--
Resolution: Fixed

> when batch is dispatched, if the bucket is not primary, we should still 
> destroy the event from queue
> 
>
> Key: GEODE-2806
> URL: https://issues.apache.org/jira/browse/GEODE-2806
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
>  Labels: lucene
>
> This is one of the root causes for data mismatch. 
> When the AEQ dispatched a batch and tried to destroy the events from the 
> queue, the bucket might no longer have been primary. There's no need to let 
> the new primary re-dispatch the batch. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Review Request 58550: AEQ regions being created before the user regions

2017-04-21 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58550/#review172584
---




geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/AbstractPartitionedRepositoryManager.java
Line 57 (original), 58 (patched)
<https://reviews.apache.org/r/58550/#comment245680>

According to your code, getRegionPath() will return null here. Is that by 
design? It looks like you purposely hacked the code to leave userRegion as 
null for a while, but that's actually not necessary. A simpler way: just 
specify the data region path in the index; since the data region is not 
created yet, it will be null here.



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneRawIndex.java
Lines 28 (patched)
<https://reviews.apache.org/r/58550/#comment245681>

This is wrong. You have to move the createRepositoryManager() call into this 
method. 

You need to test the RawIndex to make sure it still works after your code 
changes.



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneRegionListener.java
Lines 103 (patched)
<https://reviews.apache.org/r/58550/#comment245676>

This is the main entry point; you should add some comments here describing 
the background.



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneRegionListener.java
Lines 104 (patched)
<https://reviews.apache.org/r/58550/#comment245677>

Why didn't you specify "this.regionPath" as the parameter? If you did, your 
code would be much simpler and much of it could be omitted. I will point those 
places out below.



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneServiceImpl.java
Lines 199 (patched)
<https://reviews.apache.org/r/58550/#comment245678>

You can use the existing createIndexRegions(indexNames, regionPath).



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneServiceImpl.java
Lines 220 (patched)
<https://reviews.apache.org/r/58550/#comment245679>

This can be omitted.


- xiaojian zhou


On April 20, 2017, 2:03 a.m., nabarun nag wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58550/
> ---
> 
> (Updated April 20, 2017, 2:03 a.m.)
> 
> 
> Review request for geode, Jason Huynh and Dan Smith.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> Testing a new start up mechanism where the AEQ is created before the user 
> region. Please review and let us know if any modifications are needed, or if 
> this is a viable solution
> 
> 
> Diffs
> -
> 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/AbstractPartitionedRepositoryManager.java
>  26bb488ed 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneEventListener.java
>  0f5553343 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneIndexForPartitionedRegion.java
>  fea484547 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneIndexImpl.java
>  36f6720c3 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneIndexImplFactory.java
>  e99f3d9db 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneRawIndex.java
>  75ab5cab3 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneRegionListener.java
>  f4e2a79ef 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneServiceImpl.java
>  30952bfe2 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/LuceneEventListenerJUnitTest.java
>  79de29a09 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/LuceneIndexForPartitionedRegionTest.java
>  8e4c179a5 
> 
> 
> Diff: https://reviews.apache.org/r/58550/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> nabarun nag
> 
>



Review Request 58594: even if primary is lost, a dispatched batch should still be removed

2017-04-20 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58594/
---

Review request for geode and Jason Huynh.


Bugs: GEODE-2806
https://issues.apache.org/jira/browse/GEODE-2806


Repository: geode


Description
---

one of the known root causes


Diffs
-

  
geode-core/src/main/java/org/apache/geode/internal/cache/wan/parallel/ParallelGatewaySenderQueue.java
 cf4c5a9ef 


Diff: https://reviews.apache.org/r/58594/diff/1/


Testing
---


Thanks,

xiaojian zhou



[jira] [Assigned] (GEODE-2806) when batch is dispatched, if the bucket is not primary, we should still destroy the event from queue

2017-04-20 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou reassigned GEODE-2806:


Assignee: xiaojian zhou

> when batch is dispatched, if the bucket is not primary, we should still 
> destroy the event from queue
> 
>
> Key: GEODE-2806
> URL: https://issues.apache.org/jira/browse/GEODE-2806
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
>  Labels: lucene
>
> This is one of the root causes for data mismatch. 
> When the AEQ dispatched a batch and tried to destroy the events from the 
> queue, the bucket might no longer have been primary. There's no need to let 
> the new primary re-dispatch the batch. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (GEODE-2806) when batch is dispatched, if the bucket is not primary, we should still destroy the event from queue

2017-04-20 Thread xiaojian zhou (JIRA)
xiaojian zhou created GEODE-2806:


 Summary: when batch is dispatched, if the bucket is not primary, 
we should still destroy the event from queue
 Key: GEODE-2806
 URL: https://issues.apache.org/jira/browse/GEODE-2806
 Project: Geode
  Issue Type: Bug
Reporter: xiaojian zhou


This is one of the root causes for data mismatch. 

When the AEQ dispatched a batch and tried to destroy the events from the 
queue, the bucket might no longer have been primary. There's no need to let the 
new primary re-dispatch the batch. 
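
A sketch of the intended behavior change (all names are illustrative, not the
actual ParallelGatewaySenderQueue code):

{noformat}
// Illustrative sketch: after a batch is dispatched, remove its events from
// the queue even if the bucket has since lost primary status.
void afterBatchDispatched(List<Object> shadowKeys, BucketRegionQueue brq) {
  // old behavior (sketch): skipping removal when no longer primary let the
  // new primary re-dispatch an already-delivered batch
  // if (!brq.getBucketAdvisor().isPrimary()) return;

  for (Object shadowKey : shadowKeys) {
    brq.destroyKey(shadowKey);  // fixed: always destroy dispatched events
  }
}
{noformat}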



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (GEODE-2787) state flush did not wait for notifyGateway

2017-04-14 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-2787.
--
Resolution: Fixed

> state flush did not wait for notifyGateway
> --
>
> Key: GEODE-2787
> URL: https://issues.apache.org/jira/browse/GEODE-2787
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
>  Labels: lucene
>
> When distribution happens, it calls startOperation() to increase a count, 
> then calls endOperation() to decrease the count. 
> State flush will wait for this count to become 0. 
> But notifyGateway() is called after distribute(), so there's a race where the 
> state flush finishes but notifyGateway has not run yet. 
> The fix is to move the endOperation() call until after the callbacks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (GEODE-2787) state flush did not wait for notifyGateway

2017-04-14 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-2787:
-
Labels: lucene  (was: )

> state flush did not wait for notifyGateway
> --
>
> Key: GEODE-2787
> URL: https://issues.apache.org/jira/browse/GEODE-2787
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>  Labels: lucene
>
> When distribution happens, it calls startOperation() to increase a count, 
> then calls endOperation() to decrease the count. 
> State flush will wait for this count to become 0. 
> But notifyGateway() is called after distribute(), so there's a race where the 
> state flush finishes but notifyGateway has not run yet. 
> The fix is to move the endOperation() call until after the callbacks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (GEODE-2787) state flush did not wait for notifyGateway

2017-04-14 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou reassigned GEODE-2787:


Assignee: xiaojian zhou

> state flush did not wait for notifyGateway
> --
>
> Key: GEODE-2787
> URL: https://issues.apache.org/jira/browse/GEODE-2787
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
>  Labels: lucene
>
> When distribution happens, it calls startOperation() to increase a count, 
> then calls endOperation() to decrease the count. 
> State flush will wait for this count to become 0. 
> But notifyGateway() is called after distribute(), so there's a race where the 
> state flush finishes but notifyGateway has not run yet. 
> The fix is to move the endOperation() call until after the callbacks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (GEODE-2787) state flush did not wait for notifyGateway

2017-04-14 Thread xiaojian zhou (JIRA)
xiaojian zhou created GEODE-2787:


 Summary: state flush did not wait for notifyGateway
 Key: GEODE-2787
 URL: https://issues.apache.org/jira/browse/GEODE-2787
 Project: Geode
  Issue Type: Bug
Reporter: xiaojian zhou


When distribution happens, it calls startOperation() to increase a count, then 
calls endOperation() to decrease the count. 

State flush will wait for this count to become 0. 

But notifyGateway() is called after distribute(), so there's a race where the 
state flush finishes but notifyGateway has not run yet. 

The fix is to move the endOperation() call until after the callbacks. 
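
A sketch of the fix (simplified names; only the ordering matters here):

{noformat}
// Illustrative sketch: keep the state-flush counter held until after
// notifyGateway() runs, so a flush cannot complete between the two calls.
long start = startOperation();  // count++; state flush waits for count == 0
try {
  distribute(event);            // replicate to other members
  notifyGateway(event);         // previously ran after endOperation()
} finally {
  endOperation(start);          // count--; moved until after the callbacks
}
{noformat}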



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Apache Geode release - v1.1.1 RC2

2017-03-31 Thread Xiaojian Zhou
+1
Built from source and ran several dunits (the ones shown in regression #50).

On Fri, Mar 31, 2017 at 9:04 AM, Dick Cavender  wrote:

> +1
>
> Built source and ran tests successfully on RH.
>
> Ran gfsh on binary dist to start and stop locator and system.
>
>
>
> On 3/30/2017 4:58 PM, Anthony Baker wrote:
>
>> Please review and vote (especially if you are a Geode PMC member!).
>>
>> Anthony
>>
>>
>>
>


[jira] [Commented] (GEODE-1894) SerialGatewaySenderOperationsDUnitTest test hangs

2017-03-29 Thread xiaojian zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947572#comment-15947572
 ] 

xiaojian zhou commented on GEODE-1894:
--

It was not reproducible, but we found the root cause, fixed it, and committed 
it at revision 1938b386f1ed906452. 

> SerialGatewaySenderOperationsDUnitTest test hangs
> -
>
> Key: GEODE-1894
> URL: https://issues.apache.org/jira/browse/GEODE-1894
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Hitesh Khamesra
> Fix For: 1.0.0-incubating
>
> Attachments: threaddump.txt
>
>
> test tries to stop Serial Gateway Sender and that thread just hangs. Event 
> processors are waiting to become primary. One AckReader thread waiting for 
> ack. Seems like need to interrupt these threads. Attached thread dump 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (GEODE-1894) SerialGatewaySenderOperationsDUnitTest test hangs

2017-03-29 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-1894:
-
Comment: was deleted

(was: Re-run the test using the current revision and even Sep 12's revision, no 
reproduce so far. )

> SerialGatewaySenderOperationsDUnitTest test hangs
> -
>
> Key: GEODE-1894
> URL: https://issues.apache.org/jira/browse/GEODE-1894
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Hitesh Khamesra
> Fix For: 1.0.0-incubating
>
> Attachments: threaddump.txt
>
>
> test tries to stop Serial Gateway Sender and that thread just hangs. Event 
> processors are waiting to become primary. One AckReader thread waiting for 
> ack. Seems like need to interrupt these threads. Attached thread dump 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (GEODE-2683) Lucene query did not match region values

2017-03-17 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou resolved GEODE-2683.
--
Resolution: Fixed

> Lucene query did not match region values
> 
>
> Key: GEODE-2683
> URL: https://issues.apache.org/jira/browse/GEODE-2683
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
> Fix For: 1.2.0
>
>
> There are several root causes. This one is due to the fix in #45782, which 
> changed the order to notify the primary bucket's gateway before distributing 
> to the secondary. 
> The log is at /export/buglogs_bvt/xzhou/lucene/concParRegHA-0209-235804
> CLIENT vm_1_thr_17_dataStore1_ip-10-32-108-36_11189
> TASK[1] parReg.ParRegTest.HydraTask_HADoEntryOps
> ERROR util.TestException: util.TestException: Lucene query did not match 
> region values. missingKeys=[], extraKeys=[Object_13, Object_17, Object_952, 
> Object_550, Object_1876, Object_2732, Object_270, Object_4722, Object_4726, 
> Object_2537]
> at lucene.LuceneHelper.verifyLuceneIndex(LuceneHelper.java:88)
> at lucene.LuceneTest.verifyLuceneIndex(LuceneTest.java:128)
> at lucene.LuceneTest.verifyFromSnapshotOnly(LuceneTest.java:79)
> at parReg.ParRegTest.verifyFromSnapshot(ParRegTest.java:5638)
> at parReg.ParRegTest.concVerify(ParRegTest.java:6035)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at util.MethodCoordinator.executeOnce(MethodCoordinator.java:68)
> at parReg.ParRegTest.HADoEntryOps(ParRegTest.java:2273)
> at parReg.ParRegTest.HydraTask_HADoEntryOps(ParRegTest.java:1032)
> at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> The root cause is:
> T1: A putAll (or removeAll) operation arrives at the primary bucket on memberA
> T2: BR.virtualPut() calls handleWANEvent() and creates the shadow key
> T3: PutAll invokes the callback (i.e. writes into the AEQ) before 
> distribution. (Put/Destroy do not have this problem because they distribute 
> before the callback)
> T4: handleSuccessfulBatchDispatch sends a ParallelQueueRemovalMessage to 
> the secondary bucket on memberB
> T5: memberB has the dataRegion's secondary bucket, but the brq is not created 
> yet (due to rebalance). So ParallelQueueRemovalMessage.process() will only 
> try to remove the event from the tempQueue (which does not contain the event, 
> so it will do nothing)
> T6: Now the BR.virtualPut() distribution finally arrives at the user region's 
> secondary bucket on memberB. It is added into the AEQ (or the tempQueue, 
> depending on timing). 
> T7: memberB becomes the new primary (due to rebalance) and re-dispatches the 
> shadow key (which was already processed much earlier on memberA). The data 
> mismatch occurs because the replayed event overrides a newer event.
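
The T3 ordering difference is the crux; as a rough illustration (simplified
names, not the actual call sites):

{noformat}
// Illustrative ordering sketch only.
// put/destroy on the primary:
distributeToSecondaries(event);  // replicate first
invokeGatewayCallbacks(event);   // then write into the AEQ

// putAll/removeAll on the primary (the problematic order):
invokeGatewayCallbacks(event);   // AEQ write -- the batch can dispatch and its
                                 // removal message can race ahead ...
distributeToSecondaries(event);  // ... of the event reaching the secondary
{noformat}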



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (GEODE-2683) Lucene query did not match region values

2017-03-17 Thread xiaojian zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-2683:
-
Description: 
There are several root causes. This one is due to the fix in #45782, which 
changed the order to notify the primary bucket's gateway before distributing to 
the secondary. 

The log is at /export/buglogs_bvt/xzhou/lucene/concParRegHA-0209-235804
CLIENT vm_1_thr_17_dataStore1_ip-10-32-108-36_11189
TASK[1] parReg.ParRegTest.HydraTask_HADoEntryOps
ERROR util.TestException: util.TestException: Lucene query did not match region 
values. missingKeys=[], extraKeys=[Object_13, Object_17, Object_952, 
Object_550, Object_1876, Object_2732, Object_270, Object_4722, Object_4726, 
Object_2537]
at lucene.LuceneHelper.verifyLuceneIndex(LuceneHelper.java:88)
at lucene.LuceneTest.verifyLuceneIndex(LuceneTest.java:128)
at lucene.LuceneTest.verifyFromSnapshotOnly(LuceneTest.java:79)
at parReg.ParRegTest.verifyFromSnapshot(ParRegTest.java:5638)
at parReg.ParRegTest.concVerify(ParRegTest.java:6035)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at util.MethodCoordinator.executeOnce(MethodCoordinator.java:68)
at parReg.ParRegTest.HADoEntryOps(ParRegTest.java:2273)
at parReg.ParRegTest.HydraTask_HADoEntryOps(ParRegTest.java:1032)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)

The root cause is:
T1: A putAll (or removeAll) operation arrives at the primary bucket on memberA
T2: BR.virtualPut() calls handleWANEvent() and creates the shadow key
T3: PutAll invokes the callback (i.e. writes into the AEQ) before distribution. 
(Put/Destroy do not have this problem because they distribute before the 
callback)
T4: handleSuccessfulBatchDispatch sends a ParallelQueueRemovalMessage to the 
secondary bucket on memberB
T5: memberB has the dataRegion's secondary bucket, but the brq is not created 
yet (due to rebalance). So ParallelQueueRemovalMessage.process() will only try 
to remove the event from the tempQueue (which does not contain the event, so it 
will do nothing)
T6: Now the BR.virtualPut() distribution finally arrives at the user region's 
secondary bucket on memberB. It is added into the AEQ (or the tempQueue, 
depending on timing). 
T7: memberB becomes the new primary (due to rebalance) and re-dispatches the 
shadow key (which was already processed much earlier on memberA). The data 
mismatch occurs because the replayed event overrides a newer event.

  was:There're several root causes. This one is due to the fix in #45782 
changed the order to notify primary bucket's gateway before distribute to 
secondary. 


> Lucene query did not match region values
> 
>
> Key: GEODE-2683
> URL: https://issues.apache.org/jira/browse/GEODE-2683
> Project: Geode
>  Issue Type: Bug
>    Reporter: xiaojian zhou
>    Assignee: xiaojian zhou
> Fix For: 1.2.0
>
>
> There are several root causes. This one is due to the fix in #45782, which 
> changed the order to notify the primary bucket's gateway before distributing 
> to the secondary. 
> The log is at /export/buglogs_bvt/xzhou/lucene/concParRegHA-0209-235804
> CLIENT vm_1_thr_17_dataStore1_ip-10-32-108-36_11189
> TASK[1] parReg.ParRegTest.HydraTask_HADoEntryOps
> ERROR util.TestException: util.TestException: Lucene query did not match 
> region values. missingKeys=[], extraKeys=[Object_13, Object_17, Object_952, 
> Object_550, Object_1876, Object_2732, Object_270, Object_4722, Object_4726, 
> Object_2537]
> at lucene.LuceneHelper.verifyLuceneIndex(LuceneHelper.java:88)
> at lucene.LuceneTest.verifyLuceneIndex(LuceneTest.java:128)
> at lucene.LuceneTest.verifyFromSnapshotOnly(LuceneTest.java:79)
> at parReg.ParRegTest.verifyFromSnapshot(ParRegTest.java:5638)
> at parReg.ParRegTest.concVerify(ParRegTest.java:6035)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at util.MethodCoordinator.executeOnce(MethodCoordinator.java:68)
> at parReg.ParRegTest.HADoEntryOps(ParRegTest.java:2273)
> at parReg.ParRegTest.HydraTask_HADoEntryOps(ParRegTest.java:1032)
> at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Re: Review Request 57483: GEODE-2643: Combine chunk and file region into a single region

2017-03-10 Thread xiaojian zhou

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57483/#review168661
---




geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneIndexForPartitionedRegion.java
Line 85 (original), 82 (patched)
<https://reviews.apache.org/r/57483/#comment240912>

You mix "fileRegion" and "fileAndChunkRegion" in many places.



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/filesystem/FileSystem.java
Line 58 (original), 55 (patched)
<https://reviews.apache.org/r/57483/#comment240910>

fileRegion is still Map<String, File>? But in line 41, it is "Map 
fileAndChunkRegion".



geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/filesystem/FileSystem.java
Line 184 (original), 180 (patched)
<https://reviews.apache.org/r/57483/#comment240918>

Looks like fileAndChunkRegion is not <String, File>. So where is the chunk? 
Where is the "File"?


- xiaojian zhou


On March 10, 2017, 12:26 a.m., Jason Huynh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57483/
> ---
> 
> (Updated March 10, 2017, 12:26 a.m.)
> 
> 
> Review request for geode, Barry Oglesby, Lynn Hughes-Godfrey, nabarun nag, 
> Dan Smith, and xiaojian zhou.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> * removed file and chunk count from stat (no longer able to use localSize and 
> not sure how helpful those stats really were)
> * removed tests that were doing checks against bucketRegions
> * removed a test that was hard to maintain in FileSystemJUnitTest (it 
> required enough operations to 
> * todo before checkin: rename fileBucket variables to fileAndChunkBucket OR 
> rename fileAndChunkRegion to fileRegion.  The Suffix of .files is still being 
> used.
> 
> 
> Diffs
> -
> 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/IndexRepositoryFactory.java
>  7e685b7 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneBucketListener.java
>  9532249 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/LuceneIndexForPartitionedRegion.java
>  4aa24b5 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/directory/RegionDirectory.java
>  18428ec 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/filesystem/FileSystem.java
>  660816d 
>   
> geode-lucene/src/main/java/org/apache/geode/cache/lucene/internal/filesystem/FileSystemStats.java
>  85ae6d7 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/LuceneIndexDestroyDUnitTest.java
>  6260075 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/LuceneIndexMaintenanceIntegrationTest.java
>  5ac01b8 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/LuceneQueriesPersistenceIntegrationTest.java
>  5fe2df5 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/LuceneIndexForPartitionedRegionTest.java
>  93cc0a8 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/LuceneIndexRecoveryHAIntegrationTest.java
>  b50db98 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/PartitionedRepositoryManagerJUnitTest.java
>  9c603c7 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/directory/RegionDirectoryJUnitTest.java
>  32249e4 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java
>  6062904 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/filesystem/FileSystemJUnitTest.java
>  ee41e40 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/repository/IndexRepositoryImplJUnitTest.java
>  42cc2bc 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/repository/IndexRepositoryImplPerformanceTest.java
>  e3e2787 
>   
> geode-lucene/src/test/java/org/apache/geode/cache/lucene/test/LuceneTestUtilities.java
>  329dee9 
> 
> 
> Diff: https://reviews.apache.org/r/57483/diff/1/
> 
> 
> Testing
> ---
> 
> geode-lucene:precheckin
> 
> 
> Thanks,
> 
> Jason Huynh
> 
>


