[VOTE] Apache Geode 1.15.1.RC1

2022-09-29 Thread Mario Kevo
Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.15.1.RC1.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you performed.

Voting deadline:
3PM PST Tue, Oct 04 2022.

Please note that we are voting upon the source tag:
rel/v1.15.1.RC1

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.15.1

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.15.1.RC1/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1139

GitHub:
https://github.com/apache/geode/tree/rel/v1.15.1.RC1
https://github.com/apache/geode-examples/tree/rel/v1.15.1.RC1
https://github.com/apache/geode-native/tree/rel/v1.15.1.RC1
https://github.com/apache/geode-benchmarks/tree/rel/v1.15.1.RC1

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-15-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-15-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Command to run geode-examples:

./gradlew \
  -PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.15.1.RC1 \
  -PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1139 \
  build runAll

Regards
Mario Kevo



Re: Release manager permissions

2022-09-27 Thread Mario Kevo
Hi Anthony,

I have a question regarding CI.
I saw a file in the geode repo with the resource settings for the machines on 
which tests are executed. 
https://github.com/apache/geode/blob/e2ac1113f8f6819095785be556bef8e080ab6988/ci/pipelines/shared/jinja.variables.yml#L92

So my question is: what is the setup now that it runs on Concourse CI?
Do you have one machine or several (and how many), so tests can be executed in 
parallel? How much CPU, RAM, and disk does each of them have?

Thanks and BR,
Mario

From: Alberto Gomez 
Sent: 27 September 2022 13:12
To: dev@geode.apache.org 
Subject: Re: Release manager permissions

Hi,

Do you know if any company has offered to sponsor the CI pipelines? What would 
it take for such a company besides paying the bills? Would a migration be 
needed?

Regarding the old ASF Jenkins jobs, my understanding is that they would offer 
the same CI functionality as we have today, but they would be run on ASF 
provided resources which would most likely make the time to get results longer 
and less predictable. Is that correct?


Thanks,

Alberto

From: Anthony Baker 
Sent: Friday, September 23, 2022 8:15 PM
To: dev@geode.apache.org 
Subject: Re: Release manager permissions

Just a reminder to all: we need to find an alternative to the VMware-sponsored 
CI pipelines currently in use. Any ideas? Should we try to resurrect the old 
ASF Jenkins jobs?

Anthony

> On Sep 23, 2022, at 3:26 AM, Mario Kevo  wrote:
>
> ⚠ External Email
>
> Hi devs,
>
> I need the following permissions for the release manager:
>
>  *   bulk modification permission on Apache Geode JIRA
>  *   permission to deploy pipelines to Geode CI
>  *   Docker Hub credentials with permission to upload Apache Geode to Docker 
> Hub
>
> username: mkevo
> mail: mk...@apache.org
>
> Can someone give me these permissions, so I can start building a new patch 
> release?
>
> Thanks and BR,
> Mario
>
> 
>
> ⚠ External Email: This email originated from outside of the organization. Do 
> not click links or open attachments unless you recognize the sender.



Re: Release manager permissions

2022-09-26 Thread Mario Kevo
Hi Anthony,

I still don't have permission for Concourse. I logged in with a GitHub account 
and saw that I'm not authorized to run jobs.
Username: mkevo

The username for Docker Hub is the same: mkevo.

Thanks,
Mario


From: Anthony Baker 
Sent: 23 September 2022 20:15
To: dev@geode.apache.org 
Subject: Re: Release manager permissions

Just a reminder to all: we need to find an alternative to the VMware-sponsored 
CI pipelines currently in use. Any ideas? Should we try to resurrect the old 
ASF Jenkins jobs?

Anthony

> On Sep 23, 2022, at 3:26 AM, Mario Kevo  wrote:
>
>
> Hi devs,
>
> I need the following permissions for the release manager:
>
>  *   bulk modification permission on Apache Geode JIRA
>  *   permission to deploy pipelines to Geode CI
>  *   Docker Hub credentials with permission to upload Apache Geode to Docker 
> Hub
>
> username: mkevo
> mail: mk...@apache.org
>
> Can someone give me these permissions, so I can start building a new patch 
> release?
>
> Thanks and BR,
> Mario
>
> 
>



Release manager permissions

2022-09-23 Thread Mario Kevo
Hi devs,

I need the following permissions for the release manager:

  *   bulk modification permission on Apache Geode JIRA
  *   permission to deploy pipelines to Geode CI
  *   Docker Hub credentials with permission to upload Apache Geode to Docker 
Hub

username: mkevo
mail: mk...@apache.org

Can someone give me these permissions, so I can start building a new patch 
release?

Thanks and BR,
Mario


Apache Geode 1.15.1 patch version

2022-09-09 Thread Mario Kevo
Hi all,

I'm going to build a new patch version of Geode.
There is a list of tickets targeted for 1.15.1. As they are already assigned, 
could each assignee please provide a fix so we can move on? 
https://issues.apache.org/jira/projects/GEODE/versions/12351801

Also, there is one blocker that would be good to include in this release, if 
that is okay with all of you: https://issues.apache.org/jira/browse/GEODE-10415

Please suggest any other tickets that are critical and should be backported to 
this release, so we can get the community's opinion before releasing the new 
version.

Thanks and BR,
Mario



Re: Release manager for 1.15.1?

2022-09-07 Thread Mario Kevo
Hi Anthony,

I want to volunteer as 1.15.1 release manager.
Can I get some guidance on what should be done in this patch release and how to 
proceed?

Thanks and Br,
Mario

From: Anthony Baker 
Sent: 2 September 2022 18:41
To: u...@geode.apache.org 
Cc: dev@geode.apache.org 
Subject: Re: Release manager for 1.15.1?

Note that we will want to bump some dependency versions in this patch release. 
If you would like to contribute this change, instructions can be found at 
https://github.com/apache/geode/tree/develop/dev-tools/dependencies.

Anthony

On Sep 2, 2022, at 9:27 AM, Anthony Baker 
mailto:bak...@vmware.com>> wrote:

Hi all,

We need to ship a new Geode 1.15.1 patch to fix a few issues. Part of the Geode 
and ASF release process is to identify a Release Manager to collect the changes, 
prepare a release candidate, and publish the release binaries. The good news 
is, we have a well-vetted set of scripts and instructions [1] to automate this 
and make it super easy for first-timers to go through the process.

Anyone want to volunteer? We would like to start this process next week (Sept 
6).

Thanks,
Anthony

[1] 
https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode




Re: Release manager for 1.15.1?

2022-09-07 Thread Mario Kevo
Hi all,

I volunteer to be the 1.15.1 release manager, if that is okay with you.

BR,
Mario

From: rup...@webstersystems.co.uk 
Sent: 7 September 2022 11:14
To: u...@geode.apache.org ; dev@geode.apache.org 
Subject: RE: Release manager for 1.15.1?

No, but I think I committed a change to the wiki once.
Perhaps we can arrange a time for a Zoom or Teams call?

KR rupert

-Original Message-
From: Anthony Baker 
Sent: 06 September 2022 20:08
To: dev@geode.apache.org
Cc: u...@geode.apache.org
Subject: Re: Release manager for 1.15.1?

Thanks Rupert!  Since Rupert is not a committer, is anyone available to help 
him learn the ropes?

Anthony


> On Sep 6, 2022, at 11:51 AM, rup...@webstersystems.co.uk wrote:
>
>
> Hi Anthony,
>
>
>
> I’m interested to help out on the Geode 1.15.1 Release Manager.
>
> I can make time later this week 
>
>
>
> Thanks, kind regards,
>
> Rupert
>
>
>
> Webster Systems Ltd
>
> Kingsdon Nursery Garden
>
> Kingsdon, Somerton, Somerset, TA11 7LE, UK
>
> Tel: 07740 289100
>
> 
>  
> http://www.webstersystems.co.uk/
>
> UK Registered No: 6517977
>
>
>
> -Original Message-
>
> From: Anthony Baker <  bak...@vmware.com>
>
> Sent: 02 September 2022 17:28
>
> To:   dev@geode.apache.org
>
> Cc:   u...@geode.apache.org
>
> Subject: Release manager for 1.15.1?
>
>
>
>
>
>
>
>
> Hi all,
>
>
>
> We need to ship a new Geode 1.15.1 patch to fix a few issues. Part of the 
> Geode and ASF release process is identify a Release Manager to collect the 
> changes, prepare a release candidate, and publish the release binaries. The 
> good news is, we have a well-vetted set of scripts and instructions [1] to 
> automate this and make it super easy for first timers to go through the 
> process.
>
>
>
> Anyone want to volunteer? We would like to start this process next week (Sept 
> 6).
>
>
>
> Thanks,
>
> Anthony
>
>
>
> [1] 
> https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode
>
>
>
>
> 
>




Re: Question about INDEX_THRESHOLD_SIZE

2022-06-07 Thread Mario Kevo

The PR and the questions surrounding it aren’t the cause of what you are seeing 
now.



I’ll jump into debugging this a bit when I get the chance – haven’t had a 
chance to sit down with a debugger just yet.



Regards,

-Jason




From: Mario Kevo 
Date: Monday, March 14, 2022 at 8:17 AM
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE
Hi,

Regarding Anil's answer:
The test will pass because we have set INDEX_THRESHOLD_SIZE to 1000 (that is how 
many entries are in the region). If you delete that line in startServer it will 
fail, as it then uses the default indexThresholdSize value (100).

Regarding Jason's answers:
Only change what my PR introduced to get availability to change 
INDEX_THRESHOLD_SIZE by the user as before it is hardcoded to 100, and nothing 
can change it.
The intermediate result will have all entries on which index can be applied and 
if there are more entries that have that attribute on which index is created it 
will store them all in the results(if INDEX_THRESHOLD_SIZE is changed). By 
default, it has a limit of 100 entries so it can happen that an entry that 
matches the condition will not be in the results.
Regarding this applyCondition in CompactRangeIndex.addToResultsFromEntries, it 
is never applied as iterOps is always null, as it is set in 
AbstractGroupOrRangeJunction.auxFilterEvaluate:

filterResults = filter.filterEvaluate(context,
    !isConditioningNeeded ? intermediateResults : null, completeExpansion,
    null /*
          * Asif: The iter operands passed are null, as a not null value can exist only
          * if there exists a single Filter operand in the original GroupJunction
          */,
    indpndntItr, _operator == LITERAL_and, isConditioningNeeded,
    false /* do not evaluate projection */);

BR,
Mario

From: Jason Huynh 
Sent: 11 March 2022 22:06
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE

As an fyi, in the past we disabled applying limits at the index level for range 
indexes.

I’m surprised in this case that we would add all the entries to the 
intermediate results instead of applying the filter first and checking the 
condition before adding to the intermediate results..

I would have thought it would have to apply the condition as seen in 
CompactRangeIndex.addToResultsFromEntries :

  if (ok && runtimeItr != null && iterOps != null) {
    ok = QueryUtils.applyCondition(iterOps, context);
  }

I haven’t walked this query through the code, perhaps it’s hitting a different 
index type (I’d think a map index but that probably is backed by 
CompactRangeIndexes for each key…)



From: Jason Huynh 
Date: Friday, March 11, 2022 at 12:47 PM
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE
I think https://github.com/apache/geode/pull/7010 may have changed what that 
property represented.  I believe it was some arbitrary threshold to abort using 
index lookups (if the intermediate results were small, it was deemed faster to 
just iterate through and not do a lookup – at least from my interpretation of 
the code)
It looks like with the change, it now munges it with limit.. so now limit is 
applied to that value.. gfsh happens to always pass in a limit too, so there is 
possibly additional confusion

From the diff there is also one spot where a limit != -1 had not been added..  
In CompactRangeIndex line 489:

  if (limit < indexThresholdSize) {
limit = indexThresholdSize;
  }

This might be affecting the usage of limit at the index level?
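The interaction described above can be reduced to a few lines. The sketch below is illustrative only (the class and method names are invented, not Geode's actual ones); it shows how a threshold larger than the query limit overrides it at the index level.

```java
// Illustrative sketch (not Geode source): how INDEX_THRESHOLD_SIZE can
// override a smaller query limit, per the diff quoted above.
public class LimitMunging {
    // 'limit' models the query LIMIT (gfsh always passes one, default 100);
    // 'indexThresholdSize' models the INDEX_THRESHOLD_SIZE system property.
    static int effectiveIndexLimit(int limit, int indexThresholdSize) {
        // The spot noted in CompactRangeIndex: the threshold replaces a
        // smaller query limit, so the index may collect far more results
        // than the query asked for.
        if (limit < indexThresholdSize) {
            limit = indexThresholdSize;
        }
        return limit;
    }

    public static void main(String[] args) {
        // gfsh's default limit of 100 with the threshold raised to 1000:
        // the index now collects up to 1000 results, not 100.
        System.out.println(effectiveIndexLimit(100, 1000)); // 1000
        // An explicit LIMIT 500 with the default threshold of 100 is kept.
        System.out.println(effectiveIndexLimit(500, 100));  // 500
    }
}
```

If this reading is right, the index-level limit and the user-facing LIMIT are no longer independent, which would explain the confusion Jason points out.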


From: Anilkumar Gingade 
Date: Friday, March 11, 2022 at 12:11 PM
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE
Mario,

There is a similar test/example added by you in 
QueryWithRangeIndexDUnitTest.testQueryWithWildcardAndIndexOnAttributeFromHashMap().
When I run that test (on develop); I see the results as expected:
*
Command result for :
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 85.1964 ms; indexesUsed(1):IdIndex(Results: 
1)

Are you running your test with any additional change like as you are saying :
>> I was working on allowing INDEX_THRESHOLD_SIZE System property to override 
>> CompiledValue.RESULT_LIMIT.

If so, you need to look at the change and see why it's having that impact.
If not, please let me know what change can be made in that test to reproduce 
the issue you are seeing; that will help to debug/analyze the issue.

-Anil.




On 3/11/22, 12:18 AM, "Mario Kevo"  wrote:

Hi,

It works without an index but it doesn't work with an index.

Waiting review for a few PRs

2022-03-23 Thread Mario Kevo
Hi devs,

I have a couple of PRs that are waiting for a review from a few code owners:

  *   GEODE-10055: AbstractLauncher prints info and debug with stderr instead 
of stdout (#7368)
  *   GEODE-9969: A region name starting with an underscore leads to a missing 
disk store after restart (#7320)
  *   GEODE-7875: The gfsh create index command sometimes fails with an 'Index 
already exists. Create failed due to duplicate name.' message (#7195) - there 
is already a discussion on the dev list, and the proposed solutions don't work; 
it should be agreed how the index should work in this scenario.
  *   GEODE-9101: The VisibleNodes attribute in MemberMXBean is incorrect 
(#6225)

Can someone take a look at these?

BR,
Mario




Re: Question about INDEX_THRESHOLD_SIZE

2022-03-14 Thread Mario Kevo
Hi,

Regarding Anil's answer:
The test will pass because we have set INDEX_THRESHOLD_SIZE to 1000 (that is how 
many entries are in the region). If you delete that line in startServer it will 
fail, as it then uses the default indexThresholdSize value (100).

Regarding Jason's answers:
Only change what my PR introduced to get availability to change 
INDEX_THRESHOLD_SIZE by the user as before it is hardcoded to 100, and nothing 
can change it.
The intermediate result will have all entries on which index can be applied and 
if there are more entries that have that attribute on which index is created it 
will store them all in the results(if INDEX_THRESHOLD_SIZE is changed). By 
default, it has a limit of 100 entries so it can happen that an entry that 
matches the condition will not be in the results.
Regarding this applyCondition in CompactRangeIndex.addToResultsFromEntries, it 
is never applied as iterOps is always null, as it is set in 
AbstractGroupOrRangeJunction.auxFilterEvaluate:

filterResults = filter.filterEvaluate(context,
    !isConditioningNeeded ? intermediateResults : null, completeExpansion,
    null /*
          * Asif: The iter operands passed are null, as a not null value can exist only
          * if there exists a single Filter operand in the original GroupJunction
          */,
    indpndntItr, _operator == LITERAL_and, isConditioningNeeded,
    false /* do not evaluate projection */);

BR,
Mario

From: Jason Huynh 
Sent: 11 March 2022 22:06
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE

As an fyi, in the past we disabled applying limits at the index level for range 
indexes.

I’m surprised in this case that we would add all the entries to the 
intermediate results instead of applying the filter first and checking the 
condition before adding to the intermediate results..

I would have thought it would have to apply the condition as seen in 
CompactRangeIndex.addToResultsFromEntries :

  if (ok && runtimeItr != null && iterOps != null) {
    ok = QueryUtils.applyCondition(iterOps, context);
  }

I haven’t walked this query through the code, perhaps it’s hitting a different 
index type (I’d think a map index but that probably is backed by 
CompactRangeIndexes for each key…)



From: Jason Huynh 
Date: Friday, March 11, 2022 at 12:47 PM
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE
I think https://github.com/apache/geode/pull/7010 may have changed what that 
property represented.  I believe it was some arbitrary threshold to abort using 
index look ups (if the intermediate results were small, it was deemed faster to 
just iterate through and not do a lookup – at least from my interpretation of 
the code)
It looks like with the change, it now munges it with limit.. so now limit is 
applied to that value.. gfsh happens to always pass in a limit too, so there is 
possibly additional confusion

From the diff there is also one spot where a limit != -1 had not been added..  
In CompactRangeIndex line 489:

  if (limit < indexThresholdSize) {
limit = indexThresholdSize;
  }

This might be affecting the usage of limit at the index level?


From: Anilkumar Gingade 
Date: Friday, March 11, 2022 at 12:11 PM
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE
Mario,

There is a similar test/example added by you in 
QueryWithRangeIndexDUnitTest.testQueryWithWildcardAndIndexOnAttributeFromHashMap().
When I run that test (on develop); I see the results as expected:
*
Command result for :
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 85.1964 ms; indexesUsed(1):IdIndex(Results: 
1)

Are you running your test with any additional change like as you are saying :
>> I was working on allowing INDEX_THRESHOLD_SIZE System property to override 
>> CompiledValue.RESULT_LIMIT.

If so, you need to look at the change and see why it's having that impact.
If not, please let me know what change can be made in that test to reproduce 
the issue you are seeing; that will help to debug/analyze the issue.

-Anil.




On 3/11/22, 12:18 AM, "Mario Kevo"  wrote:

Hi,

It works without an index but it doesn't work with an index.
When I revert changes, it takes INDEX_THRESHOLD_SIZE default value(100). 
And if the entry that matches the condition is not in that resultset it will 
not be printed.
Without index:
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 11.502283 ms; indexesUsed(0)

key | value
--- | 
--

Re: Question about INDEX_THRESHOLD_SIZE

2022-03-11 Thread Mario Kevo
Hi,

It works without an index but it doesn't work with an index.
When I revert changes, it takes INDEX_THRESHOLD_SIZE default value(100). And if 
the entry that matches the condition is not in that resultset it will not be 
printed.
Without index:
gfsh>query --query="SELECT e.key, e.value from /example-region.entrySet 
e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 11.502283 ms; indexesUsed(0)

key | value
--- | 

300 | 
{"ID":300,"indexKey":0,"pkid":"300","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":300,..
With index:
gfsh>query --query="SELECT e.key, e.value from /example-region.entrySet 
e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 0
Query Trace : Query Executed in 8.784831 ms; indexesUsed(1):index1(Results: 100)
BR,
Mario

From: Anilkumar Gingade 
Sent: 10 March 2022 23:16
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE

Mario,

There were a few changes in this area as part of the GEODE-9632 fix; can you 
please revert that change and see if the query works both with and without the 
index.
Looking at the code, it seems to restrict the number of index lookups that need 
to be performed; certain latency/throughput-sensitive queries that are not 
expecting an exact result may use this (my guess), but by default it should not 
result in unexpected results.

-Anil.


On 3/10/22, 6:50 AM, "Mario Kevo"  wrote:

Hi geode-dev,

Some time ago I was working on allowing the INDEX_THRESHOLD_SIZE system 
property to override CompiledValue.RESULT_LIMIT.
After this change, the attribute is taken into account if you set it.
But I need some clarification about this INDEX_THRESHOLD_SIZE attribute. Why 
is it set to 100 by default?
The main problem with this attribute is that to get the correct result, you 
need to know how many entries the region will hold when starting the servers, 
and set it to that value or higher. Sometimes it is too hard to know how many 
entries will be in the region, so it may be better to default it to some higher 
number, such as Integer.MAX_VALUE.

Where is this attribute used?
It is used to get index results while running queries.

What is the problem?
If we have INDEX_THRESHOLD_SIZE set to 500 and 1k entries, it can happen 
that a query collects only 500 entries, the where clause cannot be satisfied, 
and we get no results.
Let's see it in an example!

We have only one entry that matches the condition from the query, 
INDEX_THRESHOLD_SIZE set to 500, and 1k entries in the region.
If we run the query without an index, we get the result.
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 10.750238 ms; indexesUsed(0)

key | value
--- | 

700 | 
{"ID":700,"indexKey":0,"pkid":"700","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":700,..
If we create an index and then run this query again, there is no result.
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 0
Query Trace : Query Executed in 22.079016 ms; 
indexesUsed(1):index1(Results: 500)
This happens because the entry that matches the condition did not make it 
into the intermediate results for the index.
So the questions are:
What if more entries enter the region, so that the index would return more 
entries than this threshold allows? Then we are again at risk that the query 
condition will not be matched.
Why is this attribute set to 100 by default?
Can we change the default to Integer.MAX_VALUE to be sure that we get the 
correct result? What would the consequences be?

BR,
Mario




Question about INDEX_THRESHOLD_SIZE

2022-03-10 Thread Mario Kevo
Hi geode-dev,

Some time ago I was working on allowing the INDEX_THRESHOLD_SIZE system property 
to override CompiledValue.RESULT_LIMIT.
After this change, the attribute is taken into account if you set it.
But I need some clarification about this INDEX_THRESHOLD_SIZE attribute. Why is 
it set to 100 by default?
The main problem with this attribute is that to get the correct result, you need 
to know how many entries the region will hold when starting the servers, and set 
it to that value or higher. Sometimes it is too hard to know how many entries 
will be in the region, so it may be better to default it to some higher number, 
such as Integer.MAX_VALUE.

Where is this attribute used?
It is used to get index results while running queries.

What is the problem?
If we have INDEX_THRESHOLD_SIZE set to 500 and 1k entries, it can happen that a 
query collects only 500 entries, the where clause cannot be satisfied, and we 
get no results.
Let's see it in an example!

We have only one entry that matches the condition from the query, 
INDEX_THRESHOLD_SIZE set to 500, and 1k entries in the region.
If we run the query without an index, we get the result.
gfsh>query --query="SELECT e.key, e.value from /example-region.entrySet 
e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 10.750238 ms; indexesUsed(0)

key | value
--- | 

700 | 
{"ID":700,"indexKey":0,"pkid":"700","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":700,..
If we create an index and then run this query again, there is no result.
gfsh>query --query="SELECT e.key, e.value from /example-region.entrySet 
e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 0
Query Trace : Query Executed in 22.079016 ms; indexesUsed(1):index1(Results: 
500)
This happens because the entry that matches the condition did not make it into 
the intermediate results for the index.
So the questions are:
What if more entries enter the region, so that the index would return more 
entries than this threshold allows? Then we are again at risk that the query 
condition will not be matched.
Why is this attribute set to 100 by default?
Can we change the default to Integer.MAX_VALUE to be sure that we get the 
correct result? What would the consequences be?
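To make the failure mode concrete, here is a small self-contained simulation (not Geode code; class and method names are invented) following the numbers above: 1000 entries, a single match at key 700, and an index-level cap of 500 intermediate results.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the described behavior: the index hands back at most
// 'threshold' candidate entries, and the WHERE condition is applied
// afterwards - so a match beyond the cap is silently dropped.
public class IndexThresholdDemo {
    static boolean matchesWhereClause(int key) {
        return key == 700; // the single entry matching the condition
    }

    static List<Integer> queryWithIndex(int entries, int threshold) {
        List<Integer> intermediate = new ArrayList<>();
        for (int key = 0; key < entries && intermediate.size() < threshold; key++) {
            intermediate.add(key); // index scan capped at the threshold
        }
        List<Integer> results = new ArrayList<>();
        for (int key : intermediate) {
            if (matchesWhereClause(key)) {
                results.add(key); // condition applied to the capped set only
            }
        }
        return results;
    }

    static List<Integer> queryFullScan(int entries) {
        List<Integer> results = new ArrayList<>();
        for (int key = 0; key < entries; key++) {
            if (matchesWhereClause(key)) {
                results.add(key);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(queryFullScan(1000).size());        // 1 row
        System.out.println(queryWithIndex(1000, 500).size());  // 0 rows: match dropped
        System.out.println(queryWithIndex(1000, 1000).size()); // 1 row again
    }
}
```

This matches the gfsh output shown above: the full scan returns 1 row, the capped index scan returns 0, and raising the threshold to the region size restores the row.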

BR,
Mario



Re: RFC Introducing VisibleMembers attribute

2022-03-07 Thread Mario Kevo
Hi geode-dev,

Just a note that the deadline has passed; if there are no further comments, I 
will start with the implementation.
If you have any comments, please add them to the RFC.

BR,
Mario

From: Mario Kevo 
Sent: 15 February 2022 9:41
To: dev@geode.apache.org 
Subject: RFC Introducing VisibleMembers attribute

Hi geode-dev,

We published a new RFC:

https://cwiki.apache.org/confluence/display/GEODE/Introducing+VisibleMembers+attribute

If you have any questions and concerns, please add them to the RFC.

Thank you.

BR,
Mario


RFC Introducing VisibleMembers attribute

2022-02-15 Thread Mario Kevo
Hi geode-dev,

We published a new RFC:

https://cwiki.apache.org/confluence/display/GEODE/Introducing+VisibleMembers+attribute

If you have any questions and concerns, please add them to the RFC.

Thank you.

BR,
Mario


Re: CWiki permissions

2022-02-11 Thread Mario Kevo
Hi,

It is solved.

BR,
Mario

From: Mario Kevo 
Sent: Friday, February 11, 2022 3:31:59 PM
To: dev@geode.apache.org 
Subject: CWiki permissions

Hi,

I thought that I had permission to write an RFC, but it seems that I was wrong.
Can you please give me permission?
My username: mkevo

BR,
Mario


CWiki permissions

2022-02-11 Thread Mario Kevo
Hi,

I thought that I had permission to write an RFC, but it seems that I was wrong.
Can you please give me permission?
My username: mkevo

BR,
Mario


Re: Creating index failed

2022-02-08 Thread Mario Kevo
Hi Anil,

I agree that it can happen that two threads try to create an index with the 
same name but different index expressions concurrently.
This distributed lock will help to avoid that issue, but the same tests will 
still fail.
The first issue is that in the above case the first index will be created, 
while the second (same name, different expression) will not be created, yet the 
command still reports success.
The second issue (the tests failing on the PR) is that Geode expects running 
the same command again to fail, as an index with the same name already exists. 
With the exceptions ignored it passes, but it shouldn't.

So, with a distributed lock we can avoid the issue you mentioned, but the other 
issues remain.

BR,
Mario



From: Anilkumar Gingade 
Sent: 3 February 2022 16:46
To: dev@geode.apache.org 
Subject: Re: Creating index failed

The other problem which exists is the case where two threads try to create an 
index with the same name but different index expressions concurrently. I assume 
there are ways this could happen.
One solution to the overall issue with index creation on a partitioned region 
is to take a distributed lock keyed on the index name. When an index creation 
request comes in, it first acquires a distributed lock on the index name; any 
additional index creation with that name is blocked until the previous index 
with that name has been created. During this time, if an index creation request 
arrives locally or remotely, the exception can be ignored, as only one index 
creation will be in progress for the same request.
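A minimal single-JVM sketch of that scheme, with java.util.concurrent standing in for Geode's distributed lock service (the class and method names here are illustrative, not Geode APIs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// One lock per index name: concurrent create-index requests for the same
// name serialize; a request that finds the index already built treats that
// as success only when the definition matches.
public class IndexCreationCoordinator {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, String> indexes = new ConcurrentHashMap<>(); // name -> expression

    /** Returns true if, after the call, the index exists with this expression. */
    public boolean createIndex(String name, String expression) {
        ReentrantLock lock = locks.computeIfAbsent(name, n -> new ReentrantLock());
        lock.lock(); // in Geode this would be the distributed lock on the index name
        try {
            String existing = indexes.putIfAbsent(name, expression);
            if (existing == null) {
                return true; // we won the race and created the index
            }
            // Already created while we waited: ignore the "conflict" if the
            // definition matches; a different expression is a real error.
            return existing.equals(expression);
        } finally {
            lock.unlock();
        }
    }
}
```

This also shows the remaining question from the thread: a repeated create with the same definition is silently accepted, while a same-name/different-expression create is reported as a failure rather than ignored.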

-Anil.

On 2/3/22, 4:41 AM, "Mario Kevo"  wrote:

Hi devs,

After implementing the ignoring of exceptions, some tests failed, as we now 
allow the command to be run again (although it does nothing, as the same index 
was already created by the previous execution). 
https://github.com/apache/geode/pull/7195


There is a summary of how it works by now.

When we create an index on a partitioned region, the locator tells all members 
to create the index on all the data they contain. The partitioned region is 
specific in that you normally want to index all data, which is distributed 
across all members. As a result, every member tries to create the index locally 
and also sends index creation requests to all members on that site.
Each member checks whether the index already exists or index creation is in 
progress, and waits for it. If a remotely originated request arrives but the 
index is already created, the member responds with the existing index and sends 
an acknowledgment to the request's sender. If it is not yet created, the member 
creates the index locally and then responds to the sender. This behavior is 
fine if we are using a small number of servers, or the --member option when 
creating indexes (which makes no sense on a partitioned region, as already 
described earlier in this thread).

The problem arises when we use a larger number of servers (8 or more) or simply 
run with debugging on. That slows down the whole process, and it can then 
happen that on some servers the remotely originated create index request 
arrives before the local one. In that case, the remotely originated request 
sees that there is no index with that name and creates a new one. The problem 
occurs afterwards: when the local request arrives and the index already exists, 
it assumes the index is from a previous execution and throws 
IndexNameConflictException. 
https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegion.java#L8377

The create index command will fail (despite the fact that the index is created 
on all data, some via local requests and some via remotely originated requests).
There are two problems with this implementation:

  1.  The user doesn't know that the index was created and will try to create 
it again, but then it will fail on all servers.
  2.  The cluster config is updated only after the command finishes 
successfully, which is correct, as we cannot update the cluster config before 
anything is done.
The user can still use the indexes even though the command failed, but the 
problem is that after a restart there is nothing in the cluster config, so the 
indexes will not be recreated.

Re: Creating index failed

2022-02-03 Thread Mario Kevo
Hi devs,

After implementing the ignoring of the exception, some tests failed, as we now 
allow the command to pass again (although it does nothing, since the same index 
was already created by a previous execution). https://github.com/apache/geode/pull/7195


Here is a summary of how it works now.

When we create an index on a partitioned region, the locator tells all members 
to create the index on all the data they contain. The partitioned region is 
specific in that you normally want to index all the data, which is distributed 
across all members. As a result, every member will try to create the index 
locally and will also send index creation requests to all other members on that 
site.
All members check whether the index already exists or its creation is in 
progress, and wait for it. If a remotely originated request comes in and the 
index already exists, the member responds with the Index and sends an 
acknowledgment to the request sender. If the index is not yet created, the 
member creates it locally and then responds to the request sender. This 
behavior is fine with a small number of servers, or when using the --member 
option while creating indexes (which makes no sense on a partitioned region, as 
already described earlier in this mail thread).

The problem arises when we use a larger number of servers (8 or more) or simply 
run with debugging on. That slows down the whole process, and it can then 
happen that on some servers the remotely originated create index request 
arrives before the local one. In that case, the remotely originated request 
sees that there is no index with that name and creates a new one. The problem 
occurs afterwards: when the local request arrives and the index already exists, 
it assumes the index is from a previous execution and throws 
IndexNameConflictException. 
https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegion.java#L8377

The create index command will fail (despite the fact that the index is created 
on all data, some via local requests and some via remotely originated requests).
There are two problems with this implementation:

  1.  The user doesn't know that the index was created and will try to create 
it again, but then it will fail on all servers.
  2.  The cluster config is updated only after the command finishes 
successfully, which is correct, as we cannot update the cluster config before 
anything is done.
The user can still use the indexes even though the command failed, but the 
problem is that after a restart there is nothing in the cluster config, so the 
indexes will not be recreated.

So the question is: what should we do in this case? How can we avoid this issue?
Should we ignore the exception and fix the failing tests to expect that a 
repeated create index command passes, or disable the --member option when a 
partitioned region is used (or just document it) and not send requests to the 
other members, since the command is already sent to all members to create the 
index? Or maybe something else?

BR,
Mario



From: Mario Kevo 
Sent: December 14, 2021 14:06
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Alexandar,

The cluster config is updated at the end of the command execution, and only in 
case the command is successful.
I created a PR with Anilkumar's suggestion, but some tests failed. 
https://github.com/apache/geode/pull/7195
I tried ignoring the exception if the index is already created, but in that 
case, running the create index command again with the same name and expression 
will not fail.

BR,
Mario



From: Alexander Murmann 
Sent: December 7, 2021 18:28
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Mario!
I agree with you that the user wanted to index all the data in the region when 
using a partitioned region. But when the command is not successful, the cluster 
config is not updated.
After the server restart, it will not have indexes as it is not stored in the 
cluster configuration.
Interesting! If I understand you correctly, the initial request to each server 
succeeds, but later ones will fail because the index is already there. However, 
the first, successful request should also have updated the cluster config, 
right? Am I misunderstanding something?

From: Mario Kevo 
Sent: Tuesday, December 7, 2021 06:36
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Jason,

I agree with you that the user wanted to index all the data in the region when 
using a partitioned region. But when the command is not successful, the cluster 
config is not updated.
After the server restart, it will not have the indexes, as they are not stored 
in the cluster configuration.
So something should change, as the index is created on all members but the 
command is not successful.
I'm working on a fix. I will create a PR on the already mentioned ticket as 
soon as possible.

BR,
Mario

Re: Creating index failed

2021-12-14 Thread Mario Kevo
Hi Alexandar,

The cluster config is updated at the end of the command execution, and only in 
case the command is successful.
I created a PR with Anilkumar's suggestion, but some tests failed. 
https://github.com/apache/geode/pull/7195
I tried ignoring the exception if the index is already created, but in that 
case, running the create index command again with the same name and expression 
will not fail.

BR,
Mario



From: Alexander Murmann 
Sent: December 7, 2021 18:28
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Mario!
I agree with you that the user wanted to index all the data in the region when 
using a partitioned region. But when the command is not successful, the cluster 
config is not updated.
After the server restart, it will not have indexes as it is not stored in the 
cluster configuration.
Interesting! If I understand you correctly, the initial request to each server 
succeeds, but later ones will fail because the index is already there. However, 
the first, successful request should also have updated the cluster config, 
right? Am I misunderstanding something?

From: Mario Kevo 
Sent: Tuesday, December 7, 2021 06:36
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Jason,

I agree with you that the user wanted to index all the data in the region when 
using a partitioned region. But when the command is not successful, the cluster 
config is not updated.
After the server restart, it will not have the indexes, as they are not stored 
in the cluster configuration.
So something should change, as the index is created on all members but the 
command is not successful.
I'm working on a fix. I will create a PR on the already mentioned ticket as 
soon as possible.

BR,
Mario

From: Jason Huynh 
Sent: December 6, 2021 18:45
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Mario,

A lot of the indexing code pre-dates GFSH. The behavior you are seeing occurs 
when an index is created on a partitioned region.  When creating an index on a 
partitioned region, the idea is that the user wanted to index all the data in 
the region.  So the server will let all the other servers know to create an 
index on the partitioned region.

This is slightly different for an index on a replicated region.  There the 
index can be created on a per-member basis, which is what I think the 
--member flag is for.

GFSH, however, defaults to sending the create index message to all members for 
any index type, from what I remember and from what is being described. That is 
why you’ll see the race condition with indexes created on partitioned regions, 
but the end result is that the index that someone wanted to create is either 
created or already there.

-Jason

On 12/6/21, 6:37 AM, "Mario Kevo"  wrote:

Hi devs,

While doing some testing, I found an issue which is already reported here: 
https://issues.apache.org/jira/browse/GEODE-7875

If we run the create index command, it will create an index locally and send 
a request to create the index on the other members of that region.
The problem happens if the remote request arrives before the request from the 
locator; in that case, the request from the locator fails with the following 
message: Index "index1" already exists.  Create failed due to duplicate name.

This can be reproduced by running 6 servers with DEBUG log level (which slows 
the system down), creating a partitioned region, and then creating an index.

Why does the server send remote requests to the other members when they will 
get a request from the locator to create the index anyway?
Also, when running the gfsh command to create an index on one member, it will 
send create index requests to all the other members. In that case, what is the 
purpose of the --member flag?

BR,
Mario




Re: Creating index failed

2021-12-07 Thread Mario Kevo
Are you thinking about not sending it to the remote nodes, or not sending 
requests from the locator to each node?

Also, there is a map where indexTask is stored, and its putIfAbsent method does 
not seem to be working properly for this case.
// This will return either the Index FutureTask or Index itself, based
// on whether the index creation is in process or completed.
Object ind = this.indexes.putIfAbsent(indexTask, indexFutureTask);
In case we change it to something like:
Object ind = null;
if (!this.indexes.containsKey(indexTask)) {
  ind = this.indexes.put(indexTask, indexFutureTask);
}
If the map already contains that indexTask, it will not run the creation again, 
regardless of whether the index was created by a remote request or locally. In 
that case, the command will be successful and the cluster config is updated.
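For reference, ConcurrentHashMap.putIfAbsent is atomic and returns null only when it actually inserted the mapping; when a mapping already exists, it returns the existing value and leaves the map unchanged. A minimal, self-contained sketch (the string values are hypothetical stand-ins, not Geode's real index classes):

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentSemantics {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> indexes = new ConcurrentHashMap<>();

        // First request wins: putIfAbsent inserts atomically and returns null.
        String first = indexes.putIfAbsent("index1", "futureTask-remote");
        System.out.println(first); // null

        // A later request gets the existing value back, NOT overwritten --
        // this is how the caller detects that creation already happened,
        // and why it then raises the name-conflict error.
        String second = indexes.putIfAbsent("index1", "futureTask-local");
        System.out.println(second);                // futureTask-remote
        System.out.println(indexes.get("index1")); // futureTask-remote
    }
}
```

Note that the containsKey()+put() variant proposed above is not atomic on its own, but it leaves the returned value null when the key is already present, so the caller no longer mistakes an existing entry for a conflict; that is the behavioral difference being exploited.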

BR,
Mario


From: Anilkumar Gingade 
Sent: December 7, 2021 16:41
To: dev@geode.apache.org 
Subject: Re: Creating index failed

In case you are planning to fix this: the probable fix is not to send the gfsh 
create command to all the nodes when it is a partitioned region.

On 12/7/21, 6:37 AM, "Mario Kevo"  wrote:

Hi Jason,

I agree with you that the user wanted to index all the data in the region 
when using a partitioned region. But when the command is not successful, the 
cluster config is not updated.
After the server restart, it will not have the indexes, as they are not stored 
in the cluster configuration.
So something should change, as the index is created on all members but the 
command is not successful.
I'm working on a fix. I will create a PR on the already mentioned ticket as 
soon as possible.

BR,
Mario

From: Jason Huynh 
Sent: December 6, 2021 18:45
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Mario,

A lot of the indexing code pre-dates GFSH. The behavior you are seeing occurs 
when an index is created on a partitioned region.  When creating an index on a 
partitioned region, the idea is that the user wanted to index all the data in 
the region.  So the server will let all the other servers know to create an 
index on the partitioned region.

This is slightly different for an index on a replicated region.  There the 
index can be created on a per-member basis, which is what I think the 
--member flag is for.

GFSH, however, defaults to sending the create index message to all members 
for any index type, from what I remember and from what is being described. 
That is why you’ll see the race condition with indexes created on partitioned 
regions, but the end result is that the index that someone wanted to create 
is either created or already there.

-Jason

On 12/6/21, 6:37 AM, "Mario Kevo"  wrote:

Hi devs,

While doing some testing, I found an issue which is already reported 
here: https://issues.apache.org/jira/browse/GEODE-7875

If we run the create index command, it will create an index locally and 
send a request to create the index on the other members of that region.
The problem happens if the remote request arrives before the request 
from the locator; in that case, the request from the locator fails with the 
following message: Index "index1" already exists.  Create failed due to 
duplicate name.

This can be reproduced by running 6 servers with DEBUG log level (which 
slows the system down), creating a partitioned region, and then creating 
an index.

Why does the server send remote requests to the other members when they will 
get a request from the locator to create the index anyway?
Also, when running the gfsh command to create an index on one member, it 
will send create index requests to all the other members. In that case, what 
is the purpose of the --member flag?

BR,
Mario





Re: Creating index failed

2021-12-07 Thread Mario Kevo
Hi Jason,

I agree with you that the user wanted to index all the data in the region when 
using a partitioned region. But when the command is not successful, the cluster 
config is not updated.
After the server restart, it will not have the indexes, as they are not stored 
in the cluster configuration.
So something should change, as the index is created on all members but the 
command is not successful.
I'm working on a fix. I will create a PR on the already mentioned ticket as 
soon as possible.

BR,
Mario

From: Jason Huynh 
Sent: December 6, 2021 18:45
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Mario,

A lot of the indexing code pre-dates GFSH. The behavior you are seeing occurs 
when an index is created on a partitioned region.  When creating an index on a 
partitioned region, the idea is that the user wanted to index all the data in 
the region.  So the server will let all the other servers know to create an 
index on the partitioned region.

This is slightly different for an index on a replicated region.  There the 
index can be created on a per-member basis, which is what I think the 
--member flag is for.

GFSH, however, defaults to sending the create index message to all members for 
any index type, from what I remember and from what is being described. That is 
why you’ll see the race condition with indexes created on partitioned regions, 
but the end result is that the index that someone wanted to create is either 
created or already there.

-Jason

On 12/6/21, 6:37 AM, "Mario Kevo"  wrote:

Hi devs,

While doing some testing, I found an issue which is already reported 
here: https://issues.apache.org/jira/browse/GEODE-7875

If we run the create index command, it will create an index locally and send 
a request to create the index on the other members of that region.
The problem happens if the remote request arrives before the request from 
the locator; in that case, the request from the locator fails with the following 
message: Index "index1" already exists.  Create failed due to duplicate name.

This can be reproduced by running 6 servers with DEBUG log level (which slows 
the system down), creating a partitioned region, and then creating an index.

Why does the server send remote requests to the other members when they will 
get a request from the locator to create the index anyway?
Also, when running the gfsh command to create an index on one member, it 
will send create index requests to all the other members. In that case, what is 
the purpose of the --member flag?

BR,
Mario




Creating index failed

2021-12-06 Thread Mario Kevo
Hi devs,

While doing some testing, I found an issue which is already reported here: 
https://issues.apache.org/jira/browse/GEODE-7875

If we run the create index command, it will create an index locally and send a 
request to create the index on the other members of that region.
The problem happens if the remote request arrives before the request from the 
locator; in that case, the request from the locator fails with the following 
message: Index "index1" already exists.  Create failed due to duplicate name.

This can be reproduced by running 6 servers with DEBUG log level (which slows 
the system down), creating a partitioned region, and then creating an index.

Why does the server send remote requests to the other members when they will 
get a request from the locator to create the index anyway?
Also, when running the gfsh command to create an index on one member, it will 
send create index requests to all the other members. In that case, what is the 
purpose of the --member flag?

BR,
Mario



Re: Region is not created on one of the servers

2021-10-12 Thread Mario Kevo
A new ticket has been opened:
https://issues.apache.org/jira/browse/GEODE-9718

There are two proposals on the ticket, so we should decide which way to go.

BR,
Mario

From: Udo Kohlmeyer 
Sent: October 12, 2021 0:59
To: dev@geode.apache.org 
Subject: Re: Region is not created on one of the servers

Hi Mario,

I think your assessment of the problem is correct. Thinking about it, there is 
no simple (correct) way to easily solve this, given that there are too many 
variables in play: users making configuration changes whilst servers are 
coming up.

Now, that said, I think we should address this problem. I also think your 
assessment is correct that cluster configuration was not written to handle this 
scenario. Some thought has to go into the algorithm that one would like to 
follow and how we would like to resolve it.

Can you please raise a ticket for this issue?

--Udo

From: Mario Kevo 
Date: Monday, October 11, 2021 at 11:27 PM
To: dev@geode.apache.org 
Subject: Re: Region is not created on one of the servers
I think there can be a problem if we change it to first add the region to 
cluster config and then do the distribution to the existing servers.

Currently, when the "create region" command is executed, it gets all servers 
from the view and tells all of them to start creating a region with the 
parameters specified in the command.
Region creation is started on all servers, and after it finishes, the region is 
added to the cluster configuration. In case there are problems with creating 
the region (a wrong parameter or something else), the region will not be 
created on the existing servers and nothing will be written to the cluster 
configuration.

If we decide to change the order, the region will be written to the cluster 
config before the command has succeeded, and then we would need some backup 
mechanism to roll back the cluster configuration.

Also, this can happen for all commands that edit the cluster configuration.

It looks like this was not designed for executing commands in parallel with 
starting servers.

BR,
Mario

From: Dan Smith 
Sent: October 8, 2021 20:37
To: dev@geode.apache.org 
Subject: Re: Region is not created on one of the servers

This seems like something that ought to work, so I would call it a bug if the 
region didn't get created on 1 server. At first glance, it looks like the 
problem is that we distribute the region to all the servers before adding it to 
cluster config? Seems like we need to do distribution after adding the region 
to cluster config, to make sure that all servers get the region.

-Dan
____
From: Mario Kevo 
Sent: Friday, October 8, 2021 5:31 AM
To: dev@geode.apache.org 
Subject: Region is not created on one of the servers

Hi geode-dev,

We are using a system with a large number of servers.
While starting all the servers in parallel, we create a region through gfsh.
The problem is that on one of the servers the region is not created.

Here is an example of the problem:

We started the locator and then started the servers one by one.
In the meantime, we ran the "create region" command through gfsh.
All servers started before the "create region" command got the information to 
create the region, but the problem is with a server that started after the 
"create region" command had started but before it finished.
After the "create region" command finishes, all servers started after that will 
get the region from the cluster configuration and create it.

What happened to this one server without the region?
It started after the "create region" command had started, so it will not get 
the information from the locator to create the region. Also, the cluster 
configuration doesn't have that information yet, so the server cannot read it 
from the received cluster configuration.

So the question is: is it allowed to run commands in parallel?
If yes, we should add some checks in the code to avoid this issue.
If not, we need to document it.

BR,
Mario


Re: Region is not created on one of the servers

2021-10-11 Thread Mario Kevo
I think there can be a problem if we change it to first add the region to 
cluster config and then do the distribution to the existing servers.

Currently, when the "create region" command is executed, it gets all servers 
from the view and tells all of them to start creating a region with the 
parameters specified in the command.
Region creation is started on all servers, and after it finishes, the region is 
added to the cluster configuration. In case there are problems with creating 
the region (a wrong parameter or something else), the region will not be 
created on the existing servers and nothing will be written to the cluster 
configuration.

If we decide to change the order, the region will be written to the cluster 
config before the command has succeeded, and then we would need some backup 
mechanism to roll back the cluster configuration.

Also, this can happen for all commands that edit the cluster configuration.

It looks like this was not designed for executing commands in parallel with 
starting servers.

BR,
Mario

From: Dan Smith 
Sent: October 8, 2021 20:37
To: dev@geode.apache.org 
Subject: Re: Region is not created on one of the servers

This seems like something that ought to work, so I would call it a bug if the 
region didn't get created on 1 server. At first glance, it looks like the 
problem is that we distribute the region to all the servers before adding it to 
cluster config? Seems like we need to do distribution after adding the region 
to cluster config, to make sure that all servers get the region.

-Dan
____
From: Mario Kevo 
Sent: Friday, October 8, 2021 5:31 AM
To: dev@geode.apache.org 
Subject: Region is not created on one of the servers

Hi geode-dev,

We are using a system with a large number of servers.
While starting all the servers in parallel, we create a region through gfsh.
The problem is that on one of the servers the region is not created.

Here is an example of the problem:

We started the locator and then started the servers one by one.
In the meantime, we ran the "create region" command through gfsh.
All servers started before the "create region" command got the information to 
create the region, but the problem is with a server that started after the 
"create region" command had started but before it finished.
After the "create region" command finishes, all servers started after that will 
get the region from the cluster configuration and create it.

What happened to this one server without the region?
It started after the "create region" command had started, so it will not get 
the information from the locator to create the region. Also, the cluster 
configuration doesn't have that information yet, so the server cannot read it 
from the received cluster configuration.

So the question is: is it allowed to run commands in parallel?
If yes, we should add some checks in the code to avoid this issue.
If not, we need to document it.

BR,
Mario



Region is not created on one of the servers

2021-10-08 Thread Mario Kevo
Hi geode-dev,

We are using a system with a large number of servers.
While starting all the servers in parallel, we create a region through gfsh.
The problem is that on one of the servers the region is not created.

Here is an example of the problem:

We started the locator and then started the servers one by one.
In the meantime, we ran the "create region" command through gfsh.
All servers started before the "create region" command got the information to 
create the region, but the problem is with a server that started after the 
"create region" command had started but before it finished.
After the "create region" command finishes, all servers started after that will 
get the region from the cluster configuration and create it.

What happened to this one server without the region?
It started after the "create region" command had started, so it will not get 
the information from the locator to create the region. Also, the cluster 
configuration doesn't have that information yet, so the server cannot read it 
from the received cluster configuration.

So the question is: is it allowed to run commands in parallel?
If yes, we should add some checks in the code to avoid this issue.
If not, we need to document it.

BR,
Mario



Re: [DISCUSS] Upgrading to Lucene 7.1.0

2021-10-08 Thread Mario Kevo
Hi,

When the servers are in a mixed-version state, our tests block queries until 
all servers are on versions with the same Lucene version.
If we allow queries while the servers are in a mixed state, we get errors like 
IndexFormatTooNewException when the query is executed on a server with an older 
version of Lucene (it cannot read the new Lucene index format).
In a previous mail thread, I got the following:
"The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded. This requires test changes to not try to validate 
using queries (since we prevent draining and repo creation, the query will just 
wait)"

So we changed the tests not to run queries until all members are on the same 
version.

BR,
Mario



From: Dan Smith 
Sent: September 28, 2021 19:41
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

My understanding from our previous discussion about upgrading lucene was that 
we talked about pausing the asynchronous indexing process during the rolling 
upgrade. I don't remember a discussion that it was ok to not allow queries 
during the upgrade. But this is what we added to the docs:

"All cluster members must be running the same major Lucene version in order to 
execute Lucene queries."

What happens if a user runs a query during the rolling upgrade and why do we 
need to have this restriction? It seems to me like at a minimum we need to 
allow queries during the upgrade.

We also should consider what will happen to users with server-side query or 
indexing code - will they be able to upgrade or are they likely to hit breaking 
changes in the Lucene API?

-Dan

From: Nabarun Nag 
Sent: Tuesday, September 28, 2021 7:13 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

But Mario, just for my clarification: if we re-enable the queries in the tests 
in the mixed-version servers mode, it goes into a stack overflow situation. 
That is what I saw when I set hasLuceneVersionMismatch(host) to false in the 
test so that it does the query.

Regards
Naba

________
From: Mario Kevo 
Sent: Tuesday, September 28, 2021 4:49 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Hi all,

Just a small clarification of the reverted PR.

There were a lot of changes between Lucene versions 6.x and 7.x. There is an 
article about that: 
https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0

The first larger change was in the scoring mechanism. We adapted it to one that 
is correct for us (verified by DistributedScoringJUnitTest).

The main change was in the Lucene index format. That is where we ran into a 
problem with our tests:
Lucene 6.x cannot read the index format of Lucene 7.x.
Through the PRs we decided to include the Lucene uplift in Geode 1.15.0 and to 
add a check that all members are on version 1.15.0 or higher (after uplifting 
Lucene to a newer version with index format changes, this should be changed). 
If the check passes, the Lucene query is allowed; if not, a log message is 
printed that not all members are on version 1.15.0 or higher.

Also, you can find a discussion on the dev list from 2 years ago about a Lucene 
upgrade: 
https://markmail.org/message/qwooctuz7ekaezor

BR,
Mario




From: Udo Kohlmeyer 
Sent: September 28, 2021 1:44
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Might I propose something here.

There is currently a significant amount of work going into completing 
GEODE-8705, which is the classloader isolation. We are currently targeting 
getting this released in Geode 1.16.

My proposal is that we use the capability that Patrick demo’d at the community 
meeting (on this topic) where one can, at runtime, unload / load extensions 
(like our integration with Lucene). This means that one could possibly do a 
rolling upgrade on the existing system, and keep the versions of the Lucene 
integration stable.

Re: [DISCUSS] Upgrading to Lucene 7.1.0

2021-09-28 Thread Mario Kevo
Hi all,

Just a small clarification of the reverted PR.

There were a lot of changes between Lucene versions 6.x and 7.x. There is an 
article for that 
Upgrading+to+Lucene+7.1.0.

The first larger change was in the scoring mechanism. We adapted it to one that 
is correct for us (verified by DistributedScoringJUnitTest).

The main change was in the Lucene index format. There we ran into a problem with 
our tests:
Lucene 6.x cannot read the index format of Lucene 7.x.
Through PRs we decided to include the Lucene uplift in Geode 1.15.0 and add a 
check that all members are on version 1.15.0 or higher (after uplifting Lucene 
to a newer version with index format changes, this check should be changed). If 
the check passes, the Lucene query is allowed; if not, a log message is printed 
stating that not all members are on version 1.15.0 or higher.

Also, you can find a discussion on the dev list from 2 years ago about the 
Lucene upgrade: Lucene 
Upgrade

BR,
Mario




From: Udo Kohlmeyer 
Sent: September 28, 2021 1:44
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Might I propose something here.

There is currently a significant amount of work going into completing 
GEODE-8705, which is the Classloader isolation. We are currently targeting 
this for release in Geode 1.16.

My proposal is that we use the capability that Patrick demo'd at the Community 
meeting (on this topic) where one, at runtime, can unload / load extensions 
(like our integration with Lucene). This means that one could possibly do a 
rolling upgrade on the existing system, and keep the versions of the Lucene 
integration stable.

Once the whole system has been upgraded, the existing Lucene extension 
component is then unloaded, and the newer version of the extension component is 
then loaded. What this means is that at runtime there will be a period of 
time where Lucene queries will not be available, and as part of the "load" 
lifecycle of the extension there needs to be an initialization step, which 
will initialize the extension component safely.

Once initialized, Lucene queries can then become available again, etc.

This of course requires some work around the lifecycles of extension components 
and making sure that I can add the extension at runtime and safely 
initialize it.

I think this approach allows for a more seamless (lower-downtime) upgrade of 
system and extension components.

Thoughts?

--Udo

From: Nabarun Nag 
Date: Tuesday, September 28, 2021 at 7:33 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0
The solution for preventing query executions from occurring in mixed-version 
mode also caused some problems: the query function executions get 
repeatedly executed, and that results in stack overflow.



From: Nabarun Nag 
Sent: Monday, September 27, 2021 2:30 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

In simple words, if Lucene indexes were created by a new version (7.1.0) and 
then replicated to members that are still on the older version, those members 
won't understand the index, and the event processors start throwing exceptions.

This can be seen by simply re-enabling the query execution in the DUnit 
tests and commenting out the check blocks [develop SHA: 
68629356f561a932f5dfbace70b01d9971a42473]:

In LuceneEventListener:

if (cache.hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
  logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
  return false;
}

In IndexRepositoryFactory:

if (userRegion.getCache() != null
    && userRegion.getCache().hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
  logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
  return null;
}


This is the exception that will be encountered:

[Exception]

[vm2_v1.2.0] [warn 2021/09/27 14:24:42.251 PDT  tid=102] An Exception occurred. 
The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index 
repository
[vm2_v1.2.0] at 
org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0] at 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0] at 
org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0] at 
org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)

Re: [VOTE] Apache Geode 1.14.0.RC2

2021-09-01 Thread Mario Kevo
+1


  *   build from the source
  *   run gfsh
  *   run geode-examples


From: Donal Evans 
Sent: September 1, 2021 2:49
To: dev@geode.apache.org 
Subject: Re: [VOTE] Apache Geode 1.14.0.RC2

+1

Validated that performance across a range of workloads is equivalent to 
previous releases

From: nabarun nag 
Sent: Tuesday, August 31, 2021 5:36 PM
To: dev@geode.apache.org 
Subject: [VOTE] Apache Geode 1.14.0.RC2

Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.14.0.RC2.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you
performed.

Voting deadline: Please note that this is an expedited voting process
3PM PST Thur, September 02 2021.

Please note that we are voting upon the source tag:
rel/v1.14.0.RC2

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.14.0

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.14.0.RC2/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1098

GitHub:
https://github.com/apache/geode/tree/rel/v1.14.0.RC2
https://github.com/apache/geode-examples/tree/rel/v1.14.0.RC2
https://github.com/apache/geode-native/tree/rel/v1.14.0.RC2
https://github.com/apache/geode-benchmarks/tree/rel/v1.14.0.RC2

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Re: Add new attribute to count members visible to member

2021-08-06 Thread Mario Kevo
Hi,

Just a reminder of this question.

BR,
Mario

From: Mario Kevo
Sent: June 23, 2021 13:15
To: dev@geode.apache.org 
Subject: Add new attribute to count members visible to member

Hi, devs,

I'm trying to fix the visibleNodes attribute in MemberMXBean, as it seems to me 
that we don't have a correct value for it. (More description is in ticket 
GEODE-9101<https://issues.apache.org/jira/browse/GEODE-9101>.)
The ticket is opened and PR#6225<https://github.com/apache/geode/pull/6225> is 
created.

The earliest versions of GemFire had locators which were not members (i.e. not 
nodes), but locators now are definitely members (i.e. nodes). The next type of 
locator in GemFire was a member (i.e. a node) but without a cache. As of GemFire 
7.0, all locators also have a cache. So in every version of Geode, all locators 
are members with caches. The methods to create that earliest type of locator 
still exist but are deprecated... so deprecated internal methods are now the 
only way to create the thin locators that are not peer-to-peer members of the 
cluster.

I'm not sure if we should change the existing attribute as done in the PR, or 
keep this attribute and add a new one, something like visibleMembers, which 
would count all members in the system visible to one of the members.

Thanks and BR,
Mario




Re: NullPointerException while create region during server restart

2021-07-08 Thread Mario Kevo
Hi Anthony,

It happened while the server was starting and creating a cache (while filling 
in the contents of the cache based on the creation object's state). The NPE 
occurs when the "create region" command is executed before pdxRegistry is 
initialized. This is the part of the code where pdxRegistry is initialized: 
https://github.com/Nordix/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheCreation.java#L529

Before this part of the code is executed, pdxRegistry is null, and 
findDiskStore throws the NPE.


BR,
Mario

From: Anthony Baker 
Sent: July 7, 2021 17:58
To: dev@geode.apache.org 
Subject: Re: NullPointerException while create region during server restart

When the NPE occurs, has the server completed its bootstrapping from cluster 
configuration yet?

Anthony


> On Jul 6, 2021, at 12:06 AM, Mario Kevo  wrote:
>
> Hi Geode devs,
>
> I opened a new ticket https://issues.apache.org/jira/browse/GEODE-9409 
> regarding a NullPointerException on creating a region while one of the 
> servers is restarting.
> If we run the "create region" command through gfsh while the server is 
> starting, it passes, but if the server is restarted, it fails. The 
> difference is that when we restart the server, we kill it and start it 
> again. As it already has a server directory, it takes more time than 
> expected to get the server up.
> In that case, if we run the "create region" command, it can happen that the 
> cache is not fully created and we are trying to operate on it. That can 
> lead to the NullPointerException: creating a region fetches pdxRegistry 
> from the cache while doing findDiskStore, but sometimes it is not yet 
> initialized in the cache. So every method run against it will throw a 
> NullPointerException.
> This is the part of the code where the exception is thrown:
>
> DiskStoreImpl findDiskStore(RegionAttributes regionAttributes,
>InternalRegionArguments internalRegionArgs) {
>  // validate that persistent type registry is persistent
>  if (getAttributes().getDataPolicy().withPersistence()) {
>getCache().getPdxRegistry().creatingPersistentRegion();
>  }
>
> As I already mentioned, getPdxRegistry (LocalRegion.java) will be null if it 
> is not yet initialized in create (CacheCreation.java):
>
> DiskStoreAttributesCreation pdxRegDSC = initializePdxDiskStore(cache);
>
> cache.initializePdxRegistry();
>
> createDiskStores(cache, pdxRegDSC);
>
> I tried to do some fixes, but without success. 
> It can pass if we add some retry and sleep, but that is not acceptable.
>
> So does someone have an idea for how to wait until pdxRegistry is 
> initialized, or something else that would help us avoid this problem?
>
> BR,
> Mario



NullPointerException while create region during server restart

2021-07-06 Thread Mario Kevo
Hi Geode devs,

I opened a new ticket https://issues.apache.org/jira/browse/GEODE-9409 
regarding a NullPointerException on creating a region while one of the servers 
is restarting.
If we run the "create region" command through gfsh while the server is 
starting, it passes, but if the server is restarted, it fails. The difference 
is that when we restart the server, we kill it and start it again. As it 
already has a server directory, it takes more time than expected to get the 
server up.
In that case, if we run the "create region" command, it can happen that the 
cache is not fully created and we are trying to operate on it. That can lead 
to the NullPointerException: creating a region fetches pdxRegistry from the 
cache while doing findDiskStore, but sometimes it is not yet initialized in 
the cache. So every method run against it will throw a NullPointerException.
This is the part of the code where the exception is thrown:

DiskStoreImpl findDiskStore(RegionAttributes regionAttributes,
InternalRegionArguments internalRegionArgs) {
  // validate that persistent type registry is persistent
  if (getAttributes().getDataPolicy().withPersistence()) {
getCache().getPdxRegistry().creatingPersistentRegion();
  }

As I already mentioned, getPdxRegistry (LocalRegion.java) will be null if it 
is not yet initialized in create (CacheCreation.java):

DiskStoreAttributesCreation pdxRegDSC = initializePdxDiskStore(cache);

cache.initializePdxRegistry();

createDiskStores(cache, pdxRegDSC);

I tried to do some fixes, but without success. 
It can pass if we add some retry and sleep, but that is not acceptable.

So does someone have an idea for how to wait until pdxRegistry is initialized, 
or something else that would help us avoid this problem?
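One possible direction, sketched below with hypothetical names rather than Geode's actual classes: gate access to pdxRegistry on an initialization latch, so callers block briefly (with a timeout) instead of dereferencing null or sleeping in a retry loop. This is only an illustration of the idea, not a proposed patch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: callers wait for pdxRegistry initialization
// instead of dereferencing a possibly-null registry and hitting an NPE.
class CacheBootstrap {
    private final CountDownLatch pdxRegistryReady = new CountDownLatch(1);
    private volatile Object pdxRegistry; // stand-in for the real registry type

    // Would run where create(CacheCreation.java) calls
    // cache.initializePdxRegistry() today.
    void initializePdxRegistry() {
        pdxRegistry = new Object();
        pdxRegistryReady.countDown();
    }

    // Would be called from findDiskStore(): blocks until initialized,
    // or fails fast with a clear error instead of an NPE.
    Object getPdxRegistry() throws InterruptedException {
        if (!pdxRegistryReady.await(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pdxRegistry was not initialized in time");
        }
        return pdxRegistry;
    }
}
```

The timeout bounds the wait, so a "create region" that races cache creation either succeeds once initialization finishes or fails with a diagnosable error rather than a NullPointerException.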

BR,
Mario


Add new attribute to count members visible to member

2021-06-23 Thread Mario Kevo
Hi, devs,

I'm trying to fix the visibleNodes attribute in MemberMXBean, as it seems to me 
that we don't have a correct value for it. (More description is in ticket 
GEODE-9101.)
The ticket is opened and PR#6225 is 
created.

The earliest versions of GemFire had locators which were not members (i.e. not 
nodes), but locators now are definitely members (i.e. nodes). The next type of 
locator in GemFire was a member (i.e. a node) but without a cache. As of GemFire 
7.0, all locators also have a cache. So in every version of Geode, all locators 
are members with caches. The methods to create that earliest type of locator 
still exist but are deprecated... so deprecated internal methods are now the 
only way to create the thin locators that are not peer-to-peer members of the 
cluster.

I'm not sure if we should change the existing attribute as done in the PR, or 
keep this attribute and add a new one, something like visibleMembers, which 
would count all members in the system visible to one of the members.

Thanks and BR,
Mario




Re: Concourse access

2021-02-26 Thread Mario Kevo
Hi Jacob,

I'm also getting an error while trying to access the pipeline:


Please, can you give me permissions?
My username: mkevo

BR,
Mario

From: Mario Salazar de Torres 
Sent: February 25, 2021 22:29
To: dev@geode.apache.org 
Subject: Re: Concourse access

Hi Jacob,

In my case, I can now see both the geode-native-develop and 
geode-native-develop-pr sets of pipelines, but when I try to see any execution 
of any of their pipelines I am stuck at the loading message. Interestingly, 
I've noticed that one of the underlying requests sent 
(https://concourse.apachegeode-ci.info/api/v1/builds/7105/plan) returns 403, so 
maybe I am still missing some permissions?

BR,
Mario.

From: Jacob Barrett 
Sent: Thursday, February 25, 2021 9:55 PM
To: dev@geode.apache.org 
Subject: Re: Concourse access

I made the pipelines public. If you want more access than that let us know. 
Before that though can you give me a sense for the usability of the public 
access on the pipelines?

Thanks,
Jake


> On Feb 25, 2021, at 8:31 AM, Mario Salazar de Torres 
>  wrote:
>
> Hi everyone,
> I am trying to access one of the jobs for the geode-native concourse pipeline 
> (I.E: 
> https://concourse.apachegeode-ci.info/builds/7105),
>  but the only thing I can see is a loading message. Do I need any read 
> permissions?
> If that's the case, my GitHub username is 'gaussianrecurrence'
>
> BR,
> Mario



Re: Colocated regions missing some buckets after restart

2020-09-28 Thread Mario Kevo
Hi Donal,

Sometimes you need to restart two or three times, but mostly it is 
reproduced by the first restart.
start locator --name=locator1 --port=10334
start locator --name=locator2 --port=10335 --locators=localhost[10334]
start server --name=server1 --locators=127.0.0.1[10334],127.0.0.1[10335] 
--server-port=40404
start server --name=server2 --locators=127.0.0.1[10334],127.0.0.1[10335] 
--server-port=40405
I'm putting 1 entries, but you can use a lower value.

You need to be really quick with the commands. Here is an example from my locator 
log.
[info 2020/09/29 07:41:52.060 CEST  
tid=0x1d] Received a join request from 192.168.0.145(server4:22852):41002
[info 2020/09/29 07:41:52.406 CEST  
tid=0x1d] Received a join request from 192.168.0.145(server3:22879):41003

I prepared the commands to start the servers in two terminals, so I could start 
them at almost the same time.
Sorry, I forgot to mention that you need to see which server was stopped first 
and start that one first (the issue was first reproduced on Kubernetes, and 
that is how pods restart servers).
Also, if you are not able to reproduce the issue, try to set up 10 or more 
colocated regions.

BR,
Mario


From: Donal Evans 
Sent: September 28, 2020 23:48
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi Mario,

I tried to reproduce the issue using the steps you describe, but I wasn't able 
to. After restarting the servers, all regions have the expected 113 buckets, 
and the server startup process is not noticeably slower. I have a few questions 
that might help understand why I'm unable to reproduce this:

  *   Do you see this behaviour 100% of the time with these steps, or is it 
still only on some restarts that it shows up?
  *   Could you describe in more detail how exactly you're starting the 
locators/servers? I'm just using the gfsh "start locator" and "start server" 
commands, only specifying ports, with no other settings, so if you're doing 
anything different that may be a factor.
  *   How many entries are you putting into the region, and does the issue 
still reproduce if you use fewer entries? I'm using 1 entries as described 
in your earlier email.
  *   How quick do you have to be when restarting the servers in the two 
terminals at the same time? I'm currently just manually clicking between them 
and executing the two start server commands within a second of each other, but 
if that's not fast enough then I should probably be using a script or something.

Hopefully if we can understand what's different between what I'm doing and what 
you're doing then it will help us understand exactly what's going wrong.

- Donal
________
From: Mario Kevo 
Sent: Monday, September 28, 2020 6:23 AM
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi all,

After more investigation, I found that for some buckets there is a problem 
defining which server is primary.
While doing getPrimary, if the existing primary is null it waits for a new 
primary and after some time returns null for it.

From what I found, while doing setHosting 
(grabBucket[PartitionedRegionDataStore.java]->grabFreeBucket[PartitionedRegionDataStore.java]->setHosting[ProxyBucketRegion.java]->setHosting[BucketAdvisor.java])
 it volunteers for primary and sends a profile update to all other servers.
There it calls BucketProfileUpdateMessage.send and gets stuck, as it cannot 
get a response from the other members.

Ticket is opened on GEODE: 
https://issues.apache.org/jira/browse/GEODE-8546
How to reproduce the issue:

  1.   Start two locators and two servers
  2.   Create a PARTITION_REDUNDANT_PERSISTENT region with redundant-copies=1
  3.   Create a few PARTITION_REDUNDANT regions (I used six) colocated with 
the persistent region and redundant-copies=1
  4.   Put some entries.
  5.   Restart the servers (you can simply run "kill -15 " and then 
start both of them at the same time from two terminals)
  6.   It will take some time for server startup to finish, and for the latest 
region bucketCount will be zero on one member

If someone with more experience with bucket initialization has time to help 
me with this, I would appreciate it.
For any more info, please contact me.

BR,
Mario


____
From: Mario Kevo 
Sent: September 17, 2020 15:00
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi Anil,

Thread dump is in an attachment.
For now, we found a difference between the server logs: the one which has this 
problem has the log "Colocation is incomplete".
So it seems that colocation is not finished for this region on this member.

Re: [DISCUSS] One more 1.13 change

2020-09-28 Thread Mario Kevo
+1

From: Patrick Johnson 
Sent: September 28, 2020 21:27
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] One more 1.13 change

+1

> On Sep 28, 2020, at 12:21 PM, Dan Smith  wrote:
>
> Hi,
>
> I'd like to backport this change to support/1.13 as well
>
> GEODE-8522: Switching exception log back to debug - 
> https://github.com/apache/geode/pull/5566
>
> This cleans up some noise in our logs that customers might see.
> This log message happens during the course of normal startup of multiple 
> locators. We should not be logging a full stack trace during normal startup. 
> (cherry picked from commit 3df057c)
>



Re: Colocated regions missing some buckets after restart

2020-09-28 Thread Mario Kevo
Hi all,

After more investigation, I found that for some buckets there is a problem 
defining which server is primary.
While doing getPrimary, if the existing primary is null it waits for a new 
primary and after some time returns null for it.

From what I found, while doing setHosting 
(grabBucket[PartitionedRegionDataStore.java]->grabFreeBucket[PartitionedRegionDataStore.java]->setHosting[ProxyBucketRegion.java]->setHosting[BucketAdvisor.java])
 it volunteers for primary and sends a profile update to all other servers.
There it calls BucketProfileUpdateMessage.send and gets stuck, as it cannot 
get a response from the other members.

Ticket is opened on GEODE: https://issues.apache.org/jira/browse/GEODE-8546
How to reproduce the issue:

  1.   Start two locators and two servers
  2.   Create a PARTITION_REDUNDANT_PERSISTENT region with redundant-copies=1
  3.   Create a few PARTITION_REDUNDANT regions (I used six) colocated with 
the persistent region and redundant-copies=1
  4.   Put some entries.
  5.   Restart the servers (you can simply run "kill -15 " and then 
start both of them at the same time from two terminals)
  6.   It will take some time for server startup to finish, and for the latest 
region bucketCount will be zero on one member

If someone with more experience with bucket initialization has time to help 
me with this, I would appreciate it.
For any more info, please contact me.

BR,
Mario


____
From: Mario Kevo 
Sent: September 17, 2020 15:00
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi Anil,

Thread dump is in an attachment.
For now, we found a difference between the server logs: the one which has this 
problem has the log "Colocation is incomplete".
So it seems that colocation is not finished for this region on this member. 
This part of the code can be found at this 
link<https://github.com/apache/geode/blob/f2ccbc8ae860fc018baba7cc8de7b5e01a22c606/geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionDataStore.java#L660>.
We will continue the investigation and try to find what causes the issue.

BR,
Mario


From: Anilkumar Gingade 
Sent: September 16, 2020 16:55
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Mario,

Take a thread dump a couple of times at an interval of a minute... See if you 
can find threads stuck in region creation... This will show if there is any 
lock contention.

-Anil.


On 9/16/20, 6:29 AM, "Mario Kevo"  wrote:

Hi Anil,

From the server logs we see that some threads are stuck, and we continuously 
get the following message on server2 (bucket missing on server2 for the 
DfSessions region):
[warn 2020/09/15 14:25:39.852 CEST  
tid=0x251] 15 secs have elapsed waiting for a primary for bucket [BucketAdvisor 
/__PR/_B__DfSessions_18:935: state=VOLUNTEERING_HOSTING]. Current bucket owners 
[]


And on the other server1:
[warn 2020/09/15 14:25:40.852 CEST  
tid=0xdf] 15 seconds have elapsed while waiting for replies: 
:41003]> on 
192.168.0.145(server1:28031):41002 whose current membership list is: 
[[192.168.0.145(locator1:27244:locator):41000, 
192.168.0.145(locator2:27343:locator):41001, 
192.168.0.145(server1:28031):41002, 192.168.0.145(server2:28054):41003]]

[warn 2020/09/15 14:27:20.200 CEST  tid=0x11] Thread 223 
(0xdf) is stuck

[warn 2020/09/15 14:27:20.202 CEST  tid=0x11] Thread <223> 
(0xdf) that was executed at <15 Sep 2020 14:25:24 CEST> has been stuck for 
<115.361 seconds> and number of thread monitor iteration <1>
Thread Name  state 
...
It seems that this is not a problem with stats.
We have a suspicion that the problem is with some lock, but we need to 
investigate it a bit more.

BR,
Mario




From: Anilkumar Gingade 
Sent: September 15, 2020 16:36
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Mario,

I doubt this has anything to do with the client connections. If it is, it 
should be between server/member-to-server/member connections; in that case 
the unresponsive member is kicked out of the cluster.

The recommended configuration is to have persistent regions for both 
parent and co-located regions (and replicated regions)...

There could be issues in the stats too... Can you try executing some 
test/validation code on the server side to dump/list primary and secondary buckets?
You can do that using helper methods: 
pr.getDataStore().getAllLocalPrimaryBucketIds();

-Anil

On 9/14/20, 12:25 AM, "Mario Kevo"  wrote:

Hi,


This problem is usually seen on only 1 server. The other servers' 
metrics and bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if we have a client 
that tries to reconnect after the server restart.

Re: Colocated regions missing some buckets after restart

2020-09-16 Thread Mario Kevo
Hi Anil,

From the server logs we see that some threads are stuck, and we continuously 
get the following message on server2 (bucket missing on server2 for the 
DfSessions region):
[warn 2020/09/15 14:25:39.852 CEST  
tid=0x251] 15 secs have elapsed waiting for a primary for bucket [BucketAdvisor 
/__PR/_B__DfSessions_18:935: state=VOLUNTEERING_HOSTING]. Current bucket owners 
[]


And on the other server1:
[warn 2020/09/15 14:25:40.852 CEST  tid=0xdf] 
15 seconds have elapsed while waiting for replies: 
:41003]> on 
192.168.0.145(server1:28031):41002 whose current membership list is: 
[[192.168.0.145(locator1:27244:locator):41000, 
192.168.0.145(locator2:27343:locator):41001, 
192.168.0.145(server1:28031):41002, 192.168.0.145(server2:28054):41003]]

[warn 2020/09/15 14:27:20.200 CEST  tid=0x11] Thread 223 (0xdf) 
is stuck

[warn 2020/09/15 14:27:20.202 CEST  tid=0x11] Thread <223> 
(0xdf) that was executed at <15 Sep 2020 14:25:24 CEST> has been stuck for 
<115.361 seconds> and number of thread monitor iteration <1>
Thread Name  state 
...
It seems that this is not a problem with stats.
We have a suspicion that the problem is with some lock, but we need to 
investigate it a bit more.

BR,
Mario




From: Anilkumar Gingade 
Sent: September 15, 2020 16:36
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Mario,

I doubt this has anything to do with the client connections. If it is, it 
should be between server/member-to-server/member connections; in that case the 
unresponsive member is kicked out of the cluster.

The recommended configuration is to have persistent regions for both parent 
and co-located regions (and replicated regions)...

There could be issues in the stats too... Can you try executing some 
test/validation code on the server side to dump/list primary and secondary buckets?
You can do that using helper methods: 
pr.getDataStore().getAllLocalPrimaryBucketIds();

-Anil
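Anil's pointer can be wrapped in a Geode Function that calls pr.getDataStore().getAllLocalPrimaryBucketIds() on each member; the classification logic itself is easy to model standalone. A minimal sketch, assuming a bucket-to-owners map in which the first owner is the primary (the map and member names are illustrative, not Geode API):

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class BucketDump {
    // bucketId -> members hosting it; by convention here, the first member is the primary
    static final Map<Integer, List<String>> buckets = Map.of(
            0, List.of("server-0", "server-1"),
            1, List.of("server-1", "server-0"),
            2, List.of("server-0")); // a bucket with lost redundancy

    /** All bucket ids hosted on the member, primary or secondary. */
    static SortedSet<Integer> localBucketIds(String member) {
        return buckets.entrySet().stream()
                .filter(e -> e.getValue().contains(member))
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Bucket ids for which the member is primary (mimics getAllLocalPrimaryBucketIds). */
    static SortedSet<Integer> primaryBucketIds(String member) {
        return buckets.entrySet().stream()
                .filter(e -> e.getValue().get(0).equals(member))
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        for (String m : List.of("server-0", "server-1")) {
            System.out.println(m + " hosts " + localBucketIds(m)
                    + ", primary for " + primaryBucketIds(m));
        }
    }
}
```

Comparing such dumps from every member against the region's configured total-num-buckets would show exactly which primaries went missing.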

On 9/14/20, 12:25 AM, "Mario Kevo"  wrote:

Hi,


This problem is usually seen on only 1 server. The other servers' metrics 
and bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if we have a client 
that tries to reconnect after the server restart. Clients simply get no 
response from the server so they try to close the connection, but the 
connection close is not acknowledged by the server. On server side we see that 
the connections are in CLOSE-WAIT state with packets in the socket receiver 
queue. It’s as if the servers just stopped processing packets on the sockets 
while waiting for a member with the primary bucket.



So in short, each new client connection is “unresponsive”. The client tries 
to close it and open a new one, but the socket doesn’t get closed on the server side 
and the connection is left “hanging” on the server. Clients will try to do this 
until max-connections is reached on the servers. This is why we would be unable 
to add any data to the regions. But IMHO it’s really not dependent on adding 
data, since this issue happens occasionally (1 out of ~4 restarts) and only on 
one server.



The initial problem was observed with a persistent region A (with 1 
key-value pairs inserted) and a non-persistent region B collocated with region 
A. We did some tests with both regions being persistent. We haven’t observed 
the same issue yet (although we did only a few restarts), but we observed 
something that also looks quite worrying. Both servers start up without 
reporting issues in the logs. But, looking at the server metrics, one server 
has wrong information about “bucketCount” and is missing primary buckets. E.g:


First server:

Partition   | putLocalRate | 0.0

| putRemoteRate| 0.0

| putRemoteLatency | 0

| putRemoteAvgLatency  | 0

| bucketCount  | 113

| primaryBucketCount   | 57



Second server:

Partition   | putLocalRate | 0.0

| putRemoteRate| 0.0

| putRemoteLatency | 0

| putRemoteAvgLatency  | 0

| bucketCount  | 111

| primaryBucketCount   | 55


So we are missing a primary bucket without being aware of the issue.
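This kind of mismatch (57 + 55 = 112 primaries for 113 buckets) can be caught mechanically: across all members, primaryBucketCount must sum to the region's total number of buckets. A minimal standalone check using the numbers from the metrics above (the helper is illustrative, not Geode API):

```java
import java.util.Map;

public class PrimaryInvariant {
    /** Returns how many primaries are missing cluster-wide (0 means consistent). */
    static int missingPrimaries(int totalNumBuckets, Map<String, Integer> primaryCounts) {
        int sum = primaryCounts.values().stream().mapToInt(Integer::intValue).sum();
        return totalNumBuckets - sum;
    }

    public static void main(String[] args) {
        // primaryBucketCount per member, as reported by `show metrics`
        Map<String, Integer> primaries = Map.of("server1", 57, "server2", 55);
        System.out.println("missing primaries: " + missingPrimaries(113, primaries));
    }
}
```

Running such a check periodically (or after restart) would flag the lost primary even though no warning appears in the logs.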

BR,
Mario


From: Anilkumar Gingade 
Sent: September 11, 2020, 20:34
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Are you seeing no-buckets for persistent regions or non-persistent. The 
buckets are created dynamically; when data is added to corresponding buckets...
When server is restarted, in case of in-memory regions as the data is not 
there, the bucket region may not have been created (my suspicion).
Can you try adding data and see

Odg: Colocated regions missing some buckets after restart

2020-09-14 Thread Mario Kevo
Hi,


This problem is usually seen on only 1 server. The other servers' metrics and 
bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if we have a client 
that tries to reconnect after the server restart. Clients simply get no 
response from the server so they try to close the connection, but the 
connection close is not acknowledged by the server. On server side we see that 
the connections are in CLOSE-WAIT state with packets in the socket receiver 
queue. It’s as if the servers just stopped processing packets on the sockets 
while waiting for a member with the primary bucket.



So in short, each new client connection is “unresponsive”. The client tries to 
close it and open a new one, but the socket doesn’t get closed on the server side and 
the connection is left “hanging” on the server. Clients will try to do this 
until max-connections is reached on the servers. This is why we would be unable 
to add any data to the regions. But IMHO it’s really not dependent on adding 
data, since this issue happens occasionally (1 out of ~4 restarts) and only on 
one server.



The initial problem was observed with a persistent region A (with 1 
key-value pairs inserted) and a non-persistent region B collocated with region 
A. We did some tests with both regions being persistent. We haven’t observed 
the same issue yet (although we did only a few restarts), but we observed 
something that also looks quite worrying. Both servers start up without 
reporting issues in the logs. But, looking at the server metrics, one server 
has wrong information about “bucketCount” and is missing primary buckets. E.g:


First server:

Partition   | putLocalRate | 0.0

| putRemoteRate| 0.0

| putRemoteLatency | 0

| putRemoteAvgLatency  | 0

| bucketCount  | 113

| primaryBucketCount   | 57



Second server:

Partition   | putLocalRate | 0.0

| putRemoteRate| 0.0

| putRemoteLatency | 0

| putRemoteAvgLatency  | 0

| bucketCount  | 111

| primaryBucketCount   | 55


So we are missing a primary bucket without being aware of the issue.

BR,
Mario


From: Anilkumar Gingade 
Sent: September 11, 2020, 20:34
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Are you seeing no-buckets for persistent regions or non-persistent. The buckets 
are created dynamically; when data is added to corresponding buckets...
When server is restarted, in case of in-memory regions as the data is not 
there, the bucket region may not have been created (my suspicion).
Can you try adding data and see if the co-located bucket region gets created in 
respective nodes/server.

-Anil.


On 9/11/20, 9:46 AM, "Mario Kevo"  wrote:

Hi geode-dev,

We have a system with two servers and a few regions. One region is 
persistent and the others are not, but they are colocated with this persistent region.
After the servers restart, we can see that some regions don't have any 
buckets.
gfsh>show metrics --member=server-1 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-1


Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 0
  | primaryBucketCount   | 0
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0

gfsh>show metrics --member=server-0 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-0

Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 113
  | primaryBucketCount   | 56
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0


The persistent region is ok, but some of the colocated regions have this 
issue. We also waited some time, but it doesn't change.

Does anyone have an idea about this problem, i.e. what is causing the issue?
The issue can be easily reproduced with two locators, two servers, one 
persistent re

Colocated regions missing some buckets after restart

2020-09-11 Thread Mario Kevo
Hi geode-dev,

We have a system with two servers and a few regions. One region is persistent 
and the others are not, but they are colocated with this persistent region.
After the servers restart, we can see that some regions don't have any buckets.
gfsh>show metrics --member=server-1 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-1


Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 0
  | primaryBucketCount   | 0
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0

gfsh>show metrics --member=server-0 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-0

Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 113
  | primaryBucketCount   | 56
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0


The persistent region is ok, but some of the colocated regions have this 
issue. We also waited some time, but it doesn't change.

Does anyone have an idea about this problem, i.e. what is causing the issue?
The issue can be easily reproduced with two locators, two servers, one 
persistent region and a few non-persistent regions colocated with the persistent one.
After restarting both servers, running the show metrics command will show this 
issue for some regions.

BR,
Mario



Odg: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create Issue" dialogue and include "Affects Version/s"

2020-08-17 Thread Mario Kevo
+1

From: Dave Barnes 
Sent: August 18, 2020, 7:23
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create 
Issue" dialogue and include "Affects Version/s"

+1 esp addition of "Affects Version/s".

On Mon, Aug 17, 2020 at 3:07 PM Kirk Lund  wrote:

> +1 if it's possible
>
> On Mon, Aug 17, 2020 at 12:04 PM Donal Evans  wrote:
>
> > Looking at the dialogue that opens when you attempt to create a new
> ticket
> > in the GEODE Jira[1], there are two fields included that aren't really
> > necessary and may cause confusion. The "Fix Version/s" field should
> > presumably not be filled out until the issue has actually been fixed,
> > rather than at the time of ticket creation. The "Sprint" field seems to
> no
> > longer serve any purpose at all that I can discern, having only been
> filled
> > in 13 tickets, the most recent of which was created in December 2018[2].
> > With the expansion of the community contributing to the Geode project,
> it's
> > important to provide a straightforward experience for people who are new
> to
> > the project and wish to file tickets, so the presence of these fields may
> > cause issues.
> >
> > I propose that these two fields be removed from the "Create Issue"
> > dialogue and that the "Affects Version/s" field be added, since that
> field
> > is far more important at time of ticket creation. There are currently
> 3851
> > bug tickets in the Jira with no "Affects Version/s" value entered at
> > all[3], which I suspect is in part due to that field not being an option
> in
> > the "Create Issue" dialogue, meaning you have to remember to go back
> after
> > creating the ticket and enter it. With Geode moving to a model of having
> > support branches and patch releases, properly capturing the versions
> > affected by a given issue becomes even more important.
> >
> > [1] https://i.imgur.com/oQ8CW87.png
> > [2]
> >
> https://issues.apache.org/jira/projects/GEODE/issues/GEODE-8433?filter=allissues=cf%5B12310921%5D+ASC%2C+created+DESC
> > [3]
> >
> https://issues.apache.org/jira/browse/GEODE-8433?jql=project%20%3D%20GEODE%20AND%20issuetype%20%3D%20Bug%20AND%20affectedVersion%20%3D%20EMPTY%20ORDER%20BY%20created%20DESC%2C%20affectedVersion%20ASC%2C%20cf%5B12310921%5D%20ASC
> >
>


Odg: Proposal to backport GEODE-8395 (gfsh help banner) to support branches

2020-08-03 Thread Mario Kevo
+1

From: Dave Barnes 
Sent: August 3, 2020, 16:56
To: dev@geode.apache.org 
Subject: Re: Proposal to backport GEODE-8395 (gfsh help banner) to support 
branches

+1


On Sun, Aug 2, 2020 at 7:07 AM Donal Evans  wrote:

> +1
>
> Nice catch!
>
> Get Outlook for Android
>
> 
> From: Jinmei Liao 
> Sent: Saturday, August 1, 2020 11:09:45 PM
> To: dev@geode.apache.org 
> Subject: Re: Proposal to backport GEODE-8395 (gfsh help banner) to support
> branches
>
> +1
>
> On Aug 1, 2020 10:30 PM, Owen Nichols  wrote:
> This issue was present since Geode 1.0.  There is zero risk from this fix,
> which is already on develop.  It is critical because Geode releases are a
> legal Act of the Apache Foundation and should not misrepresent the identity
> of the project.
>
> Here’s the issue:
>
> $ gfsh
> _ __
>/ _/ __/ __/ // /
>   / /  __/ /___  /_  / _  /
> / /__/ / /  _/ / // /
> /__/_/  /__/_//_/1.12.0
>
> Monitor and Manage Apache Geode<-- right product name
>
> $gfsh --help
> Pivotal GemFire(R) v1.12.0 Command Line Shell   <-- WRONG product name
>
> The “right” instance above was fixed 5 years ago.  GEODE-8395 fixes gfsh
> to use the “right” code in both places.  Please vote +1 to backport this
> fix to support/1.13 and support/1.12.
>
>
>


Odg: negative ActiveCQCount

2020-07-23 Thread Mario Kevo
Hi,

Thanks for the response!

Decrementing is happening on both servers; I added a check to decrement only on 
the server on which the stat was incremented.
You can find changes on https://github.com/apache/geode/pull/5397.
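The check described above (the actual change is in the PR) can be modeled standalone: each instance remembers whether it incremented the stat and decrements only in that case. Class and field names below are illustrative, not the Geode implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedCqCount {
    static final AtomicInteger activeCqCount = new AtomicInteger();

    /** Stand-in for ServerCqImpl that only decrements if it incremented. */
    static class ServerCq {
        final boolean countedAsActive; // true only for the instance created via registerCQ()
        String state = "RUNNING";

        ServerCq(boolean countedAsActive) {
            this.countedAsActive = countedAsActive;
            if (countedAsActive) {
                activeCqCount.incrementAndGet();
            }
        }

        void close() {
            // decrement only if this instance was the one that incremented
            if (state.equals("RUNNING") && countedAsActive) {
                activeCqCount.decrementAndGet();
            }
            state = "CLOSED";
        }
    }

    public static void main(String[] args) {
        ServerCq registered = new ServerCq(true);    // incremented the stat
        ServerCq deserialized = new ServerCq(false); // created by fromData(), did not
        registered.close();
        deserialized.close();
        System.out.println("activeCqCount = " + activeCqCount.get()); // stays at 0
    }
}
```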

BR,
Mario

From: Anilkumar Gingade 
Sent: July 17, 2020, 21:19
To: dev@geode.apache.org 
Subject: Re: negative ActiveCQCount

Mario,

Here is how the CQ register behaves:
When there is a single client and two servers.
When CQ is registered, with redundancy 0:
- On non-partitioned region, the CQ gets registered on one server, through 
registerCQ().
- On partitioned region, if the region is hosted on both server, the CQ gets 
registered on one server through registerCQ() and another through 
FilterProfile.process*()

In the code, I do see the stat for active CQs getting incremented correctly for 
both kinds of registration.
The decrementing also seems to be happening, but that needs to be verified. You can add logs 
at CqServiceVSDStats.inc/dec methods to see if they are happening.

If this is working as expected, then it could be related to how gfsh/mbean 
collecting the data and aggregating it.
Also, need to consider the cases where not all the nodes are cache servers.

-Anil.


On 7/17/20, 12:47 AM, "Mario Kevo"  wrote:

Hi devs,

Just a reminder, in case someone is familiar with this or has an idea 
how to resolve this issue.

Thanks and BR,
Mario

From: Mario Kevo 
Sent: July 7, 2020, 15:24
To: dev@geode.apache.org 
Subject: Odg: Odg: negative ActiveCQCount

Hi,

Thank you all for the response!

What I got for now is that when I register CQ on the one server it 
processMessage to the other server through FilterProfile and in the message 
opType is REGISTER_CQ.
The fromData() method in FilterProfile.java contains the following:
if (isCqOp(this.opType)) {
  this.serverCqName = in.readUTF();
  if (this.opType == operationType.REGISTER_CQ || this.opType == operationType.SET_CQ_STATE) {
    this.cq = CqServiceProvider.readCq(in);
  }
}
There it registers the CQ on the other server without incrementing 
cqActiveCount, which is OK since redundancy is 0. But now both servers hold 
different ServerCqImpl instances for the same CQ: one created with the 
parameterized constructor when the CQ is executed, and another created with the 
empty constructor while deserializing the message with opType=REGISTER_CQ. For 
me this is OK, as we need to follow all changes on both servers, since the CQ 
condition may be fulfilled on the other server. Correct me if I'm wrong.

But when the CQ is closed, the close executes on both servers; for me that is 
OK, since what was started should be closed. However, in the close method we 
decrement the stat only if stateBeforeClosing is RUNNING. So it would be good 
if we could somehow take into account the cq_state of the ServerCqImpl instance 
created by the parameterized constructor before closing the one created by 
deserialization.
Does anyone have an idea how to achieve this, or another idea to solve this issue?

BR,
Mario


From: Kirk Lund 
Sent: July 1, 2020, 19:52
To: dev@geode.apache.org 
Subject: Re: Odg: negative ActiveCQCount

Yeah, https://issues.apache.org/jira/browse/GEODE-8293 sounds like a
statistic decrement bug for activeCqCount. Somewhere, each Server is
decrementing it once too many times.

You could find the statistics class containing activeCqCount and try adding
some debugging log statements or even add some breakpoints for debugger if
it's easily reproduced.

On Wed, Jul 1, 2020 at 5:52 AM Mario Kevo  wrote:

> Hi Kirk, thanks for the response!
>
> I just realized that I wrongly describe the problem as I tried so many
> case. Sorry!
>
> We have system with two servers. If the redundancy is 0 then we have
> properly that on the first server is activeCqCount=1 and on the second is
> activeCqCount=0.
> After close CQ we got on first server activeCqCount=0 and on the second is
> activeCqCount=-1.
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0
>
>
> In case we set redundancy to 1 it increments properly as expected, on both
> servers by one. But when cq is closed we got on both servers
> activeCqCount=-1. And show metrics command has the following output
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0

Odg: Odg: negative ActiveCQCount

2020-07-17 Thread Mario Kevo
Hi devs,

Just a reminder, in case someone is familiar with this or has an idea how to 
resolve this issue.

Thanks and BR,
Mario

From: Mario Kevo 
Sent: July 7, 2020, 15:24
To: dev@geode.apache.org 
Subject: Odg: Odg: negative ActiveCQCount

Hi,

Thank you all for the response!

What I got for now is that when I register CQ on the one server it 
processMessage to the other server through FilterProfile and in the message 
opType is REGISTER_CQ.
The fromData() method in FilterProfile.java contains the following:
if (isCqOp(this.opType)) {
  this.serverCqName = in.readUTF();
  if (this.opType == operationType.REGISTER_CQ || this.opType == operationType.SET_CQ_STATE) {
    this.cq = CqServiceProvider.readCq(in);
  }
}
There it registers the CQ on the other server without incrementing cqActiveCount, 
which is OK since redundancy is 0. But now both servers hold different 
ServerCqImpl instances for the same CQ: one created with the parameterized 
constructor when the CQ is executed, and another created with the empty 
constructor while deserializing the message with opType=REGISTER_CQ. For me this 
is OK, as we need to follow all changes on both servers, since the CQ condition 
may be fulfilled on the other server. Correct me if I'm wrong.

But when the CQ is closed, the close executes on both servers; for me that is OK, 
since what was started should be closed. However, in the close method we 
decrement the stat only if stateBeforeClosing is RUNNING. So it would be good if 
we could somehow take into account the cq_state of the ServerCqImpl instance 
created by the parameterized constructor before closing the one created by 
deserialization.
Does anyone have an idea how to achieve this, or another idea to solve this issue?

BR,
Mario


From: Kirk Lund 
Sent: July 1, 2020, 19:52
To: dev@geode.apache.org 
Subject: Re: Odg: negative ActiveCQCount

Yeah, https://issues.apache.org/jira/browse/GEODE-8293 sounds like a
statistic decrement bug for activeCqCount. Somewhere, each Server is
decrementing it once too many times.

You could find the statistics class containing activeCqCount and try adding
some debugging log statements or even add some breakpoints for debugger if
it's easily reproduced.

On Wed, Jul 1, 2020 at 5:52 AM Mario Kevo  wrote:

> Hi Kirk, thanks for the response!
>
> I just realized that I wrongly describe the problem as I tried so many
> case. Sorry!
>
> We have system with two servers. If the redundancy is 0 then we have
> properly that on the first server is activeCqCount=1 and on the second is
> activeCqCount=0.
> After close CQ we got on first server activeCqCount=0 and on the second is
> activeCqCount=-1.
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0
>
>
> In case we set redundancy to 1 it increments properly as expected, on both
> servers by one. But when cq is closed we got on both servers
> activeCqCount=-1. And show metrics command has the following output
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0
>
> What I found is that when server register cq on one server it send message
> to other servers in the system with opType=REGISTER_CQ and in that case it
> creates new instance of ServerCqImpl on second server(with empty
> constructor of ServerCqImpl). When we close CQ there is two different
> instances on servers and it closed both of them, but as they are in RUNNING
> state before closing, it decrements activeCqCount on both of them.
>
> BR,
> Mario
>
> 
> Šalje: Kirk Lund 
> Poslano: 30. lipnja 2020. 19:54
> Prima: dev@geode.apache.org 
> Predmet: Re: negative ActiveCQCount
>
> I think *show metrics --categories=query* is showing you the query stats
> from DistributedSystemMXBean (see
> ShowMetricsCommand#writeSystemWideMetricValues). DistributedSystemMXBean
> aggregates values across all members in the cluster, so I would have
> expected activeCQCount to initially show a value of 2 after you create a
> ServerCQImpl in 2 servers. Then after closing the CQ, it should drop to a
> value of 0.
>
> When you create a CQ on a Server, it should be reflected asynchronously on
> the CacheServerMXBean in that Server. Each Server has its own
> CacheServerMXBean. Over on the Locator (JMX Manager), the
> DistributedSystemMXBean aggregates the count of active CQs in
> ServerClusterStatsMonitor by invoking
> DistributedSystemBridge#updateCacheServer when the CacheServerMXBean state
> is federated to the Locator (

Odg: Odg: negative ActiveCQCount

2020-07-07 Thread Mario Kevo
Hi,

Thank you all for the response!

What I got for now is that when I register CQ on the one server it 
processMessage to the other server through FilterProfile and in the message 
opType is REGISTER_CQ.
The fromData() method in FilterProfile.java contains the following:
if (isCqOp(this.opType)) {
  this.serverCqName = in.readUTF();
  if (this.opType == operationType.REGISTER_CQ || this.opType == operationType.SET_CQ_STATE) {
    this.cq = CqServiceProvider.readCq(in);
  }
}
There it registers the CQ on the other server without incrementing cqActiveCount, 
which is OK since redundancy is 0. But now both servers hold different 
ServerCqImpl instances for the same CQ: one created with the parameterized 
constructor when the CQ is executed, and another created with the empty 
constructor while deserializing the message with opType=REGISTER_CQ. For me this 
is OK, as we need to follow all changes on both servers, since the CQ condition 
may be fulfilled on the other server. Correct me if I'm wrong.

But when the CQ is closed, the close executes on both servers; for me that is OK, 
since what was started should be closed. However, in the close method we 
decrement the stat only if stateBeforeClosing is RUNNING. So it would be good if 
we could somehow take into account the cq_state of the ServerCqImpl instance 
created by the parameterized constructor before closing the one created by 
deserialization.
Does anyone have an idea how to achieve this, or another idea to solve this issue?

BR,
Mario


From: Kirk Lund 
Sent: July 1, 2020, 19:52
To: dev@geode.apache.org 
Subject: Re: Odg: negative ActiveCQCount

Yeah, https://issues.apache.org/jira/browse/GEODE-8293 sounds like a
statistic decrement bug for activeCqCount. Somewhere, each Server is
decrementing it once too many times.

You could find the statistics class containing activeCqCount and try adding
some debugging log statements or even add some breakpoints for debugger if
it's easily reproduced.

On Wed, Jul 1, 2020 at 5:52 AM Mario Kevo  wrote:

> Hi Kirk, thanks for the response!
>
> I just realized that I wrongly describe the problem as I tried so many
> case. Sorry!
>
> We have system with two servers. If the redundancy is 0 then we have
> properly that on the first server is activeCqCount=1 and on the second is
> activeCqCount=0.
> After close CQ we got on first server activeCqCount=0 and on the second is
> activeCqCount=-1.
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0
>
>
> In case we set redundancy to 1 it increments properly as expected, on both
> servers by one. But when cq is closed we got on both servers
> activeCqCount=-1. And show metrics command has the following output
> gfsh>show metrics --categories=query
> Cluster-wide Metrics
>
> Category |  Metric  | Value
>  |  | -
> query| activeCQCount| -1
>  | queryRequestRate | 0.0
>
> What I found is that when server register cq on one server it send message
> to other servers in the system with opType=REGISTER_CQ and in that case it
> creates new instance of ServerCqImpl on second server(with empty
> constructor of ServerCqImpl). When we close CQ there is two different
> instances on servers and it closed both of them, but as they are in RUNNING
> state before closing, it decrements activeCqCount on both of them.
>
> BR,
> Mario
>
> 
> Šalje: Kirk Lund 
> Poslano: 30. lipnja 2020. 19:54
> Prima: dev@geode.apache.org 
> Predmet: Re: negative ActiveCQCount
>
> I think *show metrics --categories=query* is showing you the query stats
> from DistributedSystemMXBean (see
> ShowMetricsCommand#writeSystemWideMetricValues). DistributedSystemMXBean
> aggregates values across all members in the cluster, so I would have
> expected activeCQCount to initially show a value of 2 after you create a
> ServerCQImpl in 2 servers. Then after closing the CQ, it should drop to a
> value of 0.
>
> When you create a CQ on a Server, it should be reflected asynchronously on
> the CacheServerMXBean in that Server. Each Server has its own
> CacheServerMXBean. Over on the Locator (JMX Manager), the
> DistributedSystemMXBean aggregates the count of active CQs in
> ServerClusterStatsMonitor by invoking
> DistributedSystemBridge#updateCacheServer when the CacheServerMXBean state
> is federated to the Locator (JMX Manager).
>
> Based on what I see in code and in the description on GEODE-8293, I think
> you might want to see if increment has a problem instead of decrement.
>
> I don't see anything that would limit the activeCQCount to only count the
> CQs on primaries. So,

Odg: Back-Port GEODE-8240 to 1.12, 1.13

2020-07-01 Thread Mario Kevo
+1

From: Kirk Lund 
Sent: July 1, 2020, 19:54
To: dev@geode.apache.org 
Subject: Re: Back-Port GEODE-8240 to 1.12, 1.13

+1

On Wed, Jul 1, 2020 at 9:59 AM Dick Cavender  wrote:

> +1
>
> -Original Message-
> From: Bruce Schuchardt 
> Sent: Wednesday, July 1, 2020 9:49 AM
> To: dev@geode.apache.org
> Subject: Re: Back-Port GEODE-8240 to 1.12, 1.13
>
> +1
>
> On 7/1/20, 9:43 AM, "Bill Burcham"  wrote:
>
> I'd like permission to back-port the fix for rolling upgrade bug
> GEODE-8240
> to support/1.12 and support/1.13
>
> -Bill
>
>


Odg: negative ActiveCQCount

2020-07-01 Thread Mario Kevo
Hi Kirk, thanks for the response!

I just realized that I wrongly described the problem, as I tried so many cases. 
Sorry!

We have a system with two servers. If the redundancy is 0, then, correctly, the 
first server has activeCqCount=1 and the second has activeCqCount=0.
After closing the CQ we get activeCqCount=0 on the first server and 
activeCqCount=-1 on the second.
gfsh>show metrics --categories=query
Cluster-wide Metrics

Category |  Metric  | Value
 |  | -
query| activeCQCount| -1
 | queryRequestRate | 0.0


If we set redundancy to 1, the stat increments properly as expected, by one on 
both servers. But when the CQ is closed we get activeCqCount=-1 on both servers, 
and the show metrics command has the following output:
gfsh>show metrics --categories=query
Cluster-wide Metrics

Category |  Metric  | Value
 |  | -
query| activeCQCount| -1
 | queryRequestRate | 0.0

What I found is that when a CQ is registered on one server, a message with 
opType=REGISTER_CQ is sent to the other servers in the system, and in that case 
a new ServerCqImpl instance is created on the second server (with the empty 
ServerCqImpl constructor). When we close the CQ there are two different 
instances across the servers and both of them get closed; since both are in 
RUNNING state before closing, activeCqCount is decremented on both of them.
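The double decrement is easy to reproduce in a minimal standalone model (not Geode code): only one of the two ServerCqImpl-like instances incremented the stat, but both decrement it on close.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NegativeCqCount {
    static final AtomicInteger activeCqCount = new AtomicInteger();

    /** Minimal stand-in for ServerCqImpl: RUNNING until closed. */
    static class ServerCq {
        String state = "RUNNING";

        void close() {
            // decrements whenever the state before closing is RUNNING,
            // regardless of which instance incremented the stat
            if (state.equals("RUNNING")) {
                activeCqCount.decrementAndGet();
            }
            state = "CLOSED";
        }
    }

    public static void main(String[] args) {
        ServerCq registered = new ServerCq();   // created via registerCQ(), increments the stat
        activeCqCount.incrementAndGet();
        ServerCq deserialized = new ServerCq(); // created via fromData(), did NOT increment

        registered.close();
        deserialized.close();
        System.out.println("activeCqCount = " + activeCqCount.get()); // drops to -1
    }
}
```

The stat was incremented once but decremented twice, which matches the -1 seen in `show metrics`.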

BR,
Mario


From: Kirk Lund 
Sent: June 30, 2020, 19:54
To: dev@geode.apache.org 
Subject: Re: negative ActiveCQCount

I think *show metrics --categories=query* is showing you the query stats
from DistributedSystemMXBean (see
ShowMetricsCommand#writeSystemWideMetricValues). DistributedSystemMXBean
aggregates values across all members in the cluster, so I would have
expected activeCQCount to initially show a value of 2 after you create a
ServerCQImpl in 2 servers. Then after closing the CQ, it should drop to a
value of 0.

When you create a CQ on a Server, it should be reflected asynchronously on
the CacheServerMXBean in that Server. Each Server has its own
CacheServerMXBean. Over on the Locator (JMX Manager), the
DistributedSystemMXBean aggregates the count of active CQs in
ServerClusterStatsMonitor by invoking
DistributedSystemBridge#updateCacheServer when the CacheServerMXBean state
is federated to the Locator (JMX Manager).

Based on what I see in code and in the description on GEODE-8293, I think
you might want to see if increment has a problem instead of decrement.

I don't see anything that would limit the activeCQCount to only count the
CQs on primaries. So, I would expect redundancy=1 to result in a value of
2. Does anyone else have different info about this?

On Tue, Jun 30, 2020 at 5:31 AM Mario Kevo  wrote:

> Hi geode-dev,
>
> I have a question about CQ(
> https://issues.apache.org/jira/browse/GEODE-8293).
> If we run CQ it register cq on one of the
> servers(setPoolSubscriptionRedundancy is 1) and increment activeCQCount.
> As I understand then it processInputBuffer to another server and there is
> deserialization of the message. In case if opType is REGISTER_CQ or
> SET_CQ_STATE it will call readCq from CqServiceProvider, at the end calls
> empty contructor ServerCQImpl which is used for deserialization.
>
> The problem is when we close CQ then it has ServerCqImpl reference on both
> servers, close them, and decrement on both of them. In that case we have
> negative value of activeCQCount in show metrics command.
>
> Does anyone knows how to get in close method which is the primary and only
> decrement on it?
> Any advice is welcome!
>
> BR,
> Mario
>


negative ActiveCQCount

2020-06-30 Thread Mario Kevo
Hi geode-dev,

I have a question about CQs (https://issues.apache.org/jira/browse/GEODE-8293).
When we run a CQ, it registers the CQ on one of the servers 
(setPoolSubscriptionRedundancy is 1) and increments activeCQCount.
As I understand it, the message is then processed (processInputBuffer) on 
another server, where it is deserialized. If opType is REGISTER_CQ or 
SET_CQ_STATE, it calls readCq from CqServiceProvider, which in the end calls the 
empty ServerCQImpl constructor used for deserialization.

The problem is that when we close the CQ, there is a ServerCqImpl reference on 
both servers; both are closed, and the stat is decremented on both of them. In 
that case we get a negative activeCQCount value in the show metrics command.

Does anyone know how to determine, in the close method, which instance is the 
primary, and decrement only on it?
Any advice is welcome!

BR,
Mario


LGTM check failed

2020-05-29 Thread Mario Kevo
Hi all,

The "LGTM analysis: Java" check failed for the last six opened PRs.
https://github.com/apache/geode/pull/5182
https://github.com/apache/geode/pull/5181
https://github.com/apache/geode/pull/5180
https://github.com/apache/geode/pull/5179
https://github.com/apache/geode/pull/5176
https://github.com/apache/geode/pull/5175

[2020-05-29 00:08:56] [analysis] [EVALUATION 115/177] [FAIL] Error running 
query semmlecode-queries/Security/CWE/CWE-022/TaintedPath.ql: OutOfMemory
Query evaluation ran out of memory (maximum allowed memory: 3012MB).

I took a look at the last few merged commits but I don't think they caused 
this failure.
Can someone who is more familiar with this please take a look to see whether it 
is a problem with the check itself?

BR,
Mario



Re: Certificate based authorization - CN authorization in jmx

2020-05-29 Thread Mario Kevo
Hi all,

A kind reminder about this question.
Thanks in advance!

BR,
Mario

From: Mario Kevo 
Sent: 22 May 2020 13:56
To: dev@geode.apache.org 
Subject: Certificate based authorization - CN authorization in jmx

Hi geode-dev,

We are working on implementing a new feature described in this 
RFC<https://cwiki.apache.org/confluence/display/GEODE/Certificate+Based+Authorization>.

The main idea is to combine the TLS and access control features, but to use the 
certificate subject common name for access control authentication/authorization 
instead of user credentials.
We need to obtain the client certificate on the server side to extract the common 
name from it. The problem is that the gfsh client connects to JMX using RMI TCP 
connections. We have tried many things to get the client certificate from the 
established RMI connection, but unfortunately without success.

Has anyone had a similar problem and managed to extract the certificate from an 
RMI connection after the TLS handshake has completed?
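For the second half of the problem — turning a subject DN into a common name once 
a peer certificate is in hand — the JDK's LdapName parser can do the work. This is 
only a sketch of that step; obtaining the SSLSession from the RMI connection is 
the part that remains unsolved above. In a real setup the X500Principal would come 
from the peer's X509Certificate via cert.getSubjectX500Principal().

```java
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import javax.security.auth.x500.X500Principal;

// Sketch: parse the CN attribute out of a certificate subject DN.
class CnExtractor {
    static String extractCn(X500Principal subject) throws Exception {
        LdapName dn = new LdapName(subject.getName());
        for (Rdn rdn : dn.getRdns()) {
            if ("CN".equalsIgnoreCase(rdn.getType())) {
                return rdn.getValue().toString();
            }
        }
        return null; // no CN attribute present in the DN
    }
}
```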

BR,
Mario



Certificate based authorization - CN authorization in jmx

2020-05-22 Thread Mario Kevo
Hi geode-dev,

We are working on implementing a new feature described in this 
RFC.

The main idea is to combine the TLS and access control features, but to use the 
certificate subject common name for access control authentication/authorization 
instead of user credentials.
We need to obtain the client certificate on the server side to extract the common 
name from it. The problem is that the gfsh client connects to JMX using RMI TCP 
connections. We have tried many things to get the client certificate from the 
established RMI connection, but unfortunately without success.

Has anyone had a similar problem and managed to extract the certificate from an 
RMI connection after the TLS handshake has completed?

BR,
Mario



Handling packet drop between sites

2020-04-28 Thread Mario Kevo
Hi geode-dev,

I have a question about how Geode handles packets dropped from a batch.
I created a Geode WAN setup with two sites and established replication between 
them, then modified iptables to drop all packets arriving at the receiver port.
In that case some threads get stuck; it seems the gateway sender never 
receives any response back.
[warn 2020/04/27 13:19:04.667 CEST  tid=0x11] Thread 128 (0x80) 
is stuck

[warn 2020/04/27 13:19:04.669 CEST  tid=0x11] Thread <128> 
(0x80) that was executed at <27 Apr 2020 13:18:13 CEST> has been stuck for 
<50.997 seconds> and number of thread monitor iteration <1>
Thread Name  state 
Executor Group 
Monitored metric 
Thread stack:
java.net.PlainSocketImpl.socketConnect(Native Method)
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
java.net.Socket.connect(Socket.java:607)
org.apache.geode.distributed.internal.tcpserver.AdvancedSocketCreatorImpl.connect(AdvancedSocketCreatorImpl.java:102)
org.apache.geode.internal.net.SCAdvancedSocketCreator.connect(SCAdvancedSocketCreator.java:51)
org.apache.geode.distributed.internal.tcpserver.TcpSocketCreatorImpl.connect(TcpSocketCreatorImpl.java:59)
org.apache.geode.distributed.internal.tcpserver.ClientSocketCreatorImpl.connect(ClientSocketCreatorImpl.java:54)
org.apache.geode.cache.client.internal.ConnectionImpl.connect(ConnectionImpl.java:94)
org.apache.geode.cache.client.internal.ConnectionConnector.connectClientToServer(ConnectionConnector.java:75)
org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:118)
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:206)
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.forceCreateConnection(ConnectionManagerImpl.java:216)
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:326)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:329)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303)
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839)
org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:36)
org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:276)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

Also, I tried to run the same test with 200K entries while dropping 70% of the 
packets; the exception appears again, and it takes approx. 40 minutes to transmit 
all entries to the other site.

How does Geode handle the dropping of some packets from a batch? Has anyone run 
tests on this behavior?

Thanks,
Mario



Re: Reconfiguring our notifications and more

2020-04-21 Thread Mario Kevo
+1

From: Dan Smith 
Sent: 21 April 2020 18:01
To: dev@geode.apache.org 
Subject: Re: Reconfiguring our notifications and more

+1

-Dan

On Tue, Apr 21, 2020 at 9:00 AM Owen Nichols  wrote:

> +1
>
> > On Apr 21, 2020, at 8:54 AM, Anthony Baker  wrote:
> >
> > I’d like a quick round of feedback so we can stop the dev@ spamming [1].
> >
> > ASF has implemented a cool way to give projects control of lots of
> things [2].  In short, you provide a .asf.yml in each and every GitHub repo
> to control lots of interesting things.  For us the most immediately
> relevant are:
> >
> > notifications:
> >  commits: comm...@geode.apache.org
> >  issues:  iss...@geode.apache.org
> >  pullrequests:  notificati...@geode.apache.org
> >  jira_options: link label comment
> >
> > I’d like to commit this to /develop and cherry-pick over to /master
> ASAP.  Later on we can explore the website and GitHub sections.  Let me
> know what you think.
> >
> > Anthony
> >
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-20156
> > [2]
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Notificationsettingsforrepositories
>
>


Review for PR #4815

2020-04-08 Thread Mario Kevo
Hi all,

could someone please review PR #4815?
Jira ticket: https://issues.apache.org/jira/browse/GEODE-7838

Thanks and BR,
Mario


JGroups vulnerabilty

2020-04-07 Thread Mario Kevo
Hi,


I was trying to understand whether Geode is impacted by a security 
vulnerability reported against JGroups 
(CVE-2016-2141). The 
vulnerability is related to member authentication and communication encryption. 
What I could learn from this 
RFC
is that Geode doesn't use the JGroups membership system, only its UDP 
messaging, on top of which a custom encryption system is implemented.

From this I would say that the reported vulnerability doesn't really apply to 
Geode. Nevertheless, I wanted to double-check this.


BR,

Mario



Re: Re: Certificate Based Authorization

2020-04-03 Thread Mario Kevo
Hi,

First of all, sorry Jens, I somehow missed your last mail. 

Regarding adding a new parameter to activate this feature and avoid confusion: it 
is a good option, we can go with that.
It will also be easier to implement if we require ssl-enabled-components=all.

As the end date for this RFC passed a long time ago, I will move it to the next 
phase. If anyone has comments or advice, please feel free to add them here or on 
the RFC.

Thank you all,
Mario


From: Jens Deppe 
Sent: 6 December 2019 18:06
To: dev@geode.apache.org 
Subject: Re: Re: Certificate Based Authorization

Thanks for the write-up. I think it does require a bit of clarification
around how the functionality is enabled.

You've stated:

For client connections, we could presume that certificate based
> authorization should be used if both features are enabled, but the client
> cache properties don’t provide credentials
> (security-username/security-password).


Currently, the presence of any '*auth-init' parameters, does not
necessarily require setting *security-username/password* (although almost
all implementations of AuthInitialize probably do use them). So this
condition will not be sufficient to enable this new functionality.

Although we already have so many parameters I think that having an explicit
parameter, to enable this feature, will avoid any possible confusion.

I'm wondering whether, for an initial deliverable, we should require
*ssl-enabled-components=all*. This would not allow a mix of different forms
of authentication for different endpoints. Perhaps this might simplify the
implementation but would not preclude us from adding that capability in the
future.

--Jens

On Fri, Dec 6, 2019 at 1:13 AM Mario Kevo  wrote:

> Hi all,
>
> I wrote up a proposal for Certificate Based Authorization.
> Please review and comment on the below proposal.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/Certificate+Based+Authorization
>
> BR,
> Mario
> 
> From: Udo Kohlmeyer 
> Sent: 2 December 2019 20:10
> To: dev@geode.apache.org 
> Subject: Re: Certificate Based Authorization
>
> +1
>
> On 12/2/19 1:29 AM, Mario Kevo wrote:
> > Hi,
> >
> >
> >
> > There is another potential functionality we would like to discuss and
> get some comments for. The idea is TLS certificate based authorization.
> Currently, if a user wants secure communication (TLS) + authorization, he
> needs to enable TLS and access control. The user also needs to handle both
> the certificates for TLS and the credentials for access control. The idea
> we have is to use both features: TLS and access control, but remove the
> need to handle the credentials (generating and securely storing the
> username and password). Instead of the credentials, the certificate subject
> DN would be used for authorization.
> >
> >
> >
> > This would of course be optional. We would leave the possibility to use
> these 2 features as they are right now, but would also provide a
> configuration option to use the features without the need for client
> credentials, utilizing the certificate information instead.
> >
> >
> >
> > For further clarity, here are the descriptions of how the options would
> work:
> >
> >
> >
> >1.  Using TLS and access control as they work right now
> >   *   Certificates are prepared for TLS
> >   *   A SecurityManager is prepared for access control
> authentication/authorization. As part of this, a file (e.g. security.json)
> is prepared where we define the allowed usernames, passwords and
> authorization rights for each username
> >   *   The credentials are distributed towards clients. Here a user
> needs to consider secure distribution and periodical rotation of
> credentials.
> >
> > Once a client initiates a connection, we first get the TLS layer and
> certificate check, and right after that we perform the
> authentication/authorization of the user credentials.
> >
> >
> >
> >1.  TLS certificate based authorization
> >   *   Certificates are prepared for TLS
> >   *   A SecurityManager is prepared for access control
> authentication/authorization. As part of this, a file (e.g. security.json)
> is prepared. In this case we don’t define the authorization rights based on
> usernames/passwords but based on certificate subject DNs.
> >   *   There is no more need to distribute or periodically rotate the
> credentials, since there would be none. Authorization would be based  on
> the subject DN fetched from the certificate used for that same connection
> >
> > Once a client initiates a connection, and when we get past the TLS
> lay

Re: Next release

2020-03-17 Thread Mario Kevo
Thanks for the info!

BR,
Mario

From: Ernest Burghardt 
Sent: 17 March 2020 0:06
To: dev@geode.apache.org 
Subject: Re: Next release

Hi Mario,

There is still some work to be done to ensure performance is on par with
previous releases... here are a few tickets related to the efforts

https://issues.apache.org/jira/browse/GEODE-7763
https://issues.apache.org/jira/browse/GEODE-7832
https://issues.apache.org/jira/browse/GEODE-7853
https://issues.apache.org/jira/browse/GEODE-7863
https://issues.apache.org/jira/browse/GEODE-6154

Best regards,
EB

On Mon, Mar 16, 2020 at 2:03 AM Mario Kevo  wrote:

> Hi geode-dev,
>
> When will we have the Apache Geode 1.12.0 release?
> I saw that all tests passed and that the last commit was four days ago (typo
> corrections).
> Are we waiting on a fix for some critical issue, or something else?
>
> Thanks and BR,
> Mario
>


Next release

2020-03-16 Thread Mario Kevo
Hi geode-dev,

When will we have the Apache Geode 1.12.0 release?
I saw that all tests passed and that the last commit was four days ago (typo corrections).
Are we waiting on a fix for some critical issue, or something else?

Thanks and BR,
Mario


Re: StressNewTest timeout exceeded

2020-02-05 Thread Mario Kevo
Thanks Owen Nichols<mailto:onich...@pivotal.io>!

From: Robert Houghton 
Sent: 5 February 2020 16:30
To: dev@geode.apache.org 
Subject: Re: StressNewTest timeout exceeded

How did you change its timeout?

On Wed, Feb 5, 2020, 07:27 Owen Nichols  wrote:

> Hi Mario, I’ve re-triggered your PR check with a longer timeout, since I
> think you are only about an hour over the current timeout of 6 hours for
> StressNewTest.
>
> > On Feb 5, 2020, at 5:10 AM, Mario Kevo  wrote:
> >
> > Hi all,
> >
> > I'm working on https://issues.apache.org/jira/browse/GEODE-7309
> > PR: https://github.com/apache/geode/pull/4395
> > All tests passed except stressNewTest which failed with timeout exceeded.
> >
> > Can we somehow override it so this test passes?
> >
> > BR,
> > Mario
>
>


StressNewTest timeout exceeded

2020-02-05 Thread Mario Kevo
Hi all,

I'm working on https://issues.apache.org/jira/browse/GEODE-7309
PR: https://github.com/apache/geode/pull/4395
All tests passed except stressNewTest, which failed with timeout exceeded.

Can we somehow override the timeout so this test passes?

BR,
Mario


Re: Re: disable statistic archival

2020-01-23 Thread Mario Kevo
Hi Kirk,

In DistributionConfig we have setStatisticArchiveFile(), which is implemented in 
DistributionConfigImpl and RuntimeDistributionConfigImpl. There you can see that 
when properties are read from the properties file, if statistic-archive-file is 
null it falls back to an empty file name:
public void setStatisticArchiveFile(File value) {
  if (value == null) {
    value = new File("");
  }
  statisticArchiveFile = value;
}

But when we change it from gfsh with the alter runtime command, the command goes 
through all parameters (those changeable by this command) and checks whether each 
is different from null; only if a parameter is non-null and not empty does it 
update the properties file. If we set --statistic-archive-file="", the command 
sees that it is empty and we get "Please provide a relevant parameter(s)", as if 
no parameters had been supplied.
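The mismatch can be condensed into a small sketch (the helper names below are 
hypothetical; the real logic lives in DistributionConfigImpl and in the gfsh 
alter runtime command):

```java
import java.io.File;

// Sketch of the asymmetry described above (helper names are hypothetical):
// the config setter happily maps null to an empty file name, while the
// alter-runtime parameter check treats an empty string as "nothing provided".
class ArchiveFileConfig {
    // Mirrors setStatisticArchiveFile(): null becomes new File("").
    static File normalize(File value) {
        return value == null ? new File("") : value;
    }

    // Mirrors the alter-runtime relevance check: an empty string is ignored,
    // so --statistic-archive-file="" never reaches the config layer.
    static boolean isRelevantParameter(String value) {
        return value != null && !value.isEmpty();
    }
}
```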

BR,
Mario

From: Kirk Lund 
Sent: 23 January 2020 18:55
To: geode 
Subject: Re: Re: disable statistic archival

I hadn't thought of this usage before, but it makes sense. If you want
stats showing up on MBeans and/or in Micrometer, then setting
--statistic-archive-file="" is probably the correct way to do it. It's
possible that ServerLauncher or something deeper is
setting --statistic-archive-file to a non-empty value if it's empty. I
think you probably need to find that code and then review it to see how it
needs to change. If you find it, point me at it and I'll help review it.

On Wed, Jan 22, 2020 at 11:36 PM Mario Kevo  wrote:

> @Kirk<mailto:kl...@apache.org>, if we change --enable-statistic (this
> flag refers to STATISTIC_SAMPLING_ENABLED) to false, we lose some metrics in
> gfsh and pulse.
> What if we want to keep statistics on, but not archive them to a file?
> I currently don't see how to disable archiving statistics to a file while
> keeping statistics in gfsh and pulse.
>
> Yes, the user guide could be clearer.
>
>
> 
> From: Kirk Lund 
> Sent: 22 January 2020 20:44
> To: geode 
> Subject: Re: disable statistic archival
>
> Dave, In your list, I think "enable-statistics" should be
> "enable-time-statistics". But, yes you're right!
>
> On Wed, Jan 22, 2020 at 9:46 AM Dave Barnes  wrote:
>
> > I'm getting the impression that the user guide could be clearer with
> regard
> > to the interactions between
> >
> >- enable-statistics
> >- statistic-sampling-enabled
> >- statistic-archive-file
> >
> >
> >
> > On Wed, Jan 22, 2020 at 9:30 AM Kirk Lund  wrote:
> >
> > > Try setting STATISTIC_SAMPLING_ENABLED to false to disable statistic
> > > sampling.
> > >
> > > I think we should delete "An empty string (default) disables statistic
> > > archival." from the javadocs for STATISTIC_ARCHIVE_FILE to avoid
> > confusion
> > > and redundancy with STATISTIC_SAMPLING_ENABLED.
> > >
> > > See below for the javadocs on both properties.
> > >
> > >   /**
> > >* The static String definition of the
> "statistic-archive-file"
> > > property  > >* name="statistic-archive-file"/a>
> > >* 
> > >* Description: The file that statistic samples are written
> to.
> > An
> > > empty string (default)
> > >* disables statistic archival.
> > >* 
> > >* Default: ""
> > >*/
> > >   String STATISTIC_ARCHIVE_FILE = "statistic-archive-file";
> > >
> > >   /**
> > >* The static String definition of the
> > > "statistic-sampling-enabled" property  > >* name="statistic-sampling-enabled"/a>
> > >* 
> > >* Description: "true" causes the statistics to be sampled
> > > periodically and operating
> > >* system statistics to be fetched each time a sample is taken.
> "false"
> > > disables sampling which
> > >* also disables operating system statistic collection. Non OS
> > statistics
> > > will still be recorded
> > >* in memory and can be viewed by administration tools. However,
> charts
> > > will show no activity and
> > >* no statistics will be archived while sampling is disabled.
> Starting
> > in
> > > 7.0 the default value
> > >* has been changed to true. If statistic sampling is disabled it
> will
> > > also cause various metrics
> > >* seen in gfsh and pulse to always be zero.
> > >* 
> > >* Default: "true"
> > >* 
> > 

Re: Re: disable statistic archival

2020-01-23 Thread Mario Kevo
Yes, it works with redirecting to /dev/null.
Should we document this as the way to disable the file, or just treat it as a 
workaround and continue with adding the new flag?


From: Owen Nichols 
Sent: 23 January 2020 8:49
To: dev@geode.apache.org 
Subject: Re: Re: disable statistic archival

Would it work to archive them to file /dev/null

On Wed, Jan 22, 2020 at 11:36 PM Mario Kevo  wrote:

> @Kirk<mailto:kl...@apache.org>, if we change --enable-statistic (this
> flag refers to STATISTIC_SAMPLING_ENABLED) to false, we lose some metrics in
> gfsh and pulse.
> What if we want to keep statistics on, but not archive them to a file?
> I currently don't see how to disable archiving statistics to a file while
> keeping statistics in gfsh and pulse.
>
> Yes, the user guide could be clearer.
>
>
> 
> From: Kirk Lund 
> Sent: 22 January 2020 20:44
> To: geode 
> Subject: Re: disable statistic archival
>
> Dave, In your list, I think "enable-statistics" should be
> "enable-time-statistics". But, yes you're right!
>
> On Wed, Jan 22, 2020 at 9:46 AM Dave Barnes  wrote:
>
> > I'm getting the impression that the user guide could be clearer with
> regard
> > to the interactions between
> >
> >- enable-statistics
> >- statistic-sampling-enabled
> >- statistic-archive-file
> >
> >
> >
> > On Wed, Jan 22, 2020 at 9:30 AM Kirk Lund  wrote:
> >
> > > Try setting STATISTIC_SAMPLING_ENABLED to false to disable statistic
> > > sampling.
> > >
> > > I think we should delete "An empty string (default) disables statistic
> > > archival." from the javadocs for STATISTIC_ARCHIVE_FILE to avoid
> > confusion
> > > and redundancy with STATISTIC_SAMPLING_ENABLED.
> > >
> > > See below for the javadocs on both properties.
> > >
> > >   /**
> > >* The static String definition of the
> "statistic-archive-file"
> > > property  > >* name="statistic-archive-file"/a>
> > >* 
> > >* Description: The file that statistic samples are written
> to.
> > An
> > > empty string (default)
> > >* disables statistic archival.
> > >* 
> > >* Default: ""
> > >*/
> > >   String STATISTIC_ARCHIVE_FILE = "statistic-archive-file";
> > >
> > >   /**
> > >* The static String definition of the
> > > "statistic-sampling-enabled" property  > >* name="statistic-sampling-enabled"/a>
> > >* 
> > >* Description: "true" causes the statistics to be sampled
> > > periodically and operating
> > >* system statistics to be fetched each time a sample is taken.
> "false"
> > > disables sampling which
> > >* also disables operating system statistic collection. Non OS
> > statistics
> > > will still be recorded
> > >* in memory and can be viewed by administration tools. However,
> charts
> > > will show no activity and
> > >* no statistics will be archived while sampling is disabled.
> Starting
> > in
> > > 7.0 the default value
> > >* has been changed to true. If statistic sampling is disabled it
> will
> > > also cause various metrics
> > >* seen in gfsh and pulse to always be zero.
> > >* 
> > >* Default: "true"
> > >* 
> > >* Allowed values: true|false
> > >*/
> > >   String STATISTIC_SAMPLING_ENABLED = "statistic-sampling-enabled";
> > >
> > > On Tue, Jan 21, 2020 at 1:06 AM Mario Kevo 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > We are trying to disable archiving statistic in the file by providing
> > > > empty string to --statistic-archive-file. This option doesn't work.
> > > > From the documentation it should work:
> > > > The file to which the running system member writes statistic samples.
> > For
> > > > example: “StatisticsArchiveFile.gfs”. An empty string disables
> > archiving.
> > > > I opened ticket(GEODE-7714<
> > > > https://issues.apache.org/jira/browse/GEODE-7714>) and try to fix
> it,
> > > but
> > > > without success.
> > > >
> > > > As alter runtime command update properties and cache, it checks if
> any
> > of
> > > > these parameters change, but if we set this property to an empty
> string
> > > it
> > > > failed with message
> > > > Please provide a relevant parameter(s).
> > > > We can change for this parameter that it can be an empty string but
> how
> > > > this command works, it goes over all parameters and checks if it is
> > > > changed. In that case if we provide something like
> > > > alter runtime --member=server it will be successiful but shouldn't as
> > we
> > > > didn't provide any parameter.
> > > >
> > > > So the proposal is that we need to add a new parameter called
> > > > --statistic-archiving-enabled  which can be true or false. In case it
> > is
> > > > true we need to provide also --statistic-archive-file.
> > > >
> > > > Any thougths?
> > > >
> > > > BR,
> > > > Mario
> > > >
> > > >
> > >
> >
>


Re: disable statistic archival

2020-01-22 Thread Mario Kevo
@Kirk<mailto:kl...@apache.org>, if we change --enable-statistic (this flag 
refers to STATISTIC_SAMPLING_ENABLED) to false, we lose some metrics in gfsh and 
pulse.
What if we want to keep statistics on, but not archive them to a file?
I currently don't see how to disable archiving statistics to a file while 
keeping statistics in gfsh and pulse.

Yes, the user guide could be clearer.



From: Kirk Lund 
Sent: 22 January 2020 20:44
To: geode 
Subject: Re: disable statistic archival

Dave, In your list, I think "enable-statistics" should be
"enable-time-statistics". But, yes you're right!

On Wed, Jan 22, 2020 at 9:46 AM Dave Barnes  wrote:

> I'm getting the impression that the user guide could be clearer with regard
> to the interactions between
>
>- enable-statistics
>- statistic-sampling-enabled
>- statistic-archive-file
>
>
>
> On Wed, Jan 22, 2020 at 9:30 AM Kirk Lund  wrote:
>
> > Try setting STATISTIC_SAMPLING_ENABLED to false to disable statistic
> > sampling.
> >
> > I think we should delete "An empty string (default) disables statistic
> > archival." from the javadocs for STATISTIC_ARCHIVE_FILE to avoid
> confusion
> > and redundancy with STATISTIC_SAMPLING_ENABLED.
> >
> > See below for the javadocs on both properties.
> >
> >   /**
> >* The static String definition of the "statistic-archive-file"
> > property  >* name="statistic-archive-file"/a>
> >* 
> >* Description: The file that statistic samples are written to.
> An
> > empty string (default)
> >* disables statistic archival.
> >* 
> >* Default: ""
> >*/
> >   String STATISTIC_ARCHIVE_FILE = "statistic-archive-file";
> >
> >   /**
> >* The static String definition of the
> > "statistic-sampling-enabled" property  >* name="statistic-sampling-enabled"/a>
> >* 
> >* Description: "true" causes the statistics to be sampled
> > periodically and operating
> >* system statistics to be fetched each time a sample is taken. "false"
> > disables sampling which
> >* also disables operating system statistic collection. Non OS
> statistics
> > will still be recorded
> >* in memory and can be viewed by administration tools. However, charts
> > will show no activity and
> >* no statistics will be archived while sampling is disabled. Starting
> in
> > 7.0 the default value
> >* has been changed to true. If statistic sampling is disabled it will
> > also cause various metrics
> >* seen in gfsh and pulse to always be zero.
> >* 
> >* Default: "true"
> >* 
> >* Allowed values: true|false
> >*/
> >   String STATISTIC_SAMPLING_ENABLED = "statistic-sampling-enabled";
> >
> > On Tue, Jan 21, 2020 at 1:06 AM Mario Kevo  wrote:
> >
> > > Hi,
> > >
> > > We are trying to disable archiving statistic in the file by providing
> > > empty string to --statistic-archive-file. This option doesn't work.
> > > From the documentation it should work:
> > > The file to which the running system member writes statistic samples.
> For
> > > example: “StatisticsArchiveFile.gfs”. An empty string disables
> archiving.
> > > I opened ticket(GEODE-7714<
> > > https://issues.apache.org/jira/browse/GEODE-7714>) and try to fix it,
> > but
> > > without success.
> > >
> > > As alter runtime command update properties and cache, it checks if any
> of
> > > these parameters change, but if we set this property to an empty string
> > it
> > > failed with message
> > > Please provide a relevant parameter(s).
> > > We can change for this parameter that it can be an empty string but how
> > > this command works, it goes over all parameters and checks if it is
> > > changed. In that case if we provide something like
> > > alter runtime --member=server it will be successiful but shouldn't as
> we
> > > didn't provide any parameter.
> > >
> > > So the proposal is that we need to add a new parameter called
> > > --statistic-archiving-enabled  which can be true or false. In case it
> is
> > > true we need to provide also --statistic-archive-file.
> > >
> > > Any thougths?
> > >
> > > BR,
> > > Mario
> > >
> > >
> >
>


Re: privacy protection

2020-01-21 Thread Mario Kevo
Hi,

Just a kind reminder about this.

BR,
Mario

From: Mario Kevo 
Sent: 14 January 2020 16:20
To: dev@geode.apache.org 
Subject: privacy protection

Hi geode-dev,

Is it possible to protect all files containing user data (or the user data 
itself) that Geode stores on disk?
This includes all persistence data (oplogs), backups, and possibly other files 
containing user data.
Protection is also needed for all files potentially used for the replication and 
cluster high-availability mechanisms.

If this feature is not available, is it already planned?
Is it included in the Geode roadmap?

BR,
Mario



disable statistic archival

2020-01-21 Thread Mario Kevo
Hi,

We are trying to disable archiving statistics to a file by providing an empty 
string to --statistic-archive-file. This option doesn't work.
From the documentation it should:
The file to which the running system member writes statistic samples. For 
example: “StatisticsArchiveFile.gfs”. An empty string disables archiving.
I opened a ticket (GEODE-7714) 
and tried to fix it, but without success.

As the alter runtime command updates properties and the cache, it checks whether 
any of these parameters has changed; but if we set this property to an empty 
string, it fails with the message
Please provide a relevant parameter(s).
We could allow this parameter to be an empty string, but given how the command 
works (it goes over all parameters and checks whether each has changed), a 
command like
alter runtime --member=server would then succeed, although it shouldn't, since we 
didn't provide any parameter.

So the proposal is to add a new parameter, --statistic-archiving-enabled, which 
can be true or false. If it is true, --statistic-archive-file must also be 
provided.

Any thoughts?

BR,
Mario



Re: [DISCUSS] stop releasing both .tar.gz AND .zip for geode-examples

2020-01-18 Thread Mario Kevo
+1 to remove geode-examples.zip

BR,
Mario

From: Dan Smith 
Sent: 17 January 2020 22:30
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] stop releasing both .tar.gz AND .zip for geode-examples

+1 to removing the .zip for geode-examples.

-Dan

On Fri, Jan 17, 2020 at 12:11 PM Owen Nichols  wrote:

> Geode dropped .zip starting with 1.8.0, so it is likely an oversight that
> Geode-examples still makes .zip artifacts too.
>
> It was requested[1] recently that we fix this.
>
> The proposed fix is available to review[2] along with accompanying changes
> to the release validation pipeline[3].
>
> Since this impacts releases, I am starting this discussion thread before
> going any further.
>
>
> [1]
> https://lists.apache.org/thread.html/c7bd84b6e6f5464ed674ed447fe8922097237932967b4ed1966e79d3%40%3Cdev.geode.apache.org%3E
> [2] https://github.com/apache/geode-examples/pull/91
> [3] https://github.com/apache/geode/pull/4606


Re: GW sender dispatcher threads & order policy

2020-01-15 Thread Mario Kevo
Yes, sorry, I re-read the mail and see that it is not the same issue.
Here is how it works now:


order-policy cannot be an arbitrary string; it only accepts the values of the 
OrderPolicy enum: KEY, THREAD and PARTITION.

  1.  If dispatcher-threads is more than 1, you need to set order-policy to one 
of the OrderPolicy enum values. If you set any other string you will get:
java.lang.IllegalArgumentException: Failed to convert 'XXX' to type OrderPolicy 
for option 'order-policy'
No enum constant org.apache.geode.cache.wan.GatewaySender.OrderPolicy.XXX
  2.  If dispatcher-threads is equal to 1, order-policy is not needed, since 
there is only one thread. (You can still set order-policy to one of these 
values, but it has no effect, as it will not be used.)
  3.  If it is lower than 1, you will get an exception with a message that 
dispatcher threads cannot be less than 1.

I guess you are referring to the documentation, which says the default value of 
order-policy is KEY; in that case either the documentation needs to be 
corrected, or the code changed to always use the default when order-policy is 
not specified while creating a gateway sender.
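The three cases above can be condensed into a small validation sketch (a 
hypothetical helper, not the actual gfsh implementation; the nested OrderPolicy 
enum only mirrors org.apache.geode.cache.wan.GatewaySender.OrderPolicy):

```java
// Hypothetical sketch of the three validation cases described above.
class SenderValidation {
    enum OrderPolicy { KEY, THREAD, PARTITION }

    static OrderPolicy validate(int dispatcherThreads, String orderPolicy) {
        if (dispatcherThreads < 1) {
            // case 3: rejected outright
            throw new IllegalArgumentException("dispatcher-threads cannot be less than 1");
        }
        if (dispatcherThreads == 1) {
            // case 2: order-policy is ignored with a single dispatcher thread
            return null;
        }
        if (orderPolicy == null) {
            throw new IllegalArgumentException(
                "Must specify --order-policy when --dispatcher-threads is larger than 1.");
        }
        // case 1: anything outside the enum throws IllegalArgumentException
        return OrderPolicy.valueOf(orderPolicy);
    }
}
```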

BR,
Mario

From: Alberto Bustamante Reyes 
Sent: 15 January 2020 18:52
To: Mario Kevo ; dev@geode.apache.org 
Subject: RE: GW sender dispatcher threads & order policy

Hi Mario,

My code contains that fix; it's not the same issue. GEODE-7561 solves the issue 
with the value "1" for dispatcher threads, but an explicit value for 
order-policy is still required if you specify a value for dispatcher-threads.

BR/

Alberto B.
________
From: Mario Kevo
Sent: Wednesday, 15 January 2020 18:22
To: Alberto Bustamante Reyes;
dev@geode.apache.org
Subject: Re: GW sender dispatcher threads & order policy

Hi Alberto,

This is already solved in Geode 1.12.0.

https://issues.apache.org/jira/browse/GEODE-7561

BR,
Mario

From: Alberto Bustamante Reyes
Sent: 15 January 2020 18:14
To: dev@geode.apache.org
Subject: GW sender dispatcher threads & order policy

Hi,

I have seen that if I change the default number of dispatcher threads ( 5 ) 
when creating a gateway sender, I get an error saying I must specify an order 
policy:

"Must specify --order-policy when --dispatcher-threads is larger than 1."

I find this odd, taking into account that the default value is already larger 
than 1 and order-policy has a default value. Actually, the error is shown if 
you specify "--dispatcher-threads=5". I was going to create a ticket to report 
this but I have a question: what is the use case for having a sender with less 
than 1 dispatcher thread?

Thanks!

BR/

Alberto B.


Re: GW sender dispatcher threads & order policy

2020-01-15 Thread Mario Kevo
Hi Alberto,

This is already solved in Geode 1.12.0.

https://issues.apache.org/jira/browse/GEODE-7561

BR,
Mario

From: Alberto Bustamante Reyes
Sent: 15 January 2020 18:14
To: dev@geode.apache.org
Subject: GW sender dispatcher threads & order policy

Hi,

I have seen that if I change the default number of dispatcher threads ( 5 ) 
when creating a gateway sender, I get an error saying I must specify an order 
policy:

"Must specify --order-policy when --dispatcher-threads is larger than 1."

I find this odd, taking into account that the default value is already larger 
than 1 and order-policy has a default value. Actually, the error is shown if 
you specify "--dispatcher-threads=5". I was going to create a ticket to report 
this but I have a question: what is the use case for having a sender with less 
than 1 dispatcher thread?

Thanks!

BR/

Alberto B.


privacy protection

2020-01-14 Thread Mario Kevo
Hi geode-dev,

Is it possible to somehow protect all files containing user data (or the user
data itself) that Geode stores on disk?
This includes all persistence data (OpLogs), backups and possibly other files
containing user data.
Protection is also needed for all of the files potentially used for the
replication and cluster high availability mechanisms.

If this feature is not available, do you already have it planned?
Is it included in the Geode roadmap?

BR,
Mario



Re: enable-time-statistics

2020-01-12 Thread Mario Kevo
Thank you all for confirming this!

Has anyone tried to enable it on the geode-native side?
We tried, but without success. Do we need to change anything else besides
setting enable-time-statistics to true, as this is not working?

BR,
Mario

From: Charlie Black
Sent: 10 January 2020 23:00
To: dev@geode.apache.org
Subject: Re: enable-time-statistics

David,

I would say remove the caveat. Thanks for offering.

Charlie

On Fri, Jan 10, 2020 at 1:55 PM Dave Barnes  wrote:

> Sounds like the caveat could be dropped from the user guide. If we have
> consensus on that (am I understanding correctly?), I'll initiate a JIRA
> ticket.
>
> On Fri, Jan 10, 2020 at 1:47 PM Jacob Barrett  wrote:
>
> > The biggest impact was in recording all the additional stats in the old
> > blocking stats implementation. As of 9.8 the stats internals are mostly
> > non-blocking. Enabling time stats has very little of any impact now.
> >
> > > On Jan 10, 2020, at 12:45 PM, Dan Smith  wrote:
> > >
> > > I personally wouldn't be too worried about enabling time based
> > statistics
> > > in production. I think we segregated the time statistics because they
> do
> > > have to call System.nanoTime to measure the elapsed time. At one point
> in
> > > the history with old JDKs they called System.currentTimeMillis, which
> was
> > > really expensive. But now I'm not sure the nanoTime calls really have
> > that
> > > much of an impact compared to the rest of the processing time.
> > >
> > > -Dan
> > >
> > >
> > >> On Fri, Jan 10, 2020 at 11:25 AM Mario Kevo 
> > wrote:
> > >>
> > >> Hi geode-dev,
> > >>
> > >> We have executed some traffic against Geode servers with time-based
> > >> statistics enabled and disabled and we didn't see any performance
> > >> difference.
> > >> The documentation says:
> > >>
> > >>
> > >> If you need time-based statistics, enable that. Time-based statistics
> > >> require statistics sampling and archival. Example:
> > >>
> > >> statistic-sampling-enabled=true
> > >> statistic-archive-file=myStatisticsArchiveFile.gfs
> > >> enable-time-statistics=true
> > >>
> > >>
> > >> Note: Time-based statistics can impact system performance and is not
> > >> recommended for production environments.
> > >>
> > >>
> > >> Do you know which part this note is referring to?
> > >>
> > >>
> > >> We also tried to enable time statistics on geode-native but without
> > >> success.
> > >>
> > >> We changed this parameter to true in the geode.properties file but
> > >> didn't get any additional statistics in the statistics archive file.
> > >>
> > >> Do we also need to change something else to enable it, or is this not
> > >> working for geode-native?
> > >>
> > >>
> > >> BR,
> > >>
> > >> Mario
> > >>
> > >>
> >
>


--
Charlie Black | cbl...@pivotal.io
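Dan's point about the cost of the elapsed-time calls can be eyeballed with a trivial measurement loop. This is only an illustrative sketch, not a rigorous benchmark (JIT warm-up and clock-source effects are ignored):

```java
public class NanoTimeCost {
    public static void main(String[] args) {
        final int n = 10_000_000;
        long sink = 0; // consume each value so the loop is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += System.nanoTime(); // the call whose overhead time stats pay
        }
        long elapsed = System.nanoTime() - start;
        // On modern JVMs this is typically a few tens of nanoseconds per call
        System.out.println("avg ns/call: " + (elapsed / n) + " (sink used: " + (sink != 0) + ")");
    }
}
```

A per-call cost in the tens of nanoseconds is consistent with the view above that nanoTime is unlikely to dominate request processing time.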


enable-time-statistics

2020-01-10 Thread Mario Kevo
Hi geode-dev,

We have executed some traffic against Geode servers with time-based statistics 
enabled and disabled and we didn't see any performance difference.
The documentation says:


If you need time-based statistics, enable that. Time-based statistics require 
statistics sampling and archival. Example:

statistic-sampling-enabled=true
statistic-archive-file=myStatisticsArchiveFile.gfs
enable-time-statistics=true


Note: Time-based statistics can impact system performance and is not 
recommended for production environments.


Do you know which part this note is referring to?


We also tried to enable time statistics on geode-native, but without success.

We changed this parameter to true in the geode.properties file but didn't get
any additional statistics in the statistics archive file.

Do we also need to change something else to enable it, or is this not working
for geode-native?


BR,

Mario



Re: RFC - Logging to Standard Out

2020-01-08 Thread Mario Kevo
+1

BR,
Mario


From: Jacob Barrett
Sent: 8 January 2020 21:39
To: dev@geode.apache.org
Subject: RFC - Logging to Standard Out

Please see RFC for Logging to Standard Out.

https://cwiki.apache.org/confluence/display/GEODE/Logging+to+Standard+Out 


Please comment by 1/21/2020.

Thanks,
Jake



Re: Re: Re: Re: Re: Lucene upgrade

2020-01-07 Thread Mario Kevo
Hi all,

Please could someone review PR #4395: https://github.com/apache/geode/pull/4395

BR,
Mario

From: Mario Kevo
Sent: 17 December 2019 14:30
To: Jason Huynh
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Jason,

Nice catch! I tried with a larger number of retries (with your changes) and it
passed.
I will try to make it time based.

Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: 13 December 2019 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for "reindex" code was a bit 
off ( it expected reindex features to be complete by a certain release).  I 
have a PR on develop to adjust that calculation 
(https://github.com/apache/geode/pull/4466)

The expectation is that when lucene reindex (indexing a region with a data 
already in it) is enabled - any query will now throw the 
LuceneIndexingInProgressException instead of possibly waiting a very long time 
to receive a query result.  The tests themselves are coded to retry 10 times, 
knowing it will take awhile to reindex.  If you bump this number up or, better 
yet, make it time based (awaitility, etc), it should get you past this problem 
(once the pull request gets checked in and pulled into your branch)

Thanks!
-Jason
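The time-based approach suggested here (retry until a deadline instead of a fixed retry count) can be sketched in plain Java. The query method and the exception class below are hypothetical stand-ins, not the Geode Lucene API:

```java
import java.util.concurrent.TimeUnit;

public class TimeBasedRetry {
    // Hypothetical stand-in for LuceneIndexingInProgressException
    static class IndexingInProgressException extends RuntimeException {}

    static int attempts = 0;

    // Hypothetical query that keeps failing while "reindexing" is running
    static String runQuery() {
        if (++attempts < 5) throw new IndexingInProgressException();
        return "result";
    }

    // Retry the query until a deadline, not a fixed number of attempts
    static String queryWithTimeout(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            try {
                return runQuery();
            } catch (IndexingInProgressException e) {
                if (System.nanoTime() >= deadline) throw e; // give up at the deadline
                Thread.sleep(50); // back off briefly, then retry
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(queryWithTimeout(30, TimeUnit.SECONDS)); // prints "result"
    }
}
```

Awaitility's `await().atMost(...)` offers the same deadline-based pattern with less boilerplate, which is presumably what the "awaitility, etc" suggestion refers to.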


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo  wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I changed the tests to put a lower number of entries, or set
them to wait for the repo in LuceneQueryFunction.java, they passed every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing
queries. I also tried waiting until isIndexingInProgress returns false, but it
reached the timeout and failed.
In the tests a query is executed after all members are rolled.

BR,
Mario


From: Jason Huynh
Sent: 11 December 2019 23:08
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  As long as the test is able to roll all the servers and not get 
stuck waiting for a queue to flush (which will only happen once all the servers 
are rolled now).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to not do the query in the middle or 
expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we get

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context,
false)]. If we have a bigger queue (like in the test) it will fail, as it will
not wait until the indexes are created. I also tried putting just a few objects
and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of
entries in the tests?

BR,
Mario




From: Jason Huynh
Sent: 6 December 2019 20:53
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the Index too new exception.  Summary - repo creation, even if no writes occur, 
appear to create some meta data that the old node attempts to read and blow up 
on.

The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait)

The reason why you probably were seeing unsuccessful dispatches, is because we 
kind of intended for that with the oldMember check.  In-between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade f

Re: Re: Re: Re: Re: Lucene upgrade

2019-12-17 Thread Mario Kevo
Hi Jason,

Nice catch! I tried with a larger number of retries (with your changes) and it
passed.
I will try to make it time based.

Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: 13 December 2019 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for "reindex" code was a bit 
off ( it expected reindex features to be complete by a certain release).  I 
have a PR on develop to adjust that calculation 
(https://github.com/apache/geode/pull/4466)

The expectation is that when lucene reindex (indexing a region with a data 
already in it) is enabled - any query will now throw the 
LuceneIndexingInProgressException instead of possibly waiting a very long time 
to receive a query result.  The tests themselves are coded to retry 10 times, 
knowing it will take awhile to reindex.  If you bump this number up or, better 
yet, make it time based (awaitility, etc), it should get you past this problem 
(once the pull request gets checked in and pulled into your branch)

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo  wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I changed the tests to put a lower number of entries, or set
them to wait for the repo in LuceneQueryFunction.java, they passed every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing
queries. I also tried waiting until isIndexingInProgress returns false, but it
reached the timeout and failed.
In the tests a query is executed after all members are rolled.

BR,
Mario


From: Jason Huynh
Sent: 11 December 2019 23:08
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  As long as the test is able to roll all the servers and not get 
stuck waiting for a queue to flush (which will only happen once all the servers 
are rolled now).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to not do the query in the middle or 
expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we get

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context,
false)]. If we have a bigger queue (like in the test) it will fail, as it will
not wait until the indexes are created. I also tried putting just a few objects
and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of
entries in the tests?

BR,
Mario




From: Jason Huynh
Sent: 6 December 2019 20:53
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the Index too new exception.  Summary - repo creation, even if no writes occur, 
appear to create some meta data that the old node attempts to read and blow up 
on.

The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait)

The reason why you probably were seeing unsuccessful dispatches, is because we 
kind of intended for that with the oldMember check.  In-between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks

Re: Re: Re: Re: Lucene upgrade

2019-12-12 Thread Mario Kevo
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I changed the tests to put a lower number of entries, or set
them to wait for the repo in LuceneQueryFunction.java, they passed every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing
queries. I also tried waiting until isIndexingInProgress returns false, but it
reached the timeout and failed.
In the tests a query is executed after all members are rolled.

BR,
Mario


From: Jason Huynh
Sent: 11 December 2019 23:08
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  As long as the test is able to roll all the servers and not get 
stuck waiting for a queue to flush (which will only happen once all the servers 
are rolled now).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to not do the query in the middle or 
expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we get

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context,
false)]. If we have a bigger queue (like in the test) it will fail, as it will
not wait until the indexes are created. I also tried putting just a few objects
and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of
entries in the tests?

BR,
Mario




From: Jason Huynh
Sent: 6 December 2019 20:53
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the Index too new exception.  Summary - repo creation, even if no writes occur, 
appear to create some meta data that the old node attempts to read and blow up 
on.

The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait)

The reason why you probably were seeing unsuccessful dispatches, is because we 
kind of intended for that with the oldMember check.  In-between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
 is that when the locator is upgraded, it is shut down and started on the
newer version. The problem is that server2 becomes the lead and cannot read the
Lucene index on the newer version (the Lucene index format changed between
versions 6 and 7).

Another problem is after the rolling upgrade of locator and server1 when 
verifying region size on VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, 
expectedRegionSize, 5,
15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that
passes (15 entries). The problem is that while executing
verifyLuceneQueryResults, VM1 (server2) has only 13 entries and the assertion
fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) there are 14 entries; one is
unsuccessfully dispatched.

I don't know why some events are successfully dispatched, 

Re: Re: Re: Lucene upgrade

2019-12-11 Thread Mario Kevo
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we get

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context,
false)]. If we have a bigger queue (like in the test) it will fail, as it will
not wait until the indexes are created. I also tried putting just a few objects
and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of
entries in the tests?

BR,
Mario




From: Jason Huynh
Sent: 6 December 2019 20:53
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the Index too new exception.  Summary - repo creation, even if no writes occur, 
appear to create some meta data that the old node attempts to read and blow up 
on.

The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait)

The reason why you probably were seeing unsuccessful dispatches, is because we 
kind of intended for that with the oldMember check.  In-between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
 is that when the locator is upgraded, it is shut down and started on the
newer version. The problem is that server2 becomes the lead and cannot read the
Lucene index on the newer version (the Lucene index format changed between
versions 6 and 7).

Another problem is after the rolling upgrade of locator and server1 when 
verifying region size on VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, 
expectedRegionSize, 5,
15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that
passes (15 entries). The problem is that while executing
verifyLuceneQueryResults, VM1 (server2) has only 13 entries and the assertion
fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) there are 14 entries; one is
unsuccessfully dispatched.

I don't know why some events are successfully dispatched and some are not.
Do you have any idea?

BR,
Mario



From: Jason Huynh
Sent: 2 December 2019 18:32
To: geode
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem.. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:

>
> I started with implementation of Option-1.
> As I understood the idea is to block all puts(put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried with Dan's proposal to check on start of
> LuceneEventListener.process() if all members are upgraded, also changed
> test to verify lucene indexes only after all members are upgraded, but got
> the same er

Re: PRs review

2019-12-11 Thread Mario Kevo
Thank you @Bruce Schuchardt!

From: Bruce Schuchardt
Sent: 10 December 2019 23:53
To: dev@geode.apache.org
Subject: Re: PRs review

Mario, I've merged GEODE-6927.  You can close the JIRA ticket.

On 12/10/19 4:20 AM, Mario Kevo wrote:
> Hi Geode dev,
>
> Need some PR reviewers on the following PRs.
>
> JIRA: https://issues.apache.org/jira/browse/GEODE-6927
> PR: https://github.com/apache/geode/pull/4085
>
>
> JIRA: https://issues.apache.org/jira/browse/GEODE-7561
> PR: https://github.com/apache/geode/pull/4441
>
> BR,
> Mario
>


PRs review

2019-12-10 Thread Mario Kevo
Hi Geode dev,

Need some PR reviewers on the following PRs.

JIRA: https://issues.apache.org/jira/browse/GEODE-6927
PR: https://github.com/apache/geode/pull/4085


JIRA: https://issues.apache.org/jira/browse/GEODE-7561
PR: https://github.com/apache/geode/pull/4441

BR,
Mario


Re: Certificate Based Authorization

2019-12-06 Thread Mario Kevo
Hi all,

I wrote up a proposal for Certificate Based Authorization.
Please review and comment on the below proposal.

https://cwiki.apache.org/confluence/display/GEODE/Certificate+Based+Authorization

BR,
Mario

From: Udo Kohlmeyer
Sent: 2 December 2019 20:10
To: dev@geode.apache.org
Subject: Re: Certificate Based Authorization

+1

On 12/2/19 1:29 AM, Mario Kevo wrote:
> Hi,
>
>
>
> There is another potential functionality we would like to discuss and get 
> some comments for. The idea is TLS certificate based authorization. 
> Currently, if a user wants secure communication (TLS) + authorization, he 
> needs to enable TLS and access control. The user also needs to handle both 
> the certificates for TLS and the credentials for access control. The idea we 
> have is to use both features: TLS and access control, but remove the need to 
> handle the credentials (generating and securely storing the username and 
> password). Instead of the credentials, the certificate subject DN would be 
> used for authorization.
>
>
>
> This would of course be optional. We would leave the possibility to use these 
> 2 features as they are right now, but would also provide a configuration 
> option to use the features without the need for client credentials, utilizing 
> the certificate information instead.
>
>
>
> For further clarity, here are the descriptions of how the options would work:
>
>
>
>1.  Using TLS and access control as they work right now
>   *   Certificates are prepared for TLS
>   *   A SecurityManager is prepared for access control 
> authentication/authorization. As part of this, a file (e.g. security.json) is 
> prepared where we define the allowed usernames, passwords and authorization 
> rights for each username
>   *   The credentials are distributed towards clients. Here a user needs 
> to consider secure distribution and periodical rotation of credentials.
>
> Once a client initiates a connection, we first get the TLS layer and 
> certificate check, and right after that we perform the 
> authentication/authorization of the user credentials.
>
>
>
>2.  TLS certificate based authorization
>   *   Certificates are prepared for TLS
>   *   A SecurityManager is prepared for access control 
> authentication/authorization. As part of this, a file (e.g. security.json) is 
> prepared. In this case we don’t define the authorization rights based on 
> usernames/passwords but based on certificate subject DNs.
>   *   There is no more need to distribute or periodically rotate the 
> credentials, since there would be none. Authorization would be based  on the 
> subject DN fetched from the certificate used for that same connection
>
> Once a client initiates a connection, and when we get past the TLS layer, at 
> the moment where geode expects the credentials from the client connection, we 
> just take the certificate subject DN instead and provide it to the security 
> manager for authorization.
>
>
>
> This wouldn’t lower the level of security (we can have TLS enabled without 
> access control already), but would provide authentication without the hassle 
> of username and password handling.
>
>
>
> This is the basic description of the idea. There would be more things to 
> consider, like multi user authentication, but for now we would just like to 
> get some initial feedback. If it is considered useful, we could get into the 
> details.
>
>
> BR,
>
> Mario
>
>
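As a small illustration of the subject-DN-based authorization described above: the JDK's X500Principal can normalize a certificate subject DN into a canonical string to use as the lookup key. The DN value and the rights mapping below are made-up examples, not Geode APIs:

```java
import javax.security.auth.x500.X500Principal;
import java.util.Map;

public class SubjectDnAuthzSketch {
    // Made-up mapping from subject DN to authorization rights,
    // playing the role of the security.json described above
    static final Map<String, String> RIGHTS = Map.of(
        "CN=client1,O=Example,C=DE", "DATA:READ,DATA:WRITE");

    static String rightsFor(X500Principal subject) {
        // RFC 2253 form is normalized, so DNs compare reliably as strings
        String dn = subject.getName(X500Principal.RFC2253);
        return RIGHTS.getOrDefault(dn, "NONE");
    }

    public static void main(String[] args) {
        // In real code this principal would come from the TLS session's
        // peer certificate, e.g. cert.getSubjectX500Principal()
        X500Principal peer = new X500Principal("CN=client1, O=Example, C=DE");
        System.out.println(rightsFor(peer));
    }
}
```

The normalization step matters: the same DN can be printed with different spacing or attribute order by different tools, so comparing the RFC 2253 form avoids false mismatches.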


Re: Re: Lucene upgrade

2019-12-06 Thread Mario Kevo
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
 is that when the locator is upgraded, it is shut down and started on the
newer version. The problem is that server2 becomes the lead and cannot read the
Lucene index on the newer version (the Lucene index format changed between
versions 6 and 7).

Another problem is after the rolling upgrade of locator and server1 when 
verifying region size on VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, 
expectedRegionSize, 5,
15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that
passes (15 entries). The problem is that while executing
verifyLuceneQueryResults, VM1 (server2) has only 13 entries and the assertion
fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)

VM0 (server1) and VM2 (server3) each have 14 entries; one event was 
unsuccessfully dispatched.

I don't know why some events are dispatched successfully and some are not.
Do you have any idea?

BR,
Mario



From: Jason Huynh
Sent: 2 December 2019 18:32
To: geode
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem.. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:

>
> I started with implementation of Option-1.
> As I understood the idea is to block all puts(put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried with Dan's proposal to check on start of
> LuceneEventListener.process() if all members are upgraded, also changed
> test to verify lucene indexes only after all members are upgraded, but got
> the same error with incompatibilities between lucene versions.
> Changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> 
> From: Xiaojian Zhou
> Sent: 7 November 2019 18:27
> To: geode
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou  wrote:
> >
> > > Usually re-creating region and index are expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have an offline reindex scripts or steps (written by Barry?). If
> > that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write
> lucene
> > in
> > > older format. They only support
> > > reading old format indexes with newer version by using lucene-backward-
> > > codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will caus

Wiki access

2019-12-04 Thread Mario Kevo
Hi All,

Can I have access to edit the Geode Wiki?
My username is "mario.kevo".

Thanks,
Mario


Re: Lucene upgrade

2019-12-02 Thread Mario Kevo

I started with the implementation of Option-1.
As I understood it, the idea is to block all puts (queue them) until all 
members are upgraded, and then process all queued events.

I tried Dan's proposal to check, at the start of LuceneEventListener.process(), 
whether all members are upgraded, and also changed the test to verify the 
lucene indexes only after all members are upgraded, but I got the same error 
about incompatibilities between lucene versions.
Changes are visible on https://github.com/apache/geode/pull/4198.

Please add comments and suggestions.
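For anyone skimming the PR, the gating idea can be sketched roughly like this (a standalone sketch, not geode's real code: the String event type, the `written` list standing in for the lucene writer, and the "all members upgraded" check are all placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BooleanSupplier;

// Option-1 sketch: buffer index-update events while any member still runs
// the old version, then flush the backlog once the rolling upgrade completes,
// so old members never see new-format lucene files.
public class UpgradeGatedProcessor {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final BooleanSupplier allMembersUpgraded; // placeholder for geode's version check
    private final List<String> written = new ArrayList<>();

    public UpgradeGatedProcessor(BooleanSupplier allMembersUpgraded) {
        this.allMembersUpgraded = allMembersUpgraded;
    }

    public void process(String event) {
        pending.add(event);
        if (!allMembersUpgraded.getAsBoolean()) {
            return; // old members might still read the index: don't write yet
        }
        String e;
        while ((e = pending.poll()) != null) {
            written.add(e); // stand-in for writing to the lucene index
        }
    }

    public List<String> written() {
        return written;
    }

    public static void main(String[] args) {
        boolean[] upgraded = {false};
        UpgradeGatedProcessor proc = new UpgradeGatedProcessor(() -> upgraded[0]);
        proc.process("put-1");              // queued, nothing written yet
        upgraded[0] = true;                 // rolling upgrade finished
        proc.process("put-2");              // flushes the backlog
        System.out.println(proc.written()); // prints [put-1, put-2]
    }
}
```

In geode the check would sit at the start of LuceneEventListener.process(), as Dan suggested; since the lucene index is asynchronous anyway, the queueing itself already exists.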

BR,
Mario



From: Xiaojian Zhou
Sent: 7 November 2019 18:27
To: geode
Subject: Re: Lucene upgrade

Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.

On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:

> Gester, I don't think we need to write in the old format, we just need the
> new format not to be written while old members can potentially read the
> lucene files.  Option 1 can be very similar to Dan's snippet of code.
>
> I think Option 2 is going to leave a lot of people unhappy when they get
> stuck with what Mario is experiencing right now and all we can say is "you
> should have read the doc". Not to say Option 2 isn't valid and it's
> definitely the least amount of work to do, I still vote option 1.
>
> On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou  wrote:
>
> > Usually re-creating region and index are expensive and customers are
> > reluctant to do it, according to my memory.
> >
> > We do have an offline reindex scripts or steps (written by Barry?). If
> that
> > could be an option, they can try that offline tool.
> >
> > I saw from Mario's email, he said: "I didn't found a way to write lucene
> in
> > older format. They only support
> > reading old format indexes with newer version by using lucene-backward-
> > codec."
> >
> > That's why I think option-1 is not feasible.
> >
> > Option-2 will cause the queue to be filled. But usually customer will
> hold
> > on, silence or reduce their business throughput when
> > doing rolling upgrade. I wonder if it's a reasonable assumption.
> >
> > Overall, after compared all the 3 options, I still think option-2 is the
> > best bet.
> >
> > Regards
> > Gester
> >
> >
> > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett 
> wrote:
> >
> > >
> > >
> > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh  wrote:
> > > >
> > > > Jake - there is a side effect to this in that the user would have to
> > > > reimport all their data into the user defined region too.  Client
> apps
> > > > would also have to know which of the regions to put into.. also, I
> may
> > be
> > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > support
> > > > whoever implements the changes :-P
> > >
> > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> thought.
> > >
> > > -Jake
> > >
> > >
> >
>


Certificate Based Authorization

2019-12-02 Thread Mario Kevo
Hi,



There is another potential functionality we would like to discuss and get some 
comments for. The idea is TLS certificate based authorization. Currently, if a 
user wants secure communication (TLS) + authorization, he needs to enable TLS 
and access control. The user also needs to handle both the certificates for TLS 
and the credentials for access control. The idea we have is to use both 
features: TLS and access control, but remove the need to handle the credentials 
(generating and securely storing the username and password). Instead of the 
credentials, the certificate subject DN would be used for authorization.



This would of course be optional. We would leave the possibility to use these 2 
features as they are right now, but would also provide a configuration option 
to use the features without the need for client credentials, utilizing the 
certificate information instead.



For further clarity, here are the descriptions of how the options would work:



  1.  Using TLS and access control as they work right now
 *   Certificates are prepared for TLS
 *   A SecurityManager is prepared for access control 
authentication/authorization. As part of this, a file (e.g. security.json) is 
prepared where we define the allowed usernames, passwords and authorization 
rights for each username
 *   The credentials are distributed towards clients. Here a user needs to 
consider secure distribution and periodical rotation of credentials.

Once a client initiates a connection, we first get the TLS layer and 
certificate check, and right after that we perform the 
authentication/authorization of the user credentials.



  2.  TLS certificate based authorization
 *   Certificates are prepared for TLS
 *   A SecurityManager is prepared for access control 
authentication/authorization. As part of this, a file (e.g. security.json) is 
prepared. In this case we don’t define the authorization rights based on 
usernames/passwords but based on certificate subject DNs.
 *   There is no more need to distribute or periodically rotate the 
credentials, since there would be none. Authorization would be based  on the 
subject DN fetched from the certificate used for that same connection

Once a client initiates a connection, and when we get past the TLS layer, at 
the moment where geode expects the credentials from the client connection, we 
just take the certificate subject DN instead and provide it to the security 
manager for authorization.



This wouldn’t lower the level of security (we can have TLS enabled without 
access control already), but would provide authentication without the hassle of 
username and password handling.



This is the basic description of the idea. There would be more things to 
consider, like multi user authentication, but for now we would just like to get 
some initial feedback. If it is considered useful, we could get into the 
details.
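As a rough illustration of the authorization lookup described above (self-contained; the class name, DN values, and role strings are made up, and a real implementation would plug into geode's SecurityManager rather than stand alone):

```java
import java.util.*;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

// Sketch: map a certificate subject DN (taken from the TLS layer) to
// authorization roles, replacing the username/password lookup. DN strings
// are normalized via LdapName so formatting/case differences don't matter.
public class DnRoleMapper {
    private final Map<String, Set<String>> rolesByDn = new HashMap<>();

    private static String normalize(String dn) {
        try {
            // LdapName parses per RFC 2253; lowercasing handles case differences
            return new LdapName(dn).toString().toLowerCase(Locale.ROOT);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("bad DN: " + dn, e);
        }
    }

    // Would be loaded from something like security.json at startup.
    public void allow(String dn, String... roles) {
        rolesByDn.computeIfAbsent(normalize(dn), k -> new HashSet<>())
                 .addAll(Arrays.asList(roles));
    }

    // Would be called where geode currently expects client credentials.
    public Set<String> authorize(String subjectDn) {
        return rolesByDn.getOrDefault(normalize(subjectDn), Collections.emptySet());
    }

    public static void main(String[] args) {
        DnRoleMapper mapper = new DnRoleMapper();
        mapper.allow("CN=client1,O=Example", "DATA:READ");
        System.out.println(mapper.authorize("cn=client1,o=example")); // prints [DATA:READ]
    }
}
```

An unknown DN simply yields no roles, which the security manager would then reject.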


BR,

Mario



Re: Re: Restart gateway-receiver

2019-11-26 Thread Mario Kevo
Thanks a lot @Barry Oglesby!

It seems that this closing of inactive connections is done by Kubernetes, as we 
run Geode on it.

BR,
Mario


From: Barry Oglesby
Sent: 22 November 2019 22:35
To: Mario Kevo
Cc: dev@geode.apache.org
Subject: Re: Re: Restart gateway-receiver

> If we don't send any event, the connection will be closed after some time as 
> connection is inactive.

Are you seeing this behavior? I don't think this is true by default.

AbstractRemoteGatewaySender.initProxy sets these fields on the PoolFactory:

pf.setReadTimeout(this.socketReadTimeout);
pf.setIdleTimeout(connectionIdleTimeOut);

By default, socketReadTimeout is 0 (no timeout), and connectionIdleTimeOut is 
-1 (disabled).

Each Event Processor thread will have its own connection to the remote site:

Event Processor for GatewaySender_ny.1: GatewaySenderEventRemoteDispatcher.initializeConnection connection=Pooled Connection to 127.0.0.1:5452: Connection[127.0.0.1:5452]@306907760
Event Processor for GatewaySender_ny.2: GatewaySenderEventRemoteDispatcher.initializeConnection connection=Pooled Connection to 127.0.0.1:5452: Connection[127.0.0.1:5452]@608855137
Event Processor for GatewaySender_ny.0: GatewaySenderEventRemoteDispatcher.initializeConnection connection=Pooled Connection to 127.0.0.1:5452: Connection[127.0.0.1:5452]@950613560
Event Processor for GatewaySender_ny.4: GatewaySenderEventRemoteDispatcher.initializeConnection connection=Pooled Connection to 127.0.0.1:5452: Connection[127.0.0.1:5452]@1005378489
Event Processor for GatewaySender_ny.3: GatewaySenderEventRemoteDispatcher.initializeConnection connection=Pooled Connection to 127.0.0.1:5452: Connection[127.0.0.1:5452]@629246640

There will be one Event Processor thread for each dispatcher thread (5 by 
default).

There aren't any good public ways to monitor the connections other than JMX.

One way to monitor these connections is with ConnectionStats on the sender side.

You can do that with vsd: ClientStats -> GatewaySenderStats -> connections

You can also do it with code like:

private int getConnectionCount(String gatewaySenderId) {
  AbstractGatewaySender sender = (AbstractGatewaySender) cache.getGatewaySender(gatewaySenderId);
  int totalConnections = 0;
  if (sender != null) {
    for (ConnectionStats connectionStats : sender.getProxy().getEndpointManager().getAllStats().values()) {
      totalConnections += connectionStats.getConnections();
    }
    System.out.println("Sender=" + gatewaySenderId + "; connectionCount=" + totalConnections);
  }
  return totalConnections;
}

You can also dump whether the dispatcher is connected like:

private void dumpConnected(String gatewaySenderId) {
  AbstractGatewaySender sender = (AbstractGatewaySender) cache.getGatewaySender(gatewaySenderId);
  if (sender.isParallel()) {
    ConcurrentParallelGatewaySenderEventProcessor concurrentProcessor =
        (ConcurrentParallelGatewaySenderEventProcessor) sender.getEventProcessor();
    for (ParallelGatewaySenderEventProcessor processor : concurrentProcessor.getProcessors()) {
      System.out.println("Processor=" + processor + "; isConnected=" +
          processor.getDispatcher().isConnectedToRemote());
    }
  } else {
    ConcurrentSerialGatewaySenderEventProcessor concurrentProcessor =
        (ConcurrentSerialGatewaySenderEventProcessor) sender.getEventProcessor();
    List<SerialGatewaySenderEventProcessor> processors = concurrentProcessor.getProcessors();
    for (SerialGatewaySenderEventProcessor processor : processors) {
      System.out.println("Processor=" + processor + "; isConnected=" +
          processor.getDispatcher().isConnectedToRemote());
    }
  }
}

The isConnectedToRemote method does:

return connection != null && !connection.isDestroyed();

Thanks,
Barry Oglesby



On Thu, Nov 21, 2019 at 11:15 PM Mario Kevo  wrote:
Hi,

@Barry Oglesby, thanks for the clarification.

If we don't send any event, the connection will be closed after some time 
because it is inactive.
Is it possible to monitor from the app whether replication is established, so 
we can tell whether there is a problem with the network or the connection was 
just closed due to inactivity?
Can we monitor replication in some other way than looking at the "isConnected" 
state over JMX?

BR,
Mario

From: Barry Oglesby
Sent: 14 November 2019 18:29
To: dev@geode.apache.org
Subject: Re: Restart gateway-receiver

Mario,

That's the current behavior. When the sender is initially started, it
attempts to connect to the receiver. If it does no

Re: Restart gateway-receiver

2019-11-21 Thread Mario Kevo
Hi,

@Barry Oglesby, thanks for the clarification.

If we don't send any event, the connection will be closed after some time 
because it is inactive.
Is it possible to monitor from the app whether replication is established, so 
we can tell whether there is a problem with the network or the connection was 
just closed due to inactivity?
Can we monitor replication in some other way than looking at the "isConnected" 
state over JMX?

BR,
Mario

From: Barry Oglesby
Sent: 14 November 2019 18:29
To: dev@geode.apache.org
Subject: Re: Restart gateway-receiver

Mario,

That's the current behavior. When the sender is initially started, it
attempts to connect to the receiver. If it does not connect, it won't retry
until there is a batch of events to send. Look for callers of
GatewaySenderEventRemoteDispatcher.initializeConnection to see the
behavior. It could be changed to have a task to retry lost connections, but
generally there are events in the queue, so the connection is
re-established pretty quickly by the event processor thread.

Thanks,
Barry Oglesby



On Wed, Nov 13, 2019 at 4:55 AM Mario Kevo  wrote:

> Hi geode dev,
>
> After creating gateways senders and receivers between two geode clusters
> replications is established. After restart gateway receiver, sender will
> not connect to it until we send some entry from sender to receiver.
> Is this a normal behavior or a bug?
> Should geode have some mechanism for checking if connection is established
> no matter if entry is sent or not?
>
> BR,
> Mario
>


Restart gateway-receiver

2019-11-13 Thread Mario Kevo
Hi geode dev,

After creating gateway senders and receivers between two geode clusters, 
replication is established. After the gateway receiver is restarted, the sender 
will not reconnect to it until we send some entry from the sender to the receiver.
Is this normal behavior or a bug?
Should geode have some mechanism for checking whether the connection is 
established, regardless of whether an entry is sent?

BR,
Mario


Re: Lucene upgrade

2019-11-06 Thread Mario Kevo
Hi Dan,

thanks for the suggestions.
I didn't find a way to make lucene write indexes in the older format. They only
support reading old-format indexes with a newer version, via
lucene-backward-codecs.

Regarding freezing writes to the lucene index: that means we need to start the
locators and servers, create the lucene index on the servers, roll them to the
current version, and only then do puts. In that case the tests passed. Is that ok?


BR,
Mario


On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> I think the issue probably has to do with doing a rolling upgrade
> from an
> old version of geode (with an old version of lucene) to the new
> version of
> geode.
> 
> Geode's lucene integration works by writing the lucene index to a
> colocated
> region. So lucene index data that was generated on one server can be
> replicated or rebalanced to other servers.
> 
> I think what may be happening is that data written by a geode member
> with a
> newer version is being read by a geode member with an old version.
> Because
> this is a rolling upgrade test, members with multiple versions will
> be
> running as part of the same cluster.
> 
> I think to really fix this rolling upgrade issue we would need to
> somehow
> configure the new version of lucene to write data in the old format,
> at
> least until the rolling upgrade is complete. I'm not sure if that is
> possible with lucene or not - but perhaps? Another option might be to
> freeze writes to the lucene index during the rolling upgrade process.
> Lucene indexes are asynchronous, so this wouldn't necessarily require
> blocking all puts. But it would require queueing up a lot of updates.
> 
> -Dan
> 
> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo 
> wrote:
> 
> > Hi geode dev,
> > 
> > I'm working on upgrade lucene to a newer version. (
> > https://issues.apache.org/jira/browse/GEODE-7309)
> > 
> > I followed instruction from
> > 
https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> > Also add some other changes that is needed for lucene 8.2.0.
> > 
> > I found some problems with tests:
> >  * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java:
> > 
> > 
> >  * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java:
> >  * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java:
> >  * ./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java:
> >  * ./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java:
> > 
> >   -> failed due to
> > Caused by: org.apache.lucene.index.IndexFormatTooOldException:
> > Format
> > version is not supported (resource
> > BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7
> > and
> > 9). This version of Lucene only supports indexes created with
> > release
> > 6.0 and later.
> > at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
> > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
> > ... 16 more
> > 
> > 
> >  * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:
> > 
> >   -> failed with the same exception as previous tests
> > 
> > 
> > I found this on web
> > 
> > 
https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> > , but not have an idea how to proceed with that.
> > 
> > Does anyone has any idea how to fix it?
> > 
> > BR,
> > Mario
> > 
> > 
> > 
> > 
> > 


Lucene upgrade

2019-11-04 Thread Mario Kevo
Hi geode dev,

I'm working on upgrading lucene to a newer version (
https://issues.apache.org/jira/browse/GEODE-7309).

I followed the instructions from 
https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
and also made some other changes needed for lucene 8.2.0.

I found some problems with tests:
 * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java:


 * 
geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java:
 * 
geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java:
 * 
./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java:
 * 
./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java:

  -> failed due to 
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.
at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
... 16 more


 * 
geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:
 
  -> failed with the same exception as previous tests


I found this on the web:
https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
but I don't have an idea how to proceed with it.

Does anyone has any idea how to fix it?

BR,
Mario






Re: ssl configuration parameters

2019-09-27 Thread Mario Kevo
A correction is needed here; this seems to actually work. The catch is
that if a JmxOperationInvoker is created from a client with a “ssl-
enabled-components” scope broader than the one defined on the locators
and servers, it seems to override the “cluster” scope. Is this behavior
expected?

On Thu, 2019-09-26 at 19:21 +, Mario Kevo wrote:
> Hi geode dev,
>  
> We would need to clarify the meaning of some ssl configuration
> parameters. When the flag “ssl-enabled-components” is set to
> “cluster”,
> our understanding is that this means geode would enforce SSL only
> between members of the same distributedSystem (same site). This would
> imply that communication between sites (gateway communication and
> site2site locator communication) wouldn’t be encrypted with ssl? Is
> this understanding correct?
>  
> If so, the behavior seems to differ: locator2locator communication
> between 2 sites/distributed systems fails if their certificates
> aren’t
> properly configured, meaning that ssl is still enforced in that
> communication.
> 
> Thanks,
> Mario


ssl configuration parameters

2019-09-26 Thread Mario Kevo
Hi geode dev,
 
We would need to clarify the meaning of some ssl configuration
parameters. When the flag “ssl-enabled-components” is set to “cluster”,
our understanding is that this means geode would enforce SSL only
between members of the same distributedSystem (same site). This would
imply that communication between sites (gateway communication and
site2site locator communication) wouldn’t be encrypted with ssl? Is
this understanding correct?
 
If so, the behavior seems to differ: locator2locator communication
between 2 sites/distributed systems fails if their certificates aren’t
properly configured, meaning that ssl is still enforced in that
communication.

Thanks,
Mario
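For concreteness, the scoped setup being asked about looks roughly like this in gemfire.properties (the property names are the standard geode SSL ones; paths and passwords are placeholders):

```properties
# SSL only between members of the same distributed system (same site)
ssl-enabled-components=cluster
ssl-keystore=/path/to/keystore.jks
ssl-keystore-password=changeit
ssl-truststore=/path/to/truststore.jks
ssl-truststore-password=changeit
# related flag from the hostname validation discussion
ssl-endpoint-identification-enabled=false
```

With this reading, only member-to-member communication inside one distributed system should be encrypted; whether locator-to-locator traffic to another site also enforces SSL is exactly the question above.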


Not mixing cache.xml and cluster configuration

2019-08-21 Thread Mario Kevo
Hi Geode dev,

I've opened a ticket about mixing cache.xml and cluster configuration:
https://issues.apache.org/jira/browse/GEODE-7025

Based on some comments on the tickets (
https://issues.apache.org/jira/browse/GEODE-6772?focusedCommentId=16869670=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16869670
) and my testing, I think we need to document these behaviors.

I added two proposals to the ticket; we can proceed with one of them
if you agree, or with any other idea you have to resolve this problem.

Thanks and BR,
Mario


PR reviews

2019-07-29 Thread Mario Kevo
Hi Geode dev,

We need some PR reviewers on the following PRs. Some of these just need
to be *re*-reviewed.

GEODE-6998 NPE during update of index due to GII
https://github.com/apache/geode/pull/3834

GEODE-6954 GatewaySenderMXBean wrongly reports state
https://github.com/apache/geode/pull/3826

GEODE-6717 NotAuthorizedException during JMX scraping

https://github.com/apache/geode/pull/3697


Thanks in advance,
Mario


Hostname validation

2019-07-22 Thread Mario Kevo
Hi,
 
When SSL is enabled and the ssl-endpoint-identification-enabled flag is set
to true, hostname validation is performed while establishing a
connection. This includes checking the hostname and IP address in the
certificate. In past releases, if hostname validation was disabled, a
warning log message would appear saying hostname validation would become
mandatory in future Geode releases. This message has been removed in
recent releases, but we would still like to check whether there is a
plan to mandate hostname validation. The reason for asking is the
implementation problems that hostname and IP validation cause for cloud
native applications. The IP address can change after each restart, and it
would be extremely cumbersome to maintain that in the certificates. And
in general, sticking to specific IP addresses doesn't go in line with
cloud native principles.


Apache Geode 1.10 release

2019-06-19 Thread Mario Kevo
Hi all,

I saw that the Wiki page is not updated with when the next Geode
release should be. Does anyone know when it is planned?

Thanks in advance,
Mario


Request access to Jira

2019-03-29 Thread Mario Kevo
Hi all,

Can you give me access to Jira so I can assign myself to tickets?
My Jira username is 'mkevo'.

Thanks and BR,
Mario