Re: New geode-gfsh module

2019-12-06 Thread Owen Nichols
Any standalone management API, or client thereof, would not be able to start a
locator or start a server. For that, gfsh still needs a large chunk of Geode.


> On Dec 6, 2019, at 12:25 PM, Udo Kohlmeyer  wrote:
> 
> I imagine that once the Management v2 APIs are GA (and feature complete), I don't 
> see a reason why /gfsh/ should not be a standalone module. It would 
> definitely have to be updated to use the new v2 APIs, at which point it should not 
> have any direct dependency on geode-core anymore.
> 
> On 12/6/19 10:01 AM, Jacob Barrett wrote:
>> 
>>> On Dec 6, 2019, at 9:44 AM, Jens Deppe  wrote:
>>> 
>>> Just to be clear, this effort does *not* result in a standalone gfsh
>>> executable/jar.
>> Is this a future plan?
>> 
>> 



Re: New geode-gfsh module

2019-12-06 Thread Udo Kohlmeyer
I imagine that once the Management v2 APIs are GA (and feature complete), I 
don't see a reason why /gfsh/ should not be a standalone module. It 
would definitely have to be updated to use the new v2 APIs, at which point it 
should not have any direct dependency on geode-core anymore.


On 12/6/19 10:01 AM, Jacob Barrett wrote:



On Dec 6, 2019, at 9:44 AM, Jens Deppe  wrote:

Just to be clear, this effort does *not* result in a standalone gfsh
executable/jar.

Is this a future plan?




Re: Odg: Odg: Lucene upgrade

2019-12-06 Thread Jason Huynh
Hi Mario,

I made a PR against your branch with some of the changes I had to make to get
past the "Index too new" exception. Summary: repo creation, even if no
writes occur, appears to create some metadata that the old node attempts to
read and blows up on.

The PR against your branch just prevents the repo from being constructed
until all old members are upgraded.
This requires test changes so that we don't validate using queries (since we
prevent draining and repo creation, the query would just wait).

The reason you were probably seeing unsuccessful dispatches is that we
intended that behavior with the oldMember check. In between the
server rolls, the test was trying to verify, but because not all servers
had upgraded, the LuceneEventListener wasn't allowing the queue to drain on
the new member.

I am not sure if the changes I added are acceptable or not - maybe if this
ends up working we can discuss it on the dev list.

There will probably be other "gotchas" along the way...
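The gating described above can be sketched roughly as follows (class, method, and version-check names are simplified stand-ins, not Geode's actual LuceneEventListener internals; in Geode the member versions would come from membership views):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Simplified stand-in for the oldMember check described above: the listener
// refuses to drain (and the repo is not constructed) while any member in the
// cluster still runs an old version. Names and structure are hypothetical.
public class UpgradeGatedListener {
  private final Queue<String> queue = new ArrayDeque<>();

  void enqueue(String event) { queue.add(event); }

  int pending() { return queue.size(); }

  // Returns true only when dispatch actually happened.
  boolean process(List<Integer> memberVersions, int currentVersion) {
    boolean hasOldMember = memberVersions.stream().anyMatch(v -> v < currentVersion);
    if (hasOldMember) {
      return false; // leave events queued; repo construction is deferred
    }
    queue.clear(); // all members upgraded: drain the queued events
    return true;
  }

  public static void main(String[] args) {
    UpgradeGatedListener listener = new UpgradeGatedListener();
    listener.enqueue("put-1");
    // Mixed versions mid-roll: dispatch is held back, mirroring the
    // "unsuccessfully dispatched" batches seen in the test logs.
    System.out.println(listener.process(List.of(6, 7), 7)); // false
    // All members on the new version: the queue drains.
    System.out.println(listener.process(List.of(7, 7), 7)); // true
    System.out.println(listener.pending()); // 0
  }
}
```

This also illustrates why the test must delay query validation until the roll completes: while any old member remains, nothing is drained.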


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:

> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
> between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> *RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion*
> *.java*) is that when it upgrades a *locator*, the locator is shut down and
> restarted on the newer version. The problem is that *server2* becomes the lead
> and cannot read the Lucene index on the newer version (the Lucene index format
> changed between versions 6 and 7).
>
> Another problem occurs after the rolling upgrade of the *locator* and
> *server1*, when verifying the region size on the VMs. For example:
>
> *expectedRegionSize += 5;*
> *putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
> expectedRegionSize, 5, 15, server2, server3);*
>
> First it checks whether the region has the expected size on the VMs, and that 
> passed (15 entries). The problem is that while executing verifyLuceneQueryResults, 
> VM1 (server2) has 13 entries and the assertion failed.
> From the logs it can be seen that two batches were unsuccessfully dispatched:
>
>
> *[vm0] [warn 2019/12/06 08:31:39.956 CET  GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
>
>
> *[vm0] [warn 2019/12/06 08:31:40.103 CET  GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
> For VM0 (server1) and VM2 (server3) there are 14 entries; one was unsuccessfully 
> dispatched.
>
> I don't know why some events are successfully dispatched and some are not.
> Do you have any idea?
>
> BR,
> Mario
>
>
> --
> *From:* Jason Huynh 
> *Sent:* 2 December 2019, 18:32
> *To:* geode 
> *Subject:* Re: Odg: Lucene upgrade
>
> Hi Mario,
>
> Sorry, I reread the original email and see that the exception points to a
> different problem. I think your fix addresses an old version seeing an
> unknown new Lucene format, which looks good. The following exception looks
> like the new Lucene library not being able to read the older files
> (just a guess from the message)...
>
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource
> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
> 9). This version of Lucene only supports indexes created with release
> 6.0 and later.
>
> The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
> incorrect (stating it needs to be release 6.0 and later) or if it requires an
> intermediate upgrade: 6.6.2 -> 7.x -> 8.x.
>
>
>
>
>
> On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:
>
> >
> > I started with the implementation of Option-1.
> > As I understood it, the idea is to block all puts (put them in the queue) until
> > all members are upgraded. After that it will process all queued events.
> >
> > I tried Dan's proposal of checking at the start of
> > LuceneEventListener.process() whether all members are upgraded, and also
> > changed the test to verify Lucene indexes only after all members are
> > upgraded, but got the same error with incompatibilities between Lucene
> > versions.
> > Changes are visible on https://github.com/apache/geode/pull/4198.
> >
> > Please add comments and suggestions.
> >
> > BR,
> > Mario
> >
> >
> > 
> > From: Xiaojian Zhou 
> > Sent: 7 November 2019, 18:27
> > To: geode 
> > Subject: Re: Lucene upgrade
> >
> > Oh, I misunderstood option-1 and option-2. What I vote for is Jason's
> > option-1.
> >
> > On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:
> >
> > > Gester, I don't think we need to write in the old format, we just need the
> > > new format not to be written while old members can potentially read 

Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Jacob Barrett


> On Dec 6, 2019, at 9:40 AM, Dan Smith  wrote:
> 
> Regarding changing PoolManager to
> an interface, I guess originally I wasn't thinking we would still be
> backwards compatible if we did that. But as I think about it I think that
> might be ok. One slight issue with that approach is that we have to come up
> with new names for the methods - we can't have both an instance and a
> static method with the same name and args. Maybe still worth it.

Doh! I didn’t think about that. It sort of defeats the purpose of reusing the 
class. So going with a whole new class probably makes more sense to remove 
confusion.

-Jake



Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Dale Emery

> Dale - are you suggesting a ConnectionPoolService that returns ConnectionPool 
> instances?

Yes.

> Would that mean ConnectionPool would extend Pool and we would deprecate Pool 
> itself?

Maybe extend. I worry about extending, for two reasons.

First, extending would make the new interface depend on the deprecated one. 
That feels awkward for reasons I can’t quite articulate.

Second, extending would mean that the new interface gets all the methods of the 
deprecated one whether we want them or not. I don’t know enough about Pool to 
have an opinion about whether we want to carry all of its method signatures 
forward.

An alternative to consider: Each ConnectionPool implementation delegates to a 
Pool. I suspect that this would make it harder to migrate existing uses from 
Pool to ConnectionPool.
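The delegation alternative might look roughly like this (all type names are placeholders; Geode's real Pool carries many more methods):

```java
// Placeholder types sketching the delegation alternative: the new
// ConnectionPool neither extends nor exposes the deprecated Pool; it wraps
// one and forwards only the methods we choose to carry forward.
interface LegacyPool {           // stand-in for the deprecated Pool
  String getName();
  int getFreeConnectionTimeout();
}

interface ConnectionPool {       // the new, narrower interface
  String getName();
}

class DelegatingConnectionPool implements ConnectionPool {
  private final LegacyPool delegate;

  DelegatingConnectionPool(LegacyPool delegate) { this.delegate = delegate; }

  @Override
  public String getName() { return delegate.getName(); }
  // getFreeConnectionTimeout() is deliberately not forwarded: the new
  // interface carries only the signatures we decide to keep.
}

public class DelegationSketch {
  public static void main(String[] args) {
    LegacyPool legacy = new LegacyPool() {
      public String getName() { return "wan-pool"; }
      public int getFreeConnectionTimeout() { return 10000; }
    };
    ConnectionPool pool = new DelegatingConnectionPool(legacy);
    System.out.println(pool.getName()); // wan-pool
  }
}
```

The trade-off Dale raises is visible here: every kept method needs an explicit forwarding stub, which is extra work per method but gives full control over the new surface.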

—
Dale Emery
dem...@pivotal.io



Re: New geode-gfsh module

2019-12-06 Thread Patrick Johnson
Our goal wasn’t to make gfsh standalone, but a couple of people have asked about 
it already. It’s not currently planned as far as I know, though maybe it will 
be in the future.

> On Dec 6, 2019, at 10:00 AM, Jacob Barrett  wrote:
> 
> 
> 
>> On Dec 6, 2019, at 9:43 AM, Jens Deppe  wrote:
>> 
>> The geode-dependencies.jar now includes the *geode-gfsh.jar* (as well as
>> Spring still).
> 
> Should it??



Re: New geode-gfsh module

2019-12-06 Thread Jacob Barrett



> On Dec 6, 2019, at 9:44 AM, Jens Deppe  wrote:
> 
> Just to be clear, this effort does *not* result in a standalone gfsh
> executable/jar.

Is this a future plan?




Re: New geode-gfsh module

2019-12-06 Thread Jacob Barrett



> On Dec 6, 2019, at 9:43 AM, Jens Deppe  wrote:
> 
> The geode-dependencies.jar now includes the *geode-gfsh.jar* (as well as
> Spring still).

Should it??

Re: WAN replication issue in cloud native environments

2019-12-06 Thread Anilkumar Gingade
Alberto,

Can you please file a JIRA ticket for this? This could come up often as
more and more deployments move to K8s.

-Anil.


On Fri, Dec 6, 2019 at 8:33 AM Sai Boorlagadda 
wrote:

> > if one gw receiver stops, the locator will publish to any remote locator
> that there are no receivers up.
>
> I am not sure if locators proactively update remote locators about changes
> in the receivers list; rather, I think the senders figure this out on connection
> issues.
> But I see the problem that local-site locators have only one member in the
> list of receivers that they maintain, as all receivers register with a
> single address.
>
> One idea I had earlier is to statically set the receivers list on locators
> (just like the remote-locators property), which is exchanged with gw-senders.
> This way we can introduce a boolean flag to turn off WAN discovery and use
> the statically configured addresses. This can also be useful for
> remote-locators if they are behind a service.
>
> Sai
>
> On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
>  wrote:
>
> > Thanks Charlie, but the issue is not about connectivity. Summarizing:
> > if you have two or more gw receivers started with the same values of the
> > "hostname-for-senders", "start-port" and "end-port" parameters (with
> > "start-port" and "end-port" equal), then when one gw receiver stops, the
> > locator will publish to any remote locator that there are no receivers up.
> >
> > And this use case is likely to happen on cloud-native environments, as
> > described.
> >
> > BR/
> >
> > Alberto B.
> > 
> > From: Charlie Black 
> > Sent: Wednesday, 4 December 2019, 18:11
> > To: dev@geode.apache.org 
> > Subject: Re: WAN replication issue in cloud native environments
> >
> > Alberto,
> >
> > Something else to think about: SNI-based routing. I believe Mario might be
> > working on adding SNI to Geode - he at least had a proposal that he
> > e-mailed out.
> >
> > Basics are the destination host is in the SNI field and the proxy can
> > inspect and route the request to the right service instance. Plus we
> > have the option to not terminate the SSL at the proxy.
> >
> > Full disclosure - I haven't tried out SNI-based routing myself; it is
> > something that I thought could work as I was reading about it. From the
> > whiteboarding I have done, I think this will do ingress and egress just fine.
> > Potentially easier than port mapping and playing around with
> > `hostname-for-clients`.
> >
> > Just something to think about.
> >
> > Charlie
> >
> >
> > On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> >  wrote:
> >
> > > Hi Jacob,
> > >
> > > Yes, we are using the LoadBalancer service type. But note the problem is not
> > > in the transport layer but in Geode, as GW senders are complaining
> > > “sender-2-parallel : Could not connect due to: There are no active
> > > servers.” when one of the servers in the receiving cluster is killed.
> > >
> > > So, there is still one server alive in the receiving cluster, but the GW
> > > sender does not know it and the locator is not able to inform it of its
> > > existence. Looking at the code, it seems the internal data structures (maps)
> > > holding the profiles use objects whose equality check relies only on hostname
> > > and port. This makes it impossible to differentiate servers when the same
> > > “hostname-for-senders” and port are used. When the killed server comes back
> > > up, the locator profiles are updated (the internal map is back to size()=1
> > > although 2+ servers are there) and the GW senders happily reconnect.
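The collapse described here is easy to reproduce with a plain map (the key class below is a simplified stand-in for Geode's internal server-location type, not the real class):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for an internal server-location key whose equality is
// hostname + port only. Two distinct receivers behind the same
// hostname-for-senders and port collapse into one map entry.
final class ServerKey {
  final String host;
  final int port;

  ServerKey(String host, int port) { this.host = host; this.port = port; }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ServerKey)) return false;
    ServerKey s = (ServerKey) o;
    return port == s.port && host.equals(s.host);
  }

  @Override
  public int hashCode() { return Objects.hash(host, port); }
}

public class ProfileCollapseSketch {
  public static void main(String[] args) {
    Map<ServerKey, String> profiles = new HashMap<>();
    // server-1 and server-2 both advertise the same VIP and port...
    profiles.put(new ServerKey("receivers.example.com", 5000), "server-1");
    profiles.put(new ServerKey("receivers.example.com", 5000), "server-2");
    // ...so the locator-side map sees a single receiver.
    System.out.println(profiles.size()); // 1
  }
}
```

When the one entry's member goes down, the map empties, which matches the observed "no receivers up" advertisement despite other servers being alive.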
> > >
> > > The solution with Geode as-is would be to expose each GW receiver on a
> > > different port outside of the k8s cluster; this includes creating N Kubernetes
> > > services for N GW receivers, in addition to updating the service mesh
> > > configuration (if it is used, firewalls, etc…). The declarative nature of
> > > Kubernetes means we must know the ports in advance, hence start-port and
> > > end-port when creating each GW receiver must be equal, and we should have
> > > some well-known algorithm when creating GW receivers across servers. For
> > > example: server-0 port 5000, server-1 port 5001, server-2 port 5002, etc….
> > > So, all GW receivers must be wired individually and we must turn off Geode’s
> > > random port allocation.
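With Geode as-is, that per-receiver wiring might look roughly like this in gfsh (member names, hostname, and ports are illustrative; pinning start-port equal to end-port is what disables the random allocation):

```
gfsh> create gateway-receiver --member=server-0 --start-port=5000 --end-port=5000 --hostname-for-senders=receivers.example.com
gfsh> create gateway-receiver --member=server-1 --start-port=5001 --end-port=5001 --hostname-for-senders=receivers.example.com
gfsh> create gateway-receiver --member=server-2 --start-port=5002 --end-port=5002 --hostname-for-senders=receivers.example.com
```

Each fixed port would then map to its own Kubernetes Service (or load-balancer port), so the locator-side profiles stay distinguishable.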
> > >
> > > But we are exploring the possibility for Geode to handle this cloud-native
> > > configuration a bit better. Locators should be capable of holding GW
> > > receiver information even though the receivers are hidden behind the same
> > > hostname and port. This is a code change in Geode and we would like to have
> > > the community's opinion on it.
> > >
> > > One obvious impact on the legacy behavior: when the locator picks a server
> > > on behalf of the client (the GW sender in this case), it does so based
> > >  on the server load. When sender connects and considering 

Re: New geode-gfsh module

2019-12-06 Thread Jens Deppe
Just to be clear, this effort does *not* result in a standalone gfsh
executable/jar.

Sorry.

--Jens

On Fri, Dec 6, 2019 at 6:27 AM Jens Deppe  wrote:

> We have completed the work to move the gfsh code into a separate gradle
> submodule. This work has the following implications and effects:
>
>- geode-core does not have any direct dependencies on Spring libraries
>anymore
>- Anyone building with Geode will need to include the *geode-gfsh*
>dependency in order to use gfsh commands. This is relevant to anyone using
>Spring Boot (or Spring Data Geode) to launch locators or servers.
>- The Geode distribution (*.zip/.tgz) still includes all necessary
>libs required to use gfsh (i.e. Spring libraries)
>- There is no change for users using the 'gfsh' utility directly to
>launch locators and servers since the *gfsh-dependencies.jar* still
>contains all necessary references to Spring, etc.
>
> Please let us know if you discover any issues related to this work.
>
> Thanks
> -- Jens & Patrick
>
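For builds consuming Geode directly, the new dependency would be declared roughly as follows (a Gradle sketch; the version shown is an assumption, not tied to a specific release):

```groovy
dependencies {
  implementation 'org.apache.geode:geode-core:1.12.0'  // version assumed
  // Newly required for gfsh commands after the module split:
  implementation 'org.apache.geode:geode-gfsh:1.12.0'
}
```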


Re: New geode-gfsh module

2019-12-06 Thread Jens Deppe
The geode-dependencies.jar now includes the *geode-gfsh.jar* (as well as
Spring still).

--Jens

On Fri, Dec 6, 2019 at 8:49 AM Anthony Baker  wrote:

> Did the class path in geode-dependencies.jar change?  If so, that might
> also affect applications that relied on those (Spring) jars being
> available on the class path. Of course, they can fix that by explicitly
> injecting the application's dependencies into the class path as needed.
>
> Anthony
>
>
> > On Dec 6, 2019, at 6:27 AM, Jens Deppe  wrote:
> >
> > We have completed the work to move the gfsh code into a separate gradle
> > submodule. This work has the following implications and effects:
> >
> >   - geode-core does not have any direct dependencies on Spring libraries
> >   anymore
> >   - Anyone building with Geode will need to include the *geode-gfsh*
> >   dependency in order to use gfsh commands. This is relevant to anyone
> using
> >   Spring Boot (or Spring Data Geode) to launch locators or servers.
> >   - The Geode distribution (*.zip/.tgz) still includes all necessary libs
> >   required to use gfsh (i.e. Spring libraries)
> >   - There is no change for users using the 'gfsh' utility directly to
> >   launch locators and servers since the *gfsh-dependencies.jar* still
> >   contains all necessary references to Spring, etc.
> >
> > Please let us know if you discover any issues related to this work.
> >
> > Thanks
> > -- Jens & Patrick
>
>


Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Dan Smith
Jake wrote:

> One thought though: did you consider just converting the existing
> PoolManager to an interface leaving the static methods intact but
> deprecated?
>

Dale wrote:

> To the extent possible without breaking existing APIs, please name the new
> stuff to indicate what’s in the pool (E.g. ConnectionPool,
> ConnectionPoolService, and so on).



Dang, I like both of these suggestions! Regarding changing PoolManager to
an interface, I originally wasn't thinking we would stay backwards
compatible if we did that, but as I think about it, that might be ok. One
slight issue with that approach is that we would have to come up with new
names for the methods - we can't have both an instance and a static method
with the same name and args. Maybe still worth it.
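The renaming idea can be sketched roughly like this (all names are hypothetical, not the actual proposed API):

```java
import java.util.HashMap;
import java.util.Map;

// All names here are hypothetical sketches, not the proposed API. Java
// forbids a static and an instance method with the same signature in one
// type, so if PoolManager kept its deprecated static find(String) while
// gaining instance methods, the instance method would need a new name.
interface PoolService {
  PoolStub findPool(String name); // renamed; an instance find(String) would clash
}

class PoolStub {
  final String name;
  PoolStub(String name) { this.name = name; }
}

class DefaultPoolService implements PoolService {
  private final Map<String, PoolStub> pools = new HashMap<>();

  DefaultPoolService add(PoolStub p) { pools.put(p.name, p); return this; }

  @Override
  public PoolStub findPool(String name) { return pools.get(name); }
}

public class PoolManagerSketch {
  private static final DefaultPoolService SERVICE =
      new DefaultPoolService().add(new PoolStub("client-pool"));

  /** Deprecated static facade kept for backwards compatibility. */
  @Deprecated
  public static PoolStub find(String name) { return SERVICE.findPool(name); }

  public static void main(String[] args) {
    System.out.println(find("client-pool").name); // client-pool
  }
}
```

Existing callers keep compiling against the static facade while new code takes the instance-scoped service.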

Dale - are you suggesting a ConnectionPoolService that returns
ConnectionPool instances? Would that mean ConnectionPool would extend Pool
and we would deprecate Pool itself?

-Dan

On Fri, Dec 6, 2019 at 8:32 AM Darrel Schneider 
wrote:

> +1
>
> On Thu, Dec 5, 2019 at 4:40 PM Dan Smith  wrote:
>
> > Hi,
> >
> > I wrote up a proposal for deprecating the singleton PoolManager in favor
> > of a ClientCache-scoped service. Please review and comment on the below
> > proposal.
> >
> > I think this should address the issues that Spring Data Geode and friends
> > had trying to mock Pools and remove the need for those projects to try to
> > inject mock Pools into a Geode singleton.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/GEODE/Replace+singleton+PoolManager+with+ClientCache+scoped+service
> >
> > Thanks,
> > -Dan
> >
>


Re: Odg: Certificate Based Authorization

2019-12-06 Thread Jens Deppe
Thanks for the write-up. I think it does require a bit of clarification
around how the functionality is enabled.

You've stated:

For client connections, we could presume that certificate based
> authorization should be used if both features are enabled, but the client
> cache properties don’t provide credentials
> (security-username/security-password).


Currently, the presence of any *auth-init* parameters does not
necessarily require setting *security-username/password* (although almost
all implementations of AuthInitialize probably do use them). So this
condition will not be sufficient to enable this new functionality.

Although we already have so many parameters, I think that having an explicit
parameter to enable this feature will avoid any possible confusion.

I'm wondering whether, for an initial deliverable, we should require
*ssl-enabled-components=all*. This would not allow a mix of different forms
of authentication for different endpoints. Perhaps this might simplify the
implementation but would not preclude us from adding that capability in the
future.
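As a sketch only, such a configuration might look like this (ssl-enabled-components, ssl-require-authentication, and security-manager are existing Geode properties; the final flag is purely hypothetical and does not exist today):

```properties
ssl-enabled-components=all
ssl-require-authentication=true
security-manager=com.example.DnSecurityManager
# Hypothetical explicit enable flag -- not a real Geode property:
security-certificate-authorization=true
```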

--Jens

On Fri, Dec 6, 2019 at 1:13 AM Mario Kevo  wrote:

> Hi all,
>
> I wrote up a proposal for Certificate Based Authorization.
> Please review and comment on the below proposal.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/Certificate+Based+Authorization
>
> BR,
> Mario
> 
> From: Udo Kohlmeyer 
> Sent: 2 December 2019, 20:10
> To: dev@geode.apache.org 
> Subject: Re: Certificate Based Authorization
>
> +1
>
> On 12/2/19 1:29 AM, Mario Kevo wrote:
> > Hi,
> >
> >
> >
> > There is another potential functionality we would like to discuss and
> get some comments for. The idea is TLS certificate based authorization.
> Currently, if a user wants secure communication (TLS) + authorization, he
> needs to enable TLS and access control. The user also needs to handle both
> the certificates for TLS and the credentials for access control. The idea
> we have is to use both features: TLS and access control, but remove the
> need to handle the credentials (generating and securely storing the
> username and password). Instead of the credentials, the certificate subject
> DN would be used for authorization.
> >
> >
> >
> > This would of course be optional. We would leave the possibility to use
> these 2 features as they are right now, but would also provide a
> configuration option to use the features without the need for client
> credentials, utilizing the certificate information instead.
> >
> >
> >
> > For further clarity, here are the descriptions of how the options would
> work:
> >
> >
> >
> >1.  Using TLS and access control as they work right now
> >   *   Certificates are prepared for TLS
> >   *   A SecurityManager is prepared for access control
> authentication/authorization. As part of this, a file (e.g. security.json)
> is prepared where we define the allowed usernames, passwords and
> authorization rights for each username
> >   *   The credentials are distributed towards clients. Here a user
> needs to consider secure distribution and periodical rotation of
> credentials.
> >
> > Once a client initiates a connection, we first get the TLS layer and
> certificate check, and right after that we perform the
> authentication/authorization of the user credentials.
> >
> >
> >
> >    2.  TLS certificate based authorization
> >   *   Certificates are prepared for TLS
> >   *   A SecurityManager is prepared for access control
> authentication/authorization. As part of this, a file (e.g. security.json)
> is prepared. In this case we don’t define the authorization rights based on
> usernames/passwords but based on certificate subject DNs.
> >   *   There is no more need to distribute or periodically rotate the
> credentials, since there would be none. Authorization would be based on
> the subject DN fetched from the certificate used for that same connection.
> >
> > Once a client initiates a connection, and when we get past the TLS
> layer, at the moment where geode expects the credentials from the client
> connection, we just take the certificate subject DN instead and provide it
> to the security manager for authorization.
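A rough sketch of that flow (Geode's SecurityManager does accept a Properties object in authenticate(), but the DN property key, the class name, and the allowed-DN list below are hypothetical illustrations, not real API):

```java
import java.util.Properties;
import java.util.Set;

// Sketch only: the key under which the TLS layer would hand over the subject
// DN, and the DN allow-list standing in for security.json entries, are
// hypothetical illustrations of the proposal.
public class DnAuthorizationSketch {
  // Stand-in for what security.json would hold under the proposal:
  // subject DNs instead of usernames/passwords.
  private static final Set<String> ALLOWED_DNS =
      Set.of("CN=client-1,OU=geode,O=example");

  static Object authenticate(Properties credentials) {
    // Hypothetical property key for the certificate's subject DN.
    String dn = credentials.getProperty("security-subject-dn");
    if (dn == null || !ALLOWED_DNS.contains(dn)) {
      throw new SecurityException("subject DN not authorized: " + dn);
    }
    return dn; // becomes the principal for subsequent authorize() calls
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("security-subject-dn", "CN=client-1,OU=geode,O=example");
    System.out.println(authenticate(props)); // CN=client-1,OU=geode,O=example
  }
}
```

The point of the sketch: no password ever crosses the wire or needs rotation; the TLS handshake has already proven possession of the certificate's private key.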
> >
> >
> >
> > This wouldn’t lower the level of security (we can have TLS enabled
> without access control already), but would provide authentication without
> the hassle of username and password handling.
> >
> >
> >
> > This is the basic description of the idea. There would be more things to
> consider, like multi-user authentication, but for now we would just like to
> get some initial feedback. If it is considered useful, we could get into
> the details.
> >
> >
> > BR,
> >
> > Mario
> >
> >
>


Re: New geode-gfsh module

2019-12-06 Thread Anthony Baker
Did the class path in geode-dependencies.jar change?  If so, that might also 
affect applications that relied on those (Spring) jars being available on 
the class path. Of course, they can fix that by explicitly injecting the 
application's dependencies into the class path as needed.

Anthony


> On Dec 6, 2019, at 6:27 AM, Jens Deppe  wrote:
> 
> We have completed the work to move the gfsh code into a separate gradle
> submodule. This work has the following implications and effects:
> 
>   - geode-core does not have any direct dependencies on Spring libraries
>   anymore
>   - Anyone building with Geode will need to include the *geode-gfsh*
>   dependency in order to use gfsh commands. This is relevant to anyone using
>   Spring Boot (or Spring Data Geode) to launch locators or servers.
>   - The Geode distribution (*.zip/.tgz) still includes all necessary libs
>   required to use gfsh (i.e. Spring libraries)
>   - There is no change for users using the 'gfsh' utility directly to
>   launch locators and servers since the *gfsh-dependencies.jar* still
>   contains all necessary references to Spring, etc.
> 
> Please let us know if you discover any issues related to this work.
> 
> Thanks
> -- Jens & Patrick



Re: WAN replication issue in cloud native environments

2019-12-06 Thread Sai Boorlagadda
> if one gw receiver stops, the locator will publish to any remote locator
that there are no receivers up.

I am not sure if locators proactively update remote locators about changes
in the receivers list; rather, I think the senders figure this out on connection
issues.
But I see the problem that local-site locators have only one member in the
list of receivers that they maintain, as all receivers register with a
single address.

One idea I had earlier is to statically set the receivers list on locators
(just like the remote-locators property), which is exchanged with gw-senders.
This way we can introduce a boolean flag to turn off WAN discovery and use
the statically configured addresses. This can also be useful for
remote-locators if they are behind a service.

Sai
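Sketched as configuration, the statically wired variant might look like this (both property names are hypothetical, modeled on the existing remote-locators property; the address format follows Geode's host[port] convention):

```properties
# Hypothetical properties -- neither exists in Geode today.
wan-receiver-discovery=false
remote-receivers=site2-receivers.example.com[5000],site2-receivers.example.com[5001]
```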

On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
 wrote:

> Thanks Charlie, but the issue is not about connectivity. Summarizing:
> if you have two or more gw receivers started with the same values of the
> "hostname-for-senders", "start-port" and "end-port" parameters (with
> "start-port" and "end-port" equal), then when one gw receiver stops, the
> locator will publish to any remote locator that there are no receivers up.
>
> And this use case is likely to happen on cloud-native environments, as
> described.
>
> BR/
>
> Alberto B.
> 
> From: Charlie Black 
> Sent: Wednesday, 4 December 2019, 18:11
> To: dev@geode.apache.org 
> Subject: Re: WAN replication issue in cloud native environments
>
> Alberto,
>
> Something else to think about: SNI-based routing. I believe Mario might be
> working on adding SNI to Geode - he at least had a proposal that he
> e-mailed out.
>
> Basics are the destination host is in the SNI field and the proxy can
> inspect and route the request to the right service instance. Plus we
> have the option to not terminate the SSL at the proxy.
>
> Full disclosure - I haven't tried out SNI-based routing myself; it is
> something that I thought could work as I was reading about it. From the
> whiteboarding I have done, I think this will do ingress and egress just fine.
> Potentially easier than port mapping and playing around with
> `hostname-for-clients`.
>
> Just something to think about.
>
> Charlie
>
>
> On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
>  wrote:
>
> > Hi Jacob,
> >
> > Yes, we are using the LoadBalancer service type. But note the problem is not
> > in the transport layer but in Geode, as GW senders are complaining
> > “sender-2-parallel : Could not connect due to: There are no active
> > servers.” when one of the servers in the receiving cluster is killed.
> >
> > So, there is still one server alive in the receiving cluster, but the GW
> > sender does not know it and the locator is not able to inform it of its
> > existence. Looking at the code, it seems the internal data structures (maps)
> > holding the profiles use objects whose equality check relies only on hostname
> > and port. This makes it impossible to differentiate servers when the same
> > “hostname-for-senders” and port are used. When the killed server comes back
> > up, the locator profiles are updated (the internal map is back to size()=1
> > although 2+ servers are there) and the GW senders happily reconnect.
> >
> > The solution with Geode as-is would be to expose each GW receiver on a
> > different port outside of the k8s cluster; this includes creating N Kubernetes
> > services for N GW receivers, in addition to updating the service mesh
> > configuration (if it is used, firewalls, etc…). The declarative nature of
> > Kubernetes means we must know the ports in advance, hence start-port and
> > end-port when creating each GW receiver must be equal, and we should have
> > some well-known algorithm when creating GW receivers across servers. For
> > example: server-0 port 5000, server-1 port 5001, server-2 port 5002, etc….
> > So, all GW receivers must be wired individually and we must turn off Geode’s
> > random port allocation.
> >
> > But we are exploring the possibility for Geode to handle this cloud-native
> > configuration a bit better. Locators should be capable of holding GW
> > receiver information even though the receivers are hidden behind the same
> > hostname and port. This is a code change in Geode and we would like to have
> > the community's opinion on it.
> >
> > One obvious impact on the legacy behavior: when the locator picks a server
> > on behalf of the client (the GW sender in this case), it does so based on the
> > server load. When the sender connects, and considering all servers are
> > using the same VIP:PORT, it is the load balancer that will decide where the
> > connection will end up, but likely not on the one selected by the locator. So
> > here we ignore the locator's instructions. Since GW senders normally do not
> > create a huge number of connections, this probably shall not unbalance the
> > cluster too much. But this is an impact worth considering. Custom load metrics
> > would also be 

Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Darrel Schneider
+1

On Thu, Dec 5, 2019 at 4:40 PM Dan Smith  wrote:

> Hi,
>
> I wrote up a proposal for deprecating the singleton PoolManager in favor
> of a ClientCache-scoped service. Please review and comment on the below
> proposal.
>
> I think this should address the issues that Spring Data Geode and friends
> had trying to mock Pools and remove the need for those projects to try to
> inject mock Pools into a Geode singleton.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/Replace+singleton+PoolManager+with+ClientCache+scoped+service
>
> Thanks,
> -Dan
>


Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Jacob Barrett
This is a great idea!

One thought though: did you consider just converting the existing PoolManager to 
an interface, leaving the static methods intact but deprecated? I would think 
this would make existing code pretty easy to refactor over to the new code with 
minimal changes. If you thought about it, can you add why you rejected it to the 
RFC?

-Jake


> On Dec 5, 2019, at 4:40 PM, Dan Smith  wrote:
> 
> Hi,
> 
> I wrote up a proposal for deprecating the singleton PoolManager in favor
> of a ClientCache-scoped service. Please review and comment on the below
> proposal.
> 
> I think this should address the issues that Spring Data Geode and friends
> had trying to mock Pools and remove the need for those projects to try to
> inject mock Pools into a Geode singleton.
> 
> https://cwiki.apache.org/confluence/display/GEODE/Replace+singleton+PoolManager+with+ClientCache+scoped+service
> 
> Thanks,
> -Dan


Re: [DISCUSS] Replacing singleton PoolManager

2019-12-06 Thread Joris Melchior
+1

On Thu, Dec 5, 2019 at 7:40 PM Dan Smith  wrote:

> Hi,
>
> I wrote up a proposal for deprecating the singleton PoolManager in favor
> of a ClientCache-scoped service. Please review and comment on the below
> proposal.
>
> I think this should address the issues that Spring Data Geode and friends
> had trying to mock Pools and remove the need for those projects to try to
> inject mock Pools into a Geode singleton.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/Replace+singleton+PoolManager+with+ClientCache+scoped+service
>
> Thanks,
> -Dan
>


-- 
*Joris Melchior *
CF Engineering
Pivotal Toronto
416 877 5427

“Programs must be written for people to read, and only incidentally for
machines to execute.” – *Hal Abelson*



Re: New geode-gfsh module

2019-12-06 Thread Jacob Barrett
This is amazing.

> On Dec 6, 2019, at 6:27 AM, Jens Deppe  wrote:
> 
> We have completed the work to move the gfsh code into a separate gradle
> submodule. This work has the following implications and effects:
> 
>   - geode-core does not have any direct dependencies on Spring libraries
>   anymore
>   - Anyone building with Geode will need to include the '*geode-gfsh'*
>   dependency in order to use gfsh commands. This is relevant to anyone using
>   Spring Boot (or Spring Data Geode) to launch locators or servers.
>   - The Geode distribution (*.zip/.tgz) still includes all necessary libs
>   required to use gfsh (i.e. Spring libraries)
>   - There is no change for users using the 'gfsh' utility directly to
>   launch locators and servers since the *gfsh-dependencies.jar* still
>   contains all necessary references to Spring, etc.
> 
> Please let us know if you discover any issues related to this work.
> 
> Thanks
> -- Jens & Patrick


Re: New geode-gfsh module

2019-12-06 Thread Jens Deppe
Apologies to anyone who has gfsh-related PRs in flight, as they will require
rebasing onto develop.

--Jens

On Fri, Dec 6, 2019 at 6:27 AM Jens Deppe  wrote:

> We have completed the work to move the gfsh code into a separate gradle
> submodule. This work has the following implications and effects:
>
>- geode-core does not have any direct dependencies on Spring libraries
>anymore
>- Anyone building with Geode will need to include the '*geode-gfsh'*
>dependency in order to use gfsh commands. This is relevant to anyone using
>Spring Boot (or Spring Data Geode) to launch locators or servers.
>- The Geode distribution (*.zip/.tgz) still includes all necessary
>libs required to use gfsh (i.e. Spring libraries)
>- There is no change for users using the 'gfsh' utility directly to
>launch locators and servers since the *gfsh-dependencies.jar* still
>contains all necessary references to Spring, etc.
>
> Please let us know if you discover any issues related to this work.
>
> Thanks
> -- Jens & Patrick
>


New geode-gfsh module

2019-12-06 Thread Jens Deppe
We have completed the work to move the gfsh code into a separate gradle
submodule. This work has the following implications and effects:

   - geode-core does not have any direct dependencies on Spring libraries
   anymore
   - Anyone building with Geode will need to include the '*geode-gfsh'*
   dependency in order to use gfsh commands. This is relevant to anyone using
   Spring Boot (or Spring Data Geode) to launch locators or servers.
   - The Geode distribution (*.zip/.tgz) still includes all necessary libs
   required to use gfsh (i.e. Spring libraries)
   - There is no change for users using the 'gfsh' utility directly to
   launch locators and servers since the *gfsh-dependencies.jar* still
   contains all necessary references to Spring, etc.

Please let us know if you discover any issues related to this work.

Thanks
-- Jens & Patrick
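For builds that consume Geode directly (e.g. via Spring Boot or Spring Data Geode), the new requirement described above would look something like the following Gradle fragment. The coordinates follow the usual `org.apache.geode` group, but the version property is illustrative:

```groovy
dependencies {
    implementation "org.apache.geode:geode-core:${geodeVersion}"
    // Now required in addition to geode-core if the application launches
    // locators/servers and uses gfsh commands:
    implementation "org.apache.geode:geode-gfsh:${geodeVersion}"
}
```

Users of the standalone `gfsh` utility from the distribution are unaffected, since `gfsh-dependencies.jar` still pulls in everything it needs.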


Re: Certificate Based Authorization

2019-12-06 Thread Mario Kevo
Hi all,

I wrote up a proposal for Certificate Based Authorization.
Please review and comment on the below proposal.

https://cwiki.apache.org/confluence/display/GEODE/Certificate+Based+Authorization

BR,
Mario

From: Udo Kohlmeyer 
Sent: December 2, 2019 20:10
To: dev@geode.apache.org 
Subject: Re: Certificate Based Authorization

+1

On 12/2/19 1:29 AM, Mario Kevo wrote:
> Hi,
>
>
>
> There is another potential functionality we would like to discuss and get 
> comments on. The idea is TLS certificate based authorization. 
> Currently, if a user wants secure communication (TLS) + authorization, he 
> needs to enable TLS and access control. The user also needs to handle both 
> the certificates for TLS and the credentials for access control. The idea we 
> have is to use both features: TLS and access control, but remove the need to 
> handle the credentials (generating and securely storing the username and 
> password). Instead of the credentials, the certificate subject DN would be 
> used for authorization.
>
>
>
> This would of course be optional. We would leave the possibility to use these 
> 2 features as they are right now, but would also provide a configuration 
> option to use the features without the need for client credentials, utilizing 
> the certificate information instead.
>
>
>
> For further clarity, here are the descriptions of how the options would work:
>
>
>
>1.  Using TLS and access control as they work right now
>   *   Certificates are prepared for TLS
>   *   A SecurityManager is prepared for access control 
> authentication/authorization. As part of this, a file (e.g. security.json) is 
> prepared where we define the allowed usernames, passwords and authorization 
> rights for each username
>   *   The credentials are distributed to clients. Here a user needs 
> to consider secure distribution and periodic rotation of credentials.
>
> Once a client initiates a connection, we first get the TLS layer and 
> certificate check, and right after that we perform the 
> authentication/authorization of the user credentials.
>
>
>
>2.  TLS certificate based authorization
>   *   Certificates are prepared for TLS
>   *   A SecurityManager is prepared for access control 
> authentication/authorization. As part of this, a file (e.g. security.json) is 
> prepared. In this case we don’t define the authorization rights based on 
> usernames/passwords but based on certificate subject DNs.
>   *   There is no more need to distribute or periodically rotate the 
> credentials, since there would be none. Authorization would be based on the 
> subject DN fetched from the certificate used for that same connection
>
> Once a client initiates a connection, and when we get past the TLS layer, at 
> the moment where geode expects the credentials from the client connection, we 
> just take the certificate subject DN instead and provide it to the security 
> manager for authorization.
>
>
>
> This wouldn’t lower the level of security (we can have TLS enabled without 
> access control already), but would provide authentication without the hassle 
> of username and password handling.
>
>
>
> This is the basic description of the idea. There would be more things to 
> consider, like multi user authentication, but for now we would just like to 
> get some initial feedback. If it is considered useful, we could get into the 
> details.
>
>
> BR,
>
> Mario
>
>
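The certificate-DN-based authorization flow proposed above could look roughly like the following stdlib-only sketch. It stands in for what a SecurityManager keyed on subject DNs might do; the DN strings, permission names, and class are all made up for illustration (Geode's actual SecurityManager interface and security.json handling are not shown):

```java
// Hypothetical sketch of DN-based authorization: normalize the certificate's
// subject DN and look up its granted permissions, in place of a username and
// password. javax.naming.ldap.LdapName re-composes the DN per RFC 2253, so
// equivalent encodings (extra spaces, different case) compare equal.
import java.util.Map;
import java.util.Set;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

final class DnAuthorizer {
  // Stand-in for entries a security.json keyed on subject DNs might hold.
  private final Map<String, Set<String>> permissionsByDn = Map.of(
      "cn=client1,ou=apps,o=example", Set.of("DATA:READ", "DATA:WRITE"),
      "cn=monitor,ou=ops,o=example", Set.of("CLUSTER:READ"));

  /** Normalizes the DN (separator spacing, case) to a canonical lookup key. */
  static String normalize(String dn) throws InvalidNameException {
    return new LdapName(dn).toString().toLowerCase();
  }

  boolean authorize(String subjectDn, String requiredPermission)
      throws InvalidNameException {
    return permissionsByDn
        .getOrDefault(normalize(subjectDn), Set.of())
        .contains(requiredPermission);
  }
}
```

The server would extract the subject DN from the TLS session's peer certificate at the point where it currently expects client credentials, then call something like `authorize(...)` above — no credential distribution or rotation involved.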


Re: Re: Lucene upgrade

2019-12-06 Thread Mario Kevo
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from 
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
 is that when the locator is upgraded, it is shut down and restarted on the 
newer version. The problem is that server2 becomes the lead and cannot read the 
Lucene index with the newer version (the Lucene index format changed between 
versions 6 and 7).

Another problem occurs after the rolling upgrade of the locator and server1, 
when verifying the region size on the VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, 
expectedRegionSize, 5,
15, server2, server3);

First it checks whether the region has the expected size on each VM, and that 
passes (15 entries). The problem is that while executing 
verifyLuceneQueryResults, VM1 (server2) has only 13 entries and the assertion 
fails.
From the logs it can be seen that two batches were unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) there are 14 entries; one event was 
unsuccessfully dispatched.

I don't know why some events are dispatched successfully and some are not.
Do you have any idea?

BR,
Mario
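For reference, a minimal sketch of the version-gate idea discussed in this thread (option 1: hold Lucene event processing until every member runs the new version). The class, version ordinals, and supplier are hypothetical stand-ins — the real check would live at the start of LuceneEventListener.process() and query the distribution manager for member versions:

```java
// Hypothetical sketch of the rolling-upgrade gate: the listener refuses to
// drain its queue (and hence to write the new Lucene index format) while any
// member still reports an older version ordinal. Names are illustrative.
import java.util.List;
import java.util.function.Supplier;

final class LuceneUpgradeGate {
  static final int CURRENT_ORDINAL = 110;          // made-up version ordinal

  private final Supplier<List<Integer>> memberVersionOrdinals;

  LuceneUpgradeGate(Supplier<List<Integer>> memberVersionOrdinals) {
    this.memberVersionOrdinals = memberVersionOrdinals;
  }

  /** True only when no member is still running an older version. */
  boolean mayProcessEvents() {
    return memberVersionOrdinals.get().stream()
        .allMatch(v -> v >= CURRENT_ORDINAL);
  }
}
```

While the gate returns false, events simply accumulate in the AEQ, which is why tests must not assert on query results until every member has rolled — exactly the test change described above.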



From: Jason Huynh 
Sent: December 2, 2019 18:32
To: geode 
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:

>
> I started with implementation of Option-1.
> As I understood it, the idea is to block all puts (put them in the queue) until
> all members are upgraded. After that, all queued events will be processed.
>
> I tried Dan's proposal to check at the start of
> LuceneEventListener.process() whether all members are upgraded, and also
> changed the test to verify Lucene indexes only after all members are
> upgraded, but got the same error with incompatibilities between Lucene versions.
> Changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> 
> From: Xiaojian Zhou 
> Sent: November 7, 2019 18:27
> To: geode 
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:
>
> > Gester, I don't think we need to write in the old format, we just need the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou  wrote:
> >
> > > Usually re-creating region and index are expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have an offline reindex scripts or steps (written by Barry?). If
> > that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't find a way to write Lucene
> > > in the older format. They only support reading old format indexes with a
> > > newer version by using lucene-backward-codecs."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customers will
> > > hold on, silence, or reduce their business throughput when
> > > doing a rolling upgrade. I wonder if that is a reasonable assumption.
> > >
> > > Overall, after comparing all 3 options, I still think option-2 is the
> > > best bet.
> > >
> > > Regards
> > > Gester
> > >