Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Bruce Schuchardt
I don't like the idea of using someone else's messaging software for
several reasons:

* we may have needs that are beyond its capabilities
* it may impose something like a broker or endpoint IDs that we have to
deal with
* it introduces more overhead per message and we will have no control
over that overhead
* it can introduce backward compatibility issues that we have no control
over
* it won't fit into our configuration and management infrastructure

However, I do like the idea of exploring the use of other serialization 
software if it has good multi-language support.



Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Michael William Dodge
MQTT is an interesting idea but from the wiki page it sounds like it depends on 
a broker. In that architecture, would the server be the broker as well as a 
publisher and subscriber? Would the locator be the broker? Or would we 
introduce a separate broker, either third-party or bespoke?

Sarge




Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Michael Stolz
I am wondering why we are leaning so heavily toward RPC IDL rather than
messaging standards.

One of the big features of the client-server discussion around Geode is the
ability to register interest and run Continuous Queries.
Both of these behave more like messaging than RPCs.

Beyond that, all that Geode really does is puts and gets and function calls.
A put is analogous to a publish. A get is similar to a subscribe. A
function call can be implemented as a put on a special topic that has a
callback attached to it. In fact, that's how we used to do server-side
functions before we added the function execution API.

The other thing we are constantly battling with is being able to push more
and more data into Geode from clients faster and faster.
That too lends itself to a messaging protocol.

In fact, I remember that last year we were already having some discussions
about maybe developing a client based on MQTT.
That would make GemFire a natural for the Internet-of-Things and for mobile
devices.
I wonder if it would be sufficient for a full-blown .Net GemFire client.

One of our goals in this thread is to be able to have clients in many
languages.
Well, there are at least 75 different language bindings for MQTT already
out in the wild.

MQTT is agnostic about what format the payload is in, so we could support
PDX if we choose to, or ProtoBufs or FlatBuffers or whatever else for
payload serialization.
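A minimal sketch of how such a mapping might look, assuming a hypothetical `geode/<region>/<key>` topic layout (the scheme is invented for illustration, not an agreed design):

```python
# Hypothetical topic layout for mapping Geode operations onto MQTT;
# the "geode/<region>/<key>" scheme is an assumption for illustration.

def put_topic(region, key):
    """Topic a client would publish to for region.put(key, value)."""
    return "geode/{}/{}".format(region, key)

def interest_topic(region, key_filter="+"):
    """Topic filter for register-interest; '+' is MQTT's single-level
    wildcard, i.e. interest in every key of the region."""
    return "geode/{}/{}".format(region, key_filter)
```

Because MQTT treats the payload as opaque bytes, the value published to `put_topic(...)` could be PDX, Protobuf, or FlatBuffer bytes without the topic scheme changing.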

Thoughts?


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771



Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Galen M O'Sullivan
Is there any reason that this data couldn't be packed into, say, Thrift
types? It's all numbers, right?


Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Bruce Schuchardt
I agree that key/value serialization is a separate issue and more 
related to storage than communications.  The thing I'm struggling with 
is how to specify message metadata in an RPC IDL.  I'm thinking of 
things like an eventID, transaction info, security principal, etc.  The 
basic client/server messaging doesn't need to know the details of this 
information - just that it exists and maybe the ID of each piece of 
metadata.
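One way to keep the messaging layer ignorant of the metadata's contents is to frame each piece as an (ID, opaque bytes) pair. The IDs, names, and big-endian layout below are invented for illustration only:

```python
# Sketch: metadata carried as opaque (id, bytes) pairs. The protocol
# knows only the IDs and lengths, never the contents. IDs are invented.
import struct

EVENT_ID, TX_INFO, PRINCIPAL = 1, 2, 3  # hypothetical metadata IDs

def pack_metadata(entries):
    """Frame: 2-byte count, then per entry a 2-byte id, 4-byte length,
    and the opaque bytes."""
    out = bytearray(struct.pack(">H", len(entries)))
    for mid, blob in entries.items():
        out += struct.pack(">HI", mid, len(blob)) + blob
    return bytes(out)

def unpack_metadata(buf):
    """Inverse of pack_metadata; yields {id: opaque bytes}."""
    (count,) = struct.unpack_from(">H", buf, 0)
    pos, entries = 2, {}
    for _ in range(count):
        mid, length = struct.unpack_from(">HI", buf, pos)
        pos += 6
        entries[mid] = buf[pos:pos + length]
        pos += length
    return entries
```

The round trip preserves the bytes exactly, so the event ID, transaction info, and principal could each be serialized however their owning subsystem likes.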







Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Jacob Barrett
I have used plenty of RPC protocols that pass raw binary as input/output
values because the types are not predefined. All the frameworks mentioned
support byte arrays.

-Jake



Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-10 Thread Galen M O'Sullivan
Hi Jacob,

The message protocol is conflated with user data serialization in pretty
much all of these frameworks I've seen. If we define some RPC in Thrift, we
have to specify the type of data that gets passed to the call. The type of
that data is specified using the Thrift IDL, meaning it's a
Thrift-serialized object. We could have all the remote procedure calls take
and return byte arrays, but then I'm not sure what the benefit of using
Thrift is.

The nice thing about HTTP is that it's specified what it looks like on the
wire and there's a defined mechanism for negotiating content type. The RPC
frameworks I've seen aren't as clearly specified[1]. We would also have to
write our own mechanism for negotiating encoding. It seems to me like we
would be doing more work than we would be gaining.
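Such an encoding negotiation could be as small as the server picking the first client-offered encoding it also supports; the names and first-match policy below are illustrative assumptions only:

```python
# Hypothetical encoding negotiation -- the supported list and the
# first-match policy are assumptions for illustration.
SERVER_SUPPORTED = ["pdx", "protobuf"]

def negotiate(client_offers):
    """Return the first encoding, in client preference order, that the
    server also supports, or None if there is no overlap."""
    for enc in client_offers:
        if enc in SERVER_SUPPORTED:
            return enc
    return None
```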

Another factor that will constrain our serialization choices, as well as
our choice of RPC framework, is that a lot of serialization libraries expect
all the types to be known ahead of time. We currently allow users to serialize
POJOs or anything that will fit in PDX, and we want to keep that capability.

[1]: I think I could reconstruct BERT based on the Erlang docs, but Thrift
is harder. gRPC is basically Protobuf-encoded data over HTTP/2. We could
put binary data in protobufs, but I'd rather not.



Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-07 Thread Jacob Barrett
You are confusing message protocol with user data serialization. The two
are not related. Look at HTTP: the message protocol is pretty simple, PUT,
GET, etc., but it does nothing with the data being PUT/GET. On a GET the
message protocol has a field that specifies the Content-Type and
Content-Encoding and some other metadata. So the GET could return HTML, JPEG,
etc., but the protocol doesn't care and doesn't know anything special about
the type of the data it carries. The structure for JPEG is not defined in the
HTTP protocol at all.

So relate that to what we want to do. Our message protocol defines a PUT
and a GET operation, some metadata perhaps and a section for the data. It
should have no restriction or care on how that data was serialized. The
protocol does not define in any way the structure of the data being PUT or
GET.
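As a sketch of that separation, a PUT envelope might name the operation and the payload's encoding while leaving the payload itself untouched (the field names here are invented for illustration):

```python
# Envelope sketch in the spirit of the HTTP analogy: the protocol layer
# names the operation and the payload encoding, but never parses the
# payload. Field names are invented for illustration.

def make_put_message(region, key, value, value_encoding):
    return {
        "op": "PUT",                       # protocol-level operation
        "region": region,                  # protocol-level addressing
        "key": key,                        # opaque bytes
        "value": value,                    # opaque bytes
        "value_encoding": value_encoding,  # analogous to Content-Type
    }
```

A server would route on `op` and `region` and store `key`/`value` untouched; only a client that later reads the value consults `value_encoding`.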

Separating that concern then, does your argument still stand that RPC
frameworks do not work for the new Geode protocol?



Re: Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-07 Thread Anthony Baker

> On Apr 7, 2017, at 3:11 PM, Galen M O'Sullivan  wrote:
> 
> I think the main selling point of an RPC framework/IDL is ease-of-use for
> defined remote communications that look like function calls. If you have
> calls you're making to remote servers asking them to do work, you can
> fairly trivially define the interface and then call through. You can then
> use native types in function calls and they transparently get transformed
> and sent across the wire.
> 
> The RPC protocols I've seen are based on the idea that the types that can
> be sent will be predefined -- otherwise it's hard to describe with an IDL.
> 
> However, we want to support storing unstructured data, or at least data
> structures that are defined (from the cluster's point of view) at runtime
> -- one of the main selling points of Geode is PDX serialization, which lets
> us store arbitrary object structures in the cache. If we were to use an RPC
> framework we have all the commands accept byte arrays and include some
> meta-information. This loses us the ease-of-use.

IMO, data encoding should be a separate concern from message definition.  I 
would advocate that any approach we take should view the key/value fields as 
opaque bytes.  By keeping data encoding and message definition separate, we can 
evolve them independently.

> 
> What's left in the protocol then is the calls and the number of arguments
> they accept, and what order we put those (and the serialized arguments) in
> on the wire. I don't think we gain much by using a preexisting RPC
> language, and we lose control over the wire format and message structure.
> If we want to be able to make the protocol really fast, and customized to
> our use case; if we want to implement asynchronous requests, futures, etc.
> then we have to write wrappers for a given language anyways, and packing
> those things through an RPC framework like Thrift or gRPC will be an extra
> layer of confusing complexity.

I believe grpc supports an async API.  I would really love to see benchmark 
data for a get() message using grpc.  Comparing that same message sent using an 
async netty implementation would be instructive :-)


Anthony



Why we shouldn't use RPC Frameworks for the New Geode Protocol

2017-04-07 Thread Galen M O'Sullivan
I think the main selling point of an RPC framework/IDL is ease-of-use for
defined remote communications that look like function calls. If you have
calls you're making to remote servers asking them to do work, you can
fairly trivially define the interface and then call through. You can then
use native types in function calls and they transparently get transformed
and sent across the wire.

The RPC protocols I've seen are based on the idea that the types that can
be sent will be predefined -- otherwise it's hard to describe with an IDL.

However, we want to support storing unstructured data, or at least data
structures that are defined (from the cluster's point of view) at runtime
-- one of the main selling points of Geode is PDX serialization, which lets
us store arbitrary object structures in the cache. If we were to use an RPC
framework, we would have all the commands accept byte arrays and include some
meta-information. This loses us the ease-of-use.

What's left in the protocol then is the calls and the number of arguments
they accept, and what order we put those (and the serialized arguments) in
on the wire. I don't think we gain much by using a preexisting RPC
language, and we lose control over the wire format and message structure.
If we want to be able to make the protocol really fast, and customized to
our use case; if we want to implement asynchronous requests, futures, etc.
then we have to write wrappers for a given language anyways, and packing
those things through an RPC framework like Thrift or gRPC will be an extra
layer of confusing complexity.

Best,
Galen