C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-04 Thread John R. Frank
Accumulo Developers,

We're trying to boost the throughput of non-Java tools with Accumulo.  It seems
that the lowest-hanging fruit is to stop using the Thrift proxy. Per the
discussion about Python and the Thrift proxy on the users list [1], I'm
wondering if anyone is interested in helping with a native C++ client?  There
is a start on one here [2]. We could offer a bounty or maybe set up a
consulting project, depending on who is interested in it.

We also looked at running a separate Thrift proxy for every worker thread or
process.  With many cores on a box, e.g., 32, it just doesn't seem practical to
run that many proxies, even if they all run in a single JVM. We'd be glad to
hear ideas on that front too.

A potentially big benefit of making a proper C++ Accumulo client is that it is
straightforward to expose native interfaces in Python (via PyObject), Go [3],
Ruby [4], and other languages.

Thanks for any advice, pointers, interest. 

John


1-- http://www.mail-archive.com/user@accumulo.apache.org/msg03999.html

2-- https://github.com/phrocker/apeirogon

3-- http://golang.org/cmd/cgo/

4-- https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/
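
The point above about exposing a native library to Python can be sketched as
follows. This is only an illustration under stated assumptions: it uses
Python's standard ctypes module rather than a PyObject-based extension module,
and it loads the C standard library as a stand-in, since no native Accumulo
client library is assumed to exist. The helper name native_strlen is
hypothetical, and the library lookup assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Load the C standard library as a stand-in for a hypothetical native
# Accumulo client library (assumption: POSIX system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(data: bytes) -> int:
    """Call the C function in-process, with no proxy in between."""
    return libc.strlen(data)
```

A real binding would declare the client library's own functions in the same
way, or wrap them in a compiled extension module.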



Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread Josh Elser

It'd be really cool to see a C++ client -- fully implemented or not. The 
increased performance via other languages like you said would be really nice, 
but I'd also be curious to see how the server characteristics change when the 
client might be sending data at a much faster rate.

My C++ is super rusty these days, but I'd be happy to help out any devs who can 
spearhead the effort :)



Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread Corey Nolet
I'm all for this, though I'm curious to know the thoughts about maintenance
and the design. Are we going to use Thrift to tie the C++ client calls into
the server-side components? Is that going to be maintained through a
separate effort, or is the plan to have the Accumulo community officially
support it?



Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread John R. Frank


There is an initial start on a C++ client here:

https://github.com/phrocker/apeirogon/issues/1#issuecomment-57958812

It looks like it is starting to use the underlying Thrift interfaces, just
like the Java client.


It would be great to get this supported in core Accumulo.  My company
would be interested in helping to support work on this.  To jump-start
it, we could offer a bounty or maybe a focused consulting engagement,
and then longer term we could help maintain the code (since we'd be using
it).


jrf




Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread David Medinets
How far away from the theoretical maximum rate is the Thrift protocol?
What kind of gain is expected from the native C++ approach?



Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread Corey Nolet
Yeah, the reason I ask about having the community support it is that the
Thrift interfaces are all internal workings; thus, code other than the
client API should probably be maintained internally. This will add a slight
overhead to development and maintenance of these Thrift services because
more implementations will now need to be kept up to date and modified when
changes need to be implemented.

Not saying it should be a showstopper, just pointing it out.



Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-06 Thread John R. Frank
Two kinds of gains:

1) Single-client throughput: the extra RPC hop through the proxy deserializes
and then reserializes the messages.  With the proxy running locally, the extra
network hop is less of an issue.  This was discussed on the user list (see the
link earlier in this thread), and a 5x slowdown was suggested as a possible
SWAG estimate.

2) Cluster management complexity: it's clearly best to have the proxy local to
the workers, but if you have a worker on every core of a large box (e.g., 32),
then a single proxy on each worker machine becomes a bottleneck. Running many
proxies in a single JVM is the next thing we could try to improve this, but
having a native client seems preferable.


Comments?

jrf
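
The first point above, the extra deserialize/reserialize step at the proxy,
can be sketched as a toy model. This is only an illustration under stated
assumptions: JSON stands in for the Thrift wire format, all function names are
hypothetical, and the code counts serialization steps only. It does not
measure or reproduce the 5x estimate.

```python
import json

# Running tally of serialization work done for one message.
counts = {"encode": 0, "decode": 0}


def encode(msg):
    counts["encode"] += 1
    return json.dumps(msg).encode("utf-8")


def decode(raw):
    counts["decode"] += 1
    return json.loads(raw.decode("utf-8"))


def send_direct(msg):
    """Native client path: the client encodes, the server decodes."""
    wire = encode(msg)
    return decode(wire)


def send_via_proxy(msg):
    """Proxy path: the proxy decodes the message and encodes it again."""
    wire_to_proxy = encode(msg)         # client -> proxy
    at_proxy = decode(wire_to_proxy)    # proxy deserializes
    wire_to_server = encode(at_proxy)   # proxy reserializes
    return decode(wire_to_server)       # server deserializes


def count_steps(send, msg):
    """Return the total encode + decode steps one send performs."""
    counts["encode"] = counts["decode"] = 0
    send(msg)
    return counts["encode"] + counts["decode"]
```

In this model the direct path performs two serialization steps per message
and the proxied path performs four, in addition to the extra hop itself.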




Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-08 Thread Adam Fuchs
John,

Do you have any performance numbers that you can share around your use
of the existing proxy solution? One of the reasons that Thrift is
performant for Accumulo is that messages are batched by the client
library and sent over a smaller number of RPCs. A C++ client will also
need to have mechanisms like the BatchWriter to get the best
performance. Also, it may be possible to make the proxy faster by
batching more data into the update call.

Adam
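
To illustrate the batching point in the message above, here is a minimal
sketch of client-side batching. It is not Accumulo's BatchWriter: the class
name BufferedWriter, the send_batch callback, and the count-based max_batch
threshold are all assumptions made for this example (the Java BatchWriter is
configured with memory and latency limits rather than a simple count).

```python
from typing import Callable, List, Tuple

# Simplified mutation for illustration: (row, column, value).
Mutation = Tuple[str, str, str]


class BufferedWriter:
    """Toy client-side batcher: buffers mutations, sends them in groups."""

    def __init__(self, send_batch: Callable[[List[Mutation]], None],
                 max_batch: int = 100):
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self._send_batch = send_batch  # stands in for one RPC per call
        self._max_batch = max_batch
        self._buffer: List[Mutation] = []

    def add(self, mutation: Mutation) -> None:
        """Buffer one mutation; flush once the buffer is full."""
        self._buffer.append(mutation)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)

    def close(self) -> None:
        """Flush any remaining mutations."""
        self.flush()
```

With max_batch=100, writing 250 mutations and then calling close() results in
three calls to send_batch instead of 250 separate ones.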




Re: C++ accumulo client --> native clients for Python, Go, Ruby etc

2014-10-08 Thread John R. Frank



We're running an experiment with multiple local proxies to try to better
understand the batching and bottlenecking issues.  Will report back.


jrf