Re: [Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread Hamza Ben Amara
Congratulations for your valuable contributions ✍️




From: simba nyatsanga
Sent: Monday 28 May, 21:47
Subject: Re: [Celebrate] Arrow has reached 2000 stargeezers
To: dev@arrow.apache.org


Congratulations everyone! On Mon, 28 May 2018 at 21:42 Li Jin wrote: > Congrats 
everyone! > On Mon, May 28, 2018 at 3:21 PM Jacques Nadeau wrote: > > > Woo! > 
> > > On Mon, May 28, 2018 at 4:50 PM, Wes McKinney > wrote: > > > > > Congrats 
all! The journey continues > > > > > > On Mon, May 28, 2018 at 9:43 AM, 
Krisztián Szűcs > > > wrote: > > > > Which makes Arrow the 33rd most starred 
Apache repository (out of > 1555, > > > according to github). > > > > > > > > 
Congratulations! > > > > > >



Re: [Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread simba nyatsanga
Congratulations everyone!

On Mon, 28 May 2018 at 21:42 Li Jin  wrote:

> Congrats everyone!
> On Mon, May 28, 2018 at 3:21 PM Jacques Nadeau  wrote:
>
> > Woo!
> >
> > On Mon, May 28, 2018 at 4:50 PM, Wes McKinney 
> wrote:
> >
> > > Congrats all! The journey continues
> > >
> > > On Mon, May 28, 2018 at 9:43 AM, Krisztián Szűcs
> > >  wrote:
> > > > Which makes Arrow the 33rd most starred Apache repository (out of
> 1555,
> > > according to github).
> > > >
> > > > Congratulations!
> > >
> >
>


Re: [Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread Li Jin
Congrats everyone!
On Mon, May 28, 2018 at 3:21 PM Jacques Nadeau  wrote:

> Woo!
>
> On Mon, May 28, 2018 at 4:50 PM, Wes McKinney  wrote:
>
> > Congrats all! The journey continues
> >
> > On Mon, May 28, 2018 at 9:43 AM, Krisztián Szűcs
> >  wrote:
> > > Which makes Arrow the 33rd most starred Apache repository (out of 1555,
> > according to github).
> > >
> > > Congratulations!
> >
>


Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-28 Thread Jacques Nadeau
Cutting through the layers of GRPC will be a per language approach thing.
Assuming that each GRPC language implementation does a good job of
separating message encapsulation from the base library, this should be
straight-forward-ish. Hope improves around this as I see creation of
non-protobuf protocols built on top of the base GRPC [1]. How to do this in
each language will probably take time looking at the GRPC internals for
that language but can be a secondary step once you get the protocol working
(you can just pay for extra copies until then).

In my Java approach I believe I do one read copy and zero write copies
(needs more testing) which was my target. (Getting to zero-copy on read
means a lot more complexity because your socket-reading has to be protocol
aware: even our bespoke layer in Dremio doesn't try to do that. I'd guess
KRPC does the same but haven't reviewed the code to confirm.)

Will try to get some more slides/readme and a proper proposed patch up soon.

[1] https://grpc.io/blog/flatbuffers



On Mon, May 28, 2018 at 1:05 AM, Wes McKinney  wrote:

> hey Jacques,
>
> This is great news, I look forward to digging into this. My biggest
> initial question is the Protobuf encapsulation, specifically:
>
> https://github.com/jacques-n/arrow/blob/flight/java/flight/
> src/main/protobuf/flight.proto#L99
>
> My understanding of Protocol Buffers is that on read, the "data_body"
> memory would be copied out of the serialized protobuf that came across
> the wire. Your comment in the .proto says this "comes last in the
> definition to help with sidecar patterns" -- my read is that it would
> be up to us to do our own sidecar implementation, similar to how
> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
> comment there describes pretty much exactly the problem we have). I
> saw that you also replied on a GRPC thread about this issue [2]. Could
> you summarize what (if anything) stands in the way to get zero-copy on
> write and read?
>
> - Wes
>
> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/
> rpc_sidecar.h#L34
> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
>
> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau 
> wrote:
> > FYI, if you want to see an example server you can run with a GRPC
> generated
> > client, you can run the ExampleFlightServer located at [1]. Very basic
> > 'test' with that class and client is located at [2].
> >
> > [1]
> > https://github.com/jacques-n/arrow/tree/flight/java/flight/
> src/main/java/org/apache/arrow/flight/example
> > [2]
> > https://github.com/jacques-n/arrow/blob/flight/java/flight/
> src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
> >
> >
> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau 
> wrote:
> >
> >> Hey All,
> >>
> >> I used my Strata talk today as a forcing function to make additional
> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to
> clean
> >> up my work and explain my thoughts about the protocol in the coming
> days.
> >> High-level: use protobuf as a encapsulation format so that any client
> that
> >> is supported in GRPC will work. However, we can optimize the read/write
> >> path for targeted languages and hand control the
> >> serialization/deserialization and memory handling. (I did that in this
> Java
> >> patch [3][4][5].) I also looked at starting to use GRPC generated
> bindings
> >> within Python but it looks like some glue code may be needed in the C++
> >> layer since Python delegates down frequently. I also am still trying to
> >> understand GRPC back-pressure patterns and whether the protocol
> >> realistically needs to change to cover real-world high performance use
> >> cases.
> >>
> >> I’ll send out some slides about the ideas and update README, etc. soon.
> >>
> >> Thanks,
> >> Jacques
> >>
> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/
> >> src/main/protobuf/flight.proto
> >> [2] http://github.com/jacques-n/arrow/
> >> [3] https://github.com/jacques-n/arrow/tree/flight/
> >> java/flight/src/main/java/org/apache/arrow/flight/grpc
> >> [4] https://github.com/jacques-n/arrow/blob/flight/
> >> java/flight/src/main/java/org/apache/arrow/flight/
> ArrowMessage.java#L253
> >>  src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253>
> >> [5] https://github.com/jacques-n/arrow/blob/flight/
> >> java/flight/src/main/java/org/apache/arrow/flight/
> ArrowMessage.java#L185
> >>
> >>
>


Re: [Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread Jacques Nadeau
Woo!

On Mon, May 28, 2018 at 4:50 PM, Wes McKinney  wrote:

> Congrats all! The journey continues
>
> On Mon, May 28, 2018 at 9:43 AM, Krisztián Szűcs
>  wrote:
> > Which makes Arrow the 33rd most starred Apache repository (out of 1555,
> according to github).
> >
> > Congratulations!
>


Re: [Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread Wes McKinney
Congrats all! The journey continues

On Mon, May 28, 2018 at 9:43 AM, Krisztián Szűcs
 wrote:
> Which makes Arrow the 33rd most starred Apache repository (out of 1555, 
> according to github).
>
> Congratulations!


[Celebrate] Arrow has reached 2000 stargeezers

2018-05-28 Thread Krisztián Szűcs
Which makes Arrow the 33rd most starred Apache repository (out of 1555, 
according to github).

Congratulations!


[jira] [Created] (ARROW-2641) [C++] Investigate spurious memset() calls

2018-05-28 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2641:
-

 Summary: [C++] Investigate spurious memset() calls
 Key: ARROW-2641
 URL: https://issues.apache.org/jira/browse/ARROW-2641
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


{{builder.cc}} has TODO statements of the form:

{code:c++}
  // TODO(emkornfield) valgrind complains without this
  memset(data_->mutable_data(), 0, static_cast(nbytes));
{code}

Ideally we shouldn't have to zero-initialize a data buffer before writing to it.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)