Thanks Micah, that was very helpful! ARROW-7278 looks like a good place to
dig in =]

On Fri, Jul 10, 2020 at 7:33 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Chris,
> I don't think I've seen a formal roadmap for either Gandiva or Flight
> (others might have more context).  What you described is certainly how a
> lot of work gets done.  There has been a slightly more formal roadmap
> proposed for datasets, dataframes and the C++ query engine, but that is the
> extent of what I recall seeing on the mailing list.
>
> Regarding Gandiva and Flight, off the top of my head I can think of a few
> places to potentially start.  I'm not an expert in either of these, but
> hopefully people who are can tell me where I'm wrong :)  Also, I'm not sure
> any of these are really "easy" or "beginner" tasks, but if you are
> interested in these two areas they would likely provide a way of ramping up
> on the project.
>
> For Gandiva:
>   1.  Implementing a more efficient string matching algorithm
> (https://issues.apache.org/jira/browse/ARROW-7278) has been raised.  If
> possible, it would be nice to see if there is some common code that can be
> shared with, and benchmarked against, the equivalent kernel that exists
> under compute.
>   2.  I believe we recently made the decision to remove Gandiva from the
> packaged wheels, with the hope of being able to create a separate wheel at
> some later point (I don't think this is a beginner issue per se, but it's
> worth mentioning).
>   3.  I think there are probably still quite a few expressions/functions
> that haven't been implemented for Gandiva, but I don't know if there is an
> exhaustive list.  Contributors from Dremio seem to add one every now and
> then.
>
> For Flight:
>   1.  I'm not sure that there is a strong reference implementation provided
> for Flight.  I believe all of the examples checked in are closer to "toy"
> code (but I haven't looked in a while).  It might be interesting to
> construct a more comprehensive example (perhaps something built on top of
> the datasets API).
>   2.  Middleware hooks for instrumenting Flight services were added a while
> ago.  It might be worth adding "contrib" adapters for one or two popular
> frameworks that make use of the hooks.
>   3.  We recently introduced a "feature" enum in the hope that it could be
> used to negotiate capabilities between Flight clients and servers.  Looking
> into implementing that negotiation could be helpful.
>
> Another area that I'm personally interested in, but haven't had any time to
> work on, is adapters from and to other formats (specifically Avro and
> protobuf).
>
> Hope this helps, and welcome!
>
> -Micah
>
> On Thu, Jul 9, 2020 at 1:56 AM Chris Channing <
> christopher.chann...@gmail.com> wrote:
>
> > Antoine/Neal,
> >
> > Thanks for your comments, it's appreciated!
> >
> > My current preference would be to focus on Gandiva and/or Flight, so I'll
> > start looking around there for inspiration. @Neal, regarding your comment
> > about finding a feature that I'm interested in resolving, I agree with you,
> > and that was primarily my reason for asking whether we had a roadmap,
> > either at the root or component level. Just to help my understanding,
> > though: how are the vision-level feature backlogs generated for each of
> > these components? I'm assuming there must be something more than just "a
> > user hits a limitation > user implements fix/feature > happy days".
> > Perhaps a better question might be: what is the short-term vs long-term
> > vision for each of these components (I'm hoping this has been documented
> > in detail somewhere and I've missed it)?
> >
> > @Antoine, thanks for the link to the revised website PR, I'll take a look
> > and comment there.
> >
> > Cheers,
> > Chris
> >
> > On Wed, Jul 8, 2020 at 7:43 PM Neal Richardson <neal.p.richard...@gmail.com>
> > wrote:
> >
> > > Hi Chris, some additional thoughts to add to what Antoine said.
> > >
> > > Neal
> > >
> > > On Wed, Jul 8, 2020 at 10:56 AM Antoine Pitrou <anto...@python.org>
> > > wrote:
> > >
> > > >
> > > > Hi Chris,
> > > >
> > > > On 08/07/2020 at 12:01, Chris Channing wrote:
> > > > >
> > > > > I've looked at the contribution guidelines, but rather than
> > > > > arbitrarily picking a JIRA issue, I was hoping that there was a more
> > > > > structured approach for newbies documented somewhere that I might
> > > > > have missed. A few questions that I have are:
> > > >
> > > > As a starting point, which Arrow implementation would you be
> > > > interested in contributing to?  As you know, we have a bunch of them,
> > > > a subset of which has its status documented here:
> > > > https://github.com/apache/arrow/blob/master/docs/source/status.rst
> > > >
> > > > >    - Does the community have a lightweight mentoring system to help
> > > > >    contributors get up to speed?
> > > >
> > > > We don't.  However, some developers are used to communicating on an
> > > > unofficial chat instance at https://ursalabs.zulipchat.com/, where you
> > > > can also ask for help (you probably want to post on the "dev" stream).
> > > >
> > >
> > > Most new contributors tend to be users who encounter a limitation of the
> > > software (or docs) and take it upon themselves to improve it. So one way
> > > to get oriented is to start using Arrow and ask specific questions when
> > > you run into trouble.
> > >
> > >
> > > >
> > > > >    - Are there designated component owners/guardians, e.g. C++ core,
> > > > >    Flight, Gandiva, APIs etc., who could provide guidance if a
> > > > >    developer had a specific focus/interest?
> > > >
> > > > We don't have designated owners, though of course some developers are
> > > > focussed on specific areas.  It's probably best to ask here, though.
> > > > Also, the answers you get can benefit other people.
> > > >
> > >
> > > > >    - Looking at the Arrow JIRAs in bulk, I noticed that 'easyfix',
> > > > >    'beginner' and 'newbie' labels have been defined. Do you think it
> > > > >    makes sense to pick one label and standardise on it for future
> > > > >    backlog grooming efforts? It would make it easier to identify the
> > > > >    pipeline of issues that future engineers can use to ramp up on the
> > > > >    project.
> > > >
> > > > Definitely agreed.  I'm not sure how easy it is to make bulk edits on
> > > > JIRA, though... perhaps someone else can chime in.
> > > >
> > >
> > > Unfortunately, JIRA "labels" are shared across all of the Apache Software
> > > Foundation, so those aren't just for Arrow. I don't see us using them,
> > > but maybe some people do, and maybe we should start.
> > >
> > > In general though, rather than just looking for "easy" things to do, I
> > > recommend finding a JIRA issue you're personally invested in seeing
> > > resolved because it affects a use case you have. I find that's a more
> > > effective way to learn in general.
> > >
> > >
> > > >
> > > > By the way, one area where fresh eyes would definitely be useful is
> > > > suggesting documentation edits or improvements.
> > > > We also have a small website revamp in preparation; you can see the
> > > > proposed changes at the links below.  Feedback is welcome :-)
> > > > https://github.com/apache/arrow-site/pull/63
> > > > https://enpiar.com/arrow-site/
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > >
> >
>
