Thanks Micah, that was very helpful! ARROW-7278 looks like a good place to dig in =]
On Fri, Jul 10, 2020 at 7:33 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Chris,
> I don't think I've seen a formal roadmap for either Gandiva or Flight
> (others might have more context). What you described is certainly how a
> lot of work gets done. There has been a slightly more formal roadmap
> proposed for datasets, dataframe and C++ query engine, but that is the
> extent of what I recall seeing on the mailing list.
>
> Regarding Gandiva and Flight, off the top of my head I can think of a few
> places to potentially start. I'm not an expert in either of these but
> hopefully people who are can tell me where I'm wrong :) Also, I'm not sure
> any of these are really "easy" or "beginner" tasks, but if you are
> interested in these two areas they would likely provide a way of ramping up
> on the project.
>
> For Gandiva:
> 1. Implementing a more efficient string matching algorithm
> (https://issues.apache.org/jira/browse/ARROW-7278) has been raised. If
> possible it might be nice to see if there is some common code that can be
> shared and benchmarked against the same kernel that exists under compute.
> 2. I believe we recently made the decision to remove Gandiva from
> packaged wheels with the hopes of maybe being able to create a separate
> wheel at some later point in time (I don't think this is a beginner issue
> per se, but worth mentioning).
> 3. I think there are still probably quite a few expressions/functions
> that haven't been implemented for Gandiva, but I don't know if there is an
> exhaustive list. It seems contributors from Dremio add one every now and
> then.
>
> For Flight:
> 1. I'm not sure that there is a strong reference implementation provided
> for Flight. I believe all of the examples checked in are closer to "toy"
> code (but I haven't looked in a while). Potentially trying to construct a
> more comprehensive example (perhaps something built on top of the datasets
> API) might be interesting.
> 2. There were middleware hooks added for instrumenting Flight services
> a while ago. It might be worth adding "contrib" adapters to 1 or 2 popular
> frameworks that make use of the hooks.
> 3. We recently introduced a "feature" enum with the hopes it could be
> used to negotiate capabilities between Flight clients/servers. Looking into
> implementing that negotiation could be helpful.
>
> Another area that I'm personally interested in, but haven't had any time
> to work on, is adapters from and to other formats (specifically Avro and
> protobuf).
>
> Hope this helps and Welcome!
>
> -Micah
>
> On Thu, Jul 9, 2020 at 1:56 AM Chris Channing
> <christopher.chann...@gmail.com> wrote:
>
> > Antoine/Neal,
> >
> > Thanks for your comments, it's appreciated!
> >
> > My current preference would be to focus on Gandiva and/or Flight, so I'll
> > start looking around there for inspiration. @Neal, regarding your comment
> > around finding a feature that I'm interested in resolving, I agree with
> > you and that was primarily my driver for asking if we had a roadmap either
> > at the root or component level. Just to help my understanding though, how
> > are the vision-level feature backlogs generated for each of these
> > components, as I'm assuming there must be something more than just "a user
> > hits a limitation > user implements fix/feature > happy days"? Perhaps a
> > better question might be, what is the short-term vs long-term vision for
> > each of these components (I'm hoping this has been documented in detail
> > somewhere and I've missed it)?
> >
> > @Antoine, thanks for the link to the revised website PR, I'll take a look
> > and comment there.
> >
> > Cheers,
> > Chris
> >
> > On Wed, Jul 8, 2020 at 7:43 PM Neal Richardson
> > <neal.p.richard...@gmail.com> wrote:
> >
> > > Hi Chris, some additional thoughts to what Antoine said.
> > >
> > > Neal
> > >
> > > On Wed, Jul 8, 2020 at 10:56 AM Antoine Pitrou <anto...@python.org>
> > > wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > On 08/07/2020 at 12:01, Chris Channing wrote:
> > > > >
> > > > > I've looked at the contribution guidelines, but rather than
> > > > > arbitrarily picking a jira I was hoping that there was a more
> > > > > structured approach for newbies documented that I might have
> > > > > missed. A few questions that I have are:
> > > >
> > > > As a starting point, which Arrow implementation would you be
> > > > interested in contributing to? As you know, we have a bunch of them,
> > > > a subset of which has its status documented here:
> > > > https://github.com/apache/arrow/blob/master/docs/source/status.rst
> > > >
> > > > > - Does the community have a light-weight style mentoring system to
> > > > >   help contributors get up to speed?
> > > >
> > > > We don't. However some developers are used to communicating on an
> > > > unofficial chat instance at https://ursalabs.zulipchat.com/, where
> > > > you can also ask for help (you probably want to post on the "dev"
> > > > stream).
> > >
> > > Most new contributors tend to be users who encounter a limitation of
> > > the software (or docs) and take it upon themselves to improve it. So
> > > one way to get orientation is to start using Arrow and ask specific
> > > questions when you run into trouble.
> > >
> > > > > - Are there designated component owners/guardians e.g. C++ core,
> > > > >   Flight, Gandiva, APIs etc. that could provide guidance if a
> > > > >   developer had a specific focus/interest?
> > > >
> > > > We don't have designated owners, though of course some developers are
> > > > focussed on specific areas. Best is probably to ask here, though.
> > > > Also, the answers you get can benefit other people.
> > > >
> > > > > - Looking at the Arrow jiras in bulk, I noticed that 'easyfix',
> > > > >   'beginner' and 'newbie' labels have been defined. Do you think
> > > > >   that it makes sense to pick one label and standardise on it for
> > > > >   future backlog grooming efforts? It would make it easier to
> > > > >   identify the pipeline of issues that future engineers can use to
> > > > >   ramp up on the project.
> > > >
> > > > Definitely agreed. I'm not sure how easy it is to make bulk edits on
> > > > JIRA, though... perhaps someone else can chime in.
> > >
> > > Unfortunately, JIRA "labels" are shared with all of the Apache Software
> > > Foundation, so those aren't just for Arrow. I don't observe that we use
> > > them but maybe some people do, and maybe we should start.
> > >
> > > In general though, rather than just looking for "easy" things to do, I
> > > recommend finding a JIRA issue you're personally invested in seeing
> > > resolved because it affects a use case you have. I find that's a more
> > > effective way to learn in general.
> > >
> > > > By the way, one thing where fresh eyes would definitely be useful is
> > > > to suggest documentation edits or improvements.
> > > > We also have a small website revamp in preparation, you can see the
> > > > proposed changes in the links below. Feedback is welcome :-)
> > > > https://github.com/apache/arrow-site/pull/63
> > > > https://enpiar.com/arrow-site/
> > > >
> > > > Regards
> > > >
> > > > Antoine.