One benefit of separating the cookbook from the documentation would be
to decouple its release / publication from Arrow releases, so you can
roll out new content to the published version as soon as it's merged
into the repository, whereas we might not want to publish inter-release
changes to the documentation in the same fashion. You could also have a
separate entry point to increase navigability (since the documentation
is intended to be more of a reference book).

Given that the Rust projects have decoupled into multiple
repositories, a "cookbook" repository could also be a place to collect
recipes related to DataFusion.

Either option is plenty reasonable, though, so feel free to choose
what makes the most sense to you.

On Thu, Jul 8, 2021 at 12:09 PM Alessandro Molina
<alessan...@ursacomputing.com> wrote:
>
> Thinking about it, I think that having the cookbook in its own repository
> (apache/arrow-cookbook) might lower the barrier for contributors. You only
> need to clone the cookbook, and running `make` also takes care of
> installing the required dependencies, so in theory you don't even need to
> care too much about setting up your environment. But we can surely improve
> the README in the repo further to ease contributions.
>
> I think we can also preserve the benefit that Nic mentioned of making sure
> that on each Arrow build the recipes are verified by triggering a build of
> the cookbook repository on each new arrow master change. Worst case, have a
> nightly build for the cookbook that clones the latest arrow master branch.
>
> Having a cookbook for C++ is a very good idea; that might be the next step
> once we finish the Python and R versions. If people want to contribute
> cookbook versions for more languages that would be greatly appreciated too.
>
> On the other hand, while we want to keep the cookbooks in the same
> repository and share the same infrastructure to keep a low entry barrier
> (make py/r/X will just compile the cookbook for the language you picked), I
> feel that keeping the cookbook separated per language is a good idea. While
> it's cool to be able to compare the solution between languages, in general
> developers look for the solution in their target language and might
> perceive the other implementations as noise.
> For example, we received similar feedback for the Arrow documentation too,
> that as a Python developer it's hard to find what you are looking for
> because it's mixed with the "format" and "C++" documentation and there are
> a few links back and forth between them.
>
> On Thu, Jul 8, 2021 at 11:39 AM Nic <thisis...@gmail.com> wrote:
>
> > One of the possible aims for the cookbook is having interlinked
> > documentation between function docs and the cookbook, and both the R and
> > Python docs include tests that check that all of the outputs are as
> > expected.  Including these tests means that we can immediately see if any
> > code changes render any recipes incorrect.  Therefore the decoupling
> > between cookbook updates and docs updates may not be necessary.
> >
> > That said, there has been mention of having versions of the cookbook tied
> > to released versions of Arrow, which sounds like a great idea.
> >
> > The repo also includes a Makefile which creates all the relevant setup, so
> > hopefully that should simplify things for users.  The R cookbook uses
> > bookdown, which has a feature where a reader can click an 'edit' button and
> > it automatically creates a fork where they can edit the cookbook and submit
> > a PR directly from GitHub.
> >
> > It'd be great to see a lot of recipes in multiple languages, but in the
> > document of possible recipes circulated previously, we identified slightly
> > different needs for recipes for R/Python, and this may be further
> > complicated by writing for slightly different audiences (from what I
> > understand, the pyarrow implementation may be more geared towards people
> > building on top of the low-level bindings, whereas in R, we have both that
> > audience as well as folks who just want to make their dplyr code run faster
> > without needing to know that much about the details of Arrow).
> >
> > I wonder, though, if we could still achieve that by having an additional
> > page that points to the recipes that *are* common between each cookbook.
> >
> > On Thu, 8 Jul 2021 at 10:07, Antoine Pitrou <anto...@python.org> wrote:
> >
> > >
> > > Hi Rares,
> > >
> > > Documentation bugs and improvement requests are welcome, feel free to
> > > file them on the JIRA!
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 08/07/2021 à 01:45, Rares Vernica a écrit :
> > > > Awesome! We would find C++ versions of these recipes very useful. From our
> > > > experience the C++ API is much much harder to deal with and error prone
> > > > than the R/Python one.
> > > >
> > > > Cheers,
> > > > Rares
> > > >
> > > > On Wed, Jul 7, 2021 at 9:07 AM Alessandro Molina <
> > > > alessan...@ursacomputing.com> wrote:
> > > >
> > > >> Yes, that was mostly what I meant when I wrote that the next step is
> > > >> opening a PR against the apache/arrow repository itself :D
> > > >> We moved forward in a separate repository initially to be able to cycle
> > > >> more quickly, but we reached a point where we think we can start
> > > >> integrating the cookbook with the Arrow documentation itself.
> > > >>
> > > >> If instead it's preferred to move the effort forward in its own separate
> > > >> repository (apache/arrow-cookbook), that's an option too; we are open to
> > > >> suggestions from the community.
> > > >>
> > > >> On Wed, Jul 7, 2021 at 5:57 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >>
> > > >>> What do you think about developing this cookbook in an Apache Arrow
> > > >>> repository (it could be something like apache/arrow-cookbook, if not
> > > >>> part of the main development repo)? Creating expanded documentation
> > > >>> resources for learning how to use Apache Arrow to solve problems seems
> > > >>> certainly within the bounds of the community's objectives.
> > > >>>
> > > >>> On Wed, Jul 7, 2021 at 5:52 PM Alessandro Molina
> > > >>> <alessan...@ursacomputing.com> wrote:
> > > >>>>
> > > >>>> We finally have a first preview of the cookbook available for R and Python.
> > > >>>> For anyone interested, the two versions are visible at
> > > >>>> http://ursacomputing.com/arrow-cookbook/py/index.html and
> > > >>>> http://ursacomputing.com/arrow-cookbook/r/index.html
> > > >>>> A new version of the cookbook is automatically published on each new recipe.
> > > >>>>
> > > >>>> After gathering feedback from interested parties and users, our plan for
> > > >>>> the next step would be to open a PR against the arrow repository and
> > > >>>> automate publishing the cookbook via github actions.
> > > >>>>
> > > >>>> At the moment the recipes implemented are nearly half of those that were
> > > >>>> identified in the dedicated Google Docs (
> > > >>>> https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?ts=60c73189#heading=h.m7fas2talgy5
> > > >>>> ) so if you have recipes to suggest feel free to leave comments on that
> > > >>>> document or suggest edits.
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Jun 21, 2021 at 10:34 AM Alessandro Molina <
> > > >>>> alessan...@ursacomputing.com> wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> I'd like to share with the ML an idea that Nic Crane and I have been
> > > >>>>> experimenting with. It's still in the early stage, but we hope to turn it
> > > >>>>> into a PR for Arrow documentation soon.
> > > >>>>>
> > > >>>>> The idea is to work on a Cookbook, a collection of ready-made recipes on
> > > >>>>> how to use Arrow that both end users and developers of third party
> > > >>>>> libraries can refer to when they need to look up "the arrow way" of doing
> > > >>>>> something.
> > > >>>>>
> > > >>>>> While the arrow documentation reports all features and functions that are
> > > >>>>> available in arrow, it's not always obvious to a new user how best to
> > > >>>>> combine them. Sometimes the solution ends up being more complicated than
> > > >>>>> necessary or performs badly due to non-obvious side effects like unexpected
> > > >>>>> memory copies etc.
> > > >>>>>
> > > >>>>> For this reason we thought about starting documentation that users can
> > > >>>>> refer to on how to combine arrow features to achieve the results they care
> > > >>>>> about.
> > > >>>>>
> > > >>>>> We wrote a short document explaining the idea at
> > > >>>>> https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?usp=sharing
> > > >>>>>
> > > >>>>> The core idea behind the cookbook is that all recipes should be testable,
> > > >>>>> so it should be possible to add a CI phase for the cookbook that verifies
> > > >>>>> that all the recipes still work with the current version of Arrow and lead
> > > >>>>> to the expected results.
> > > >>>>>
> > > >>>>> At the moment we started it in a separate repository (
> > > >>>>> https://github.com/ursacomputing/arrow-cookbook ), but we are not yet sure
> > > >>>>> if it should live inside arrow/docs or its own directory (i.e.
> > > >>>>> arrow/cookbook) or its own repository. In the end it's fairly decoupled
> > > >>>>> from the rest of Arrow and the documentation, which would have the benefit
> > > >>>>> of allowing a dedicated release cycle every time new recipes are added (at
> > > >>>>> least in the early phase).
> > > >>>>>
> > > >>>>> We are also looking for more ideas about recipes that would be good
> > > >>>>> candidates for inclusion, so if any of you has thoughts about which recipes
> > > >>>>> we should add please feel free to comment on the document or reply by mail
> > > >>>>> suggesting more recipes.
> > > >>>>>
> > > >>>>> Any suggestion for improvements is appreciated! We hope to have something
> > > >>>>> we can release with the next Arrow release.
> > > >>>>>
> >
> > --
> > Nic Crane
> > _______________________
> > @nic_crane <https://twitter.com/nic_crane>
> > https://thisisnic.github.io/
> >
