Resurrecting this now that the holidays are coming to a close and the refactor PR is up [1] :D
I also was not trying to spread blame around for the lack of JS releases
in the past year. Personally, I feel it's primarily my own fault for not
pushing more to wrap up the remaining JS tickets and not asking a PMC to
create a release candidate months ago, when they may have had more time.
But that's precisely why I think syncing up with the mainline releases
will benefit us: it will 1) minimize the amount of administrative
overhead necessary for a release, and 2) create some external time
pressure for us to get releases ready (especially if the entire project
moves to timed releases, which sounds great!).

The only downside, in my opinion, is that we perhaps lose some agility
for creating official bugfix releases, but I'm not sure it would be much
different from the current procedure.

I think now would actually be a great time to sync up if everyone is on
board, since both 0.12 and JS-0.4 are wrapping up. What do you all think?

> I have been curious if there isn't a middle ground between the full
> RC/GM release process, and releasing what are essentially nightlies. npm
> has a feature to publish tagged releases that aren't considered mainline
> releases yet are still accessible to CI/auditing services. As long as
> the list of npm users authorized to publish the packages are Arrow
> contributors (and we force npm 2FA), we could have a lane for rapid
> iteration and release while we work out the kinks.

I would think that this would still be an issue as long as it's coming
from an official Apache repository. But based on Wes' original comments
about downstream releases, maybe something like this would be acceptable
if it's coming from a Graphistry fork and clearly marked as an unofficial
nightly release?

Brian

[1] https://github.com/apache/arrow/pull/3290

On Fri, Dec 14, 2018 at 1:58 PM Paul Taylor <ptay...@apache.org> wrote:

> Wes,
>
> I didn't mean to sound like I was criticizing you, the project, or the
> release process.
> You're doing an outstanding job as a project lead, and
> it's a fine release process that helps ensure quality and security. Nor
> was I passive-aggressively expressing desire to be a PMC -- I'm
> overworked as it is and don't have the bandwidth to take on that
> responsibility. If I was and did, I'd be much more explicit about taking
> on those responsibilities regardless of PMC status :-)
>
> I was only attempting to describe some of the reasons I (and perhaps
> others) haven't/don't push to release the JS package more often, and
> compare reality with the original intent behind having JS on a separate
> release track.
>
> I also don't mean to criticize when I say I think a reason we don't
> release often might be because none of the JS users or maintainers are
> PMCs -- only trying to acknowledge the maintenance and release cycle is
> an attention-driven process. Since most of us contribute in conjunction
> with our other professional responsibilities, it's totally reasonable
> that if JS isn't part of a PMC's day-to-day, it'd be left to us to drive
> it forward.
>
> I have been curious if there isn't a middle ground between the full
> RC/GM release process, and releasing what are essentially nightlies. npm
> has a feature to publish tagged releases that aren't considered mainline
> releases yet are still accessible to CI/auditing services. As long as
> the list of npm users authorized to publish the packages are Arrow
> contributors (and we force npm 2FA), we could have a lane for rapid
> iteration and release while we work out the kinks.
>
> And lastly, an update on the refactor branch: all the features are
> working again, now just fixing the last few issues in the build scripts.
> I'm especially pleased that
> `cat ./some-gigantic-table.arrow | npx arrow2csv | less` doesn't stream
> the entire table to less and terminate with a broken-pipe error
> anymore :-)
>
> Paul
>
>
> On 12/14/18 10:31 AM, Wes McKinney wrote:
> > hi Paul,
> >
> > On Thu, Dec 13, 2018 at 8:59 PM Paul Taylor <ptay...@apache.org> wrote:
> >> Another update: all the existing features and unit tests are working
> >> again except for the Table/RecordBatch streaming toString()
> >> implementations (and the `arrow2csv` utility), which I'll update later
> >> tonight.
> >>
> >> On JS release cadence, I think Brian's right that the current setup is
> >> working counter to our original intent. I am used to (and prefer) a
> >> faster-paced release cycle, essentially releasing early and as often as
> >> bugs are fixed or features are added. Indeed, Graphistry maintains a
> >> repo <https://github.com/graphistry/arrow/commits/master> with the
> >> latest version of the library that we can build against, which I update
> >> when I fix any bugs or add features.
> >>
> > It is common for software vendors to have "downstream" releases, so
> > this is reasonable, so long as this work is not promoted as Apache
> > releases.
> >
> >> The JS project is young, and sometimes has to move at a rapid pace. I've
> >> felt the turnaround time involved in the vote/prepare/verify/publish
> >> release process is slower than would be helpful to me. I'm used to
> >> publishing patch releases to npm as soon as possible, possibly multiple
> >> times a day.
> > Well, surely the recent security problems with NPM demonstrate that
> > there is value in giving the community opportunity to vet a package
> > before it is published for the world to use, and that GPG-signing
> > packages is an important security measure to ensure that production
> > code is coming from a network of trust. It is different if you are
> > publishing packages for your own personal or corporate use.
> >
> >> None of the PMCs contribute to or use the JS version (if that's wrong,
> >> hit me up!) so there's been no release pressure from there. None of the
> >> JS contributors are PMCs so even if we want to do releases, we have to
> >> wait for a PMC. My take is that everyone on the project (especially
> >> PMCs) are probably ungodly busy people, and since not releasing to npm
> >> hasn't been blocking me, I opt not to bother folks.
> > I am happy to help release the JS package as often as you like, up to
> > multiple times per month. I stated this early on in the process, but
> > there has not seemed to be much desire to release. Brian's recent
> > request to release caught me at a bad time at the end of the year, but
> > there are other active PMCs who should be able to help. If you do
> > decide you want to release in the next week or two, please let me know
> > and I will make the time to help.
> >
> > The lack of PMCs with an interest in JavaScript is a bit of a
> > self-perpetuating issue. One of the responsibilities of PMC members
> > (and what will enable a committer to become a PMC) is to promote the
> > growth and development of a healthy community. This includes making
> > sure that the project releases. The JS developer community hasn't
> > grown much, though. My approach to such a problem is to act as a
> > "community of one" until it changes -- drive a project forward and
> > ensure a steady cadence of releases.
> >
> > - Wes
> >
> >>
> >> On 12/13/18 11:52 AM, Wes McKinney wrote:
> >>> +1 for synchronizing to the main releases when possible. In the 0.12
> >>> thread we have discussed moving to time-based releases (e.g. every 2
> >>> months). Time-based releases are helpful to create urgency around
> >>> getting work completed, and making sure that the project is always
> >>> ready to release.
> >>> On Thu, Dec 13, 2018 at 10:39 AM Brian Hulette <hulet...@gmail.com> wrote:
> >>>> Sounds great Paul!
> >>>> Really excited that this refactor is wrapping up. My
> >>>> only concern with including this in 0.4.0 is that I'm not going to
> >>>> have the time to thoroughly review it for a few weeks, so gating on
> >>>> that would really delay it. But I can just manually test with some
> >>>> use-cases I care about in lieu of a thorough review in the interest
> >>>> of time.
> >>>>
> >>>> I think in the future (after 0.12?) it may behoove us to tie back in
> >>>> to the main Arrow release cycle. The idea with the separate JS release
> >>>> was to allow us to release faster, but in practice it has done the
> >>>> opposite. Since the fall of 2017 we've cut two major JS releases
> >>>> (0.2, 0.3) while there were four major main releases (0.8 - 0.11).
> >>>> Not to mention the disjoint version numbers can be confusing to
> >>>> users - perhaps not as much of a concern now that the format is
> >>>> pretty stable, but it can still be a friction point. And finally
> >>>> selfishly - if we had been on the main release cycle, the
> >>>> contributions I made in the summer would have been released in
> >>>> either 0.10 or 0.11 by now.
> >>>>
> >>>> Brian
> >>>>
> >>>> On Thu, Dec 13, 2018 at 3:29 AM Paul Taylor <ptay...@apache.org> wrote:
> >>>>
> >>>>> The ongoing JS refactor/upgrade branch
> >>>>> <https://github.com/trxcllnt/arrow/tree/js-data-refactor/js> is just
> >>>>> about done. It's passing all the integration tests, as well as a
> >>>>> hundred or so new unit tests. I have to update existing tests where
> >>>>> the APIs changed, battle with closure-compiler a bit, then it'll be
> >>>>> ready to merge in and ship out. I think I'll be able to wrap it up in
> >>>>> the next couple hours.
> >>>>>
> >>>>> I started this branch to clean up the Vector Data classes to make it
> >>>>> easier to add higher-level Table and Vector operators, but as the
> >>>>> Data classes are fairly embedded in the core, it led to a larger
> >>>>> refactor of the DataTypes, Vectors, Visitors, and IPC readers and
> >>>>> writers.
> >>>>>
> >>>>> While I was updating the IPC readers and writers, I took the
> >>>>> opportunity to back-port all the Node and WhatWG (browser) streams
> >>>>> integration that we've built for Graphistry. Putting it in the Arrow
> >>>>> JS library means we can better ensure zero-copy when possible,
> >>>>> empowers library consumers to easily build streaming applications in
> >>>>> both server and browser environments, and (selfishly) reduces
> >>>>> complexity in my code base. It also advances a longer term personal
> >>>>> goal to more closely adhere to the structure and organization of
> >>>>> ArrowCPP when reasonable.
> >>>>>
> >>>>> A non-exhaustive list of updates includes:
> >>>>>
> >>>>> * Updates the Table, Schema, RecordBatch, Visitor, Vector, Data, and
> >>>>>   DataTypes to ensure the generic type signatures cascade recursively
> >>>>>   through the type declarations
> >>>>> * New io primitives that abstract over the (mutually exclusive) file
> >>>>>   and stream APIs in both node and browser environments
> >>>>> * New RecordBatchReaders and RecordBatchWriters that directly use the
> >>>>>   zero-copy node and browser io primitives
> >>>>> * A consolidated reflective Visitor implementation that supports late
> >>>>>   binding to shortcut traversal, provides an easy API for building
> >>>>>   higher level Vector operators
> >>>>> * Fixed bugs/added support for reading and writing DictionaryBatch
> >>>>>   deltas (tricky)
> >>>>> * Updated all the dependencies and did some config file gardening to
> >>>>>   make debugging tests easier
> >>>>> * Added a bunch of new tests
> >>>>>
> >>>>> I'd be more than happy to help shepherd a 0.4.0 release of what's
> >>>>> in arrow/master if that's what everyone wants to do. But in the
> >>>>> interest of cutting a more feature-rich release and preventing
> >>>>> customers paying the cost of updating twice in a short time span, I
> >>>>> vote we hold off for another day or two and merge + release the work
> >>>>> in the refactor branch.
> >>>>>
> >>>>> Paul
> >>>>>
> >>>>> On 12/9/18 10:51 AM, Wes McKinney wrote:
> >>>>>> I agree that we should cut a JavaScript release.
> >>>>>>
> >>>>>> With the amount of maintenance work on my plate I have to declare
> >>>>>> bankruptcy on doing any more than I am right now. Can another PMC
> >>>>>> volunteer to be the RM for the 0.4.0 JavaScript release?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Wes
> >>>>>> On Tue, Dec 4, 2018 at 10:07 PM Brian Hulette <hulet...@gmail.com> wrote:
> >>>>>>> Hi all,
> >>>>>>> It's been quite a while since our last major Arrow JS release
> >>>>>>> (0.3.0 on February 22!), and since then we've added several new
> >>>>>>> features that will make Arrow JS much easier to adopt. We've added
> >>>>>>> convenience functions for creating Arrow vectors and tables
> >>>>>>> natively in JavaScript, an IPC writer, and a row proxy interface
> >>>>>>> that will make integrating with existing JS libraries much simpler.
> >>>>>>>
> >>>>>>> I think it's time we cut 0.4.0, so I spent some time closing out or
> >>>>>>> postponing the last few JIRAs in JS-0.4.0. I got it down to just
> >>>>>>> one JIRA which involves documenting the release process - hopefully
> >>>>>>> we can close that out as we go through it again.
> >>>>>>>
> >>>>>>> Please let me know if you think it makes sense to cut JS-0.4.0
> >>>>>>> now, or if you have any concerns.
> >>>>>>>
> >>>>>>> Brian