For now I have presumptuously moved my C++ prototype to https://github.com/arrow-data/arrow
I may have some cycles for this over the next few weeks -- it would be
great to develop a draft of the IPC protocol for transmitting table /
row batch metadata and data headers. I am going to be working on
building up enough tools and scaffolding to start assembling a
pandas.DataFrame-like Python wrapper layer, which will keep me busy for
a fair while.

Let's decide soon whether we want 1 repo or multiple repos for the
reference implementations (C/C++ and Java). 1 repo might be easier for
integration testing. I can convert the Google doc spec floating around
to Markdown and perhaps we can discuss specific details in GitHub
issues? I'll use a separate repo for the format docs.

best,
Wes

On Mon, Dec 14, 2015 at 9:43 AM, Wes McKinney <w...@cloudera.com> wrote:
> hi folks,
>
> In the interim I created a new public GitHub organization to host code
> for this effort so we can organize ourselves in advance of more
> progress in the ASF:
>
> https://github.com/arrow-data
>
> I have a partial C++ implementation of the Arrow spec that I can move
> there, along with a to-be-Markdown-ified version of a specification
> subject to more iteration. The more pressing short-term matter will be
> making some progress on the metadata / data headers / IPC protocol
> (e.g. using Flatbuffers or the like).
>
> Thoughts on git repo structure?
>
> 1) Avro-style: "one repo to rule them all"
> 2) Parquet-style: arrow-format, arrow-cpp, arrow-java, etc.
>
> (I'm personally more in the latter camp, though integration tests may
> be more tedious that way)
>
> Thanks
>
> On Thu, Dec 3, 2015 at 4:18 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
>> I've opened a name search for our top vote getter.
>>
>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-92
>>
>> I also just realized that my previous email dropped other recipients.
>> Here it is below.
>>
>> ----
>> I think we can call the voting closed.
>> Top vote getters:
>>
>> Apache Arrow (17)
>> Apache Herringbone (9)
>> Apache Joist (8)
>> Apache Colbuf (8)
>>
>> I'll open a PODLINGNAMESEARCH-* shortly for Arrow.
>>
>> ---
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Thu, Dec 3, 2015 at 1:23 AM, Marcel Kornacker <mar...@cloudera.com>
>> wrote:
>>>
>>> Just added my vote.
>>>
>>> On Thu, Dec 3, 2015 at 12:51 PM, Wes McKinney <w...@cloudera.com> wrote:
>>> > Shall we call the voting closed? Any last stragglers?
>>> >
>>> > On Tue, Dec 1, 2015 at 5:39 PM, Ted Dunning <ted.dunn...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Apache can handle this if we set the groundwork in place.
>>> >>
>>> >> Also, Twitter's lawyers work for Twitter, not for Apache. As such,
>>> >> their opinions can't be taken by Apache as legal advice. There are
>>> >> issues of privilege, conflict of interest and so on.
>>> >>
>>> >> On Wed, Dec 2, 2015 at 7:51 AM, Alex Levenson
>>> >> <alexleven...@twitter.com> wrote:
>>> >>>
>>> >>> I can ask about whether Twitter's lawyers can help out -- is that
>>> >>> something we need? Or is that something Apache helps out with in
>>> >>> the next step?
>>> >>>
>>> >>> On Mon, Nov 30, 2015 at 9:32 PM, Julian Hyde <jh...@apache.org> wrote:
>>> >>>>
>>> >>>> +1 to have a vote tomorrow.
>>> >>>>
>>> >>>> Assuming that Vector is out of play, I just did a quick search for
>>> >>>> the top 4 remaining (“arrow”, “honeycomb”, “herringbone”, “joist”)
>>> >>>> at SourceForge, Open Hub, Trademarkia, and on Google. There are no
>>> >>>> trademarks for these in similar subject areas. There is a
>>> >>>> moderately active project called “joist” [1].
>>> >>>>
>>> >>>> I will point out that “Apache Arrow” has Native American
>>> >>>> connotations that we may or may not want to live with (just ask
>>> >>>> the Washington Redskins how they feel about their name).
>>> >>>>
>>> >>>> If someone would like to vet other names, use the links on
>>> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-90, and
>>> >>>> fill out column C in the spreadsheet.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>> [1] https://github.com/stephenh/joist
>>> >>>>
>>> >>>> On Nov 30, 2015, at 7:01 PM, Jacques Nadeau <jacq...@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> +1
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 30, 2015 at 6:34 PM, Wes McKinney <w...@cloudera.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Should we have a last call for votes, closing EOD tomorrow
>>> >>>> (Tuesday)? I missed this for a few days last week with holiday
>>> >>>> travel.
>>> >>>>
>>> >>>> On Thu, Nov 26, 2015 at 3:04 PM, Julian Hyde <jul...@hydromatic.net>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Consulting a lawyer is part of the Apache branding process but the
>>> >>>> first stage is to gather a list of potential conflicts -
>>> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-90 is an
>>> >>>> example.
>>> >>>>
>>> >>>> The other part, frankly, is to pick your battles.
>>> >>>>
>>> >>>> A year or so ago Actian re-branded Vectorwise as Vector.
>>> >>>>
>>> >>>> http://www.zdnet.com/article/actian-consolidates-its-analytics-portfolio/.
>>> >>>> Given that it is an analytic database in the Hadoop space I think
>>> >>>> that is as close to a “direct hit” as it gets. I don’t think we
>>> >>>> need a lawyer to tell us that. Certainly it makes sense to look
>>> >>>> for conflicts for the other alternatives before consulting
>>> >>>> lawyers.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>> On Nov 25, 2015, at 9:42 PM, Marcel Kornacker <mar...@cloudera.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> On Tue, Nov 24, 2015 at 3:25 PM, Jacques Nadeau <jacq...@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Ok guys,
>>> >>>>
>>> >>>> I don't think anyone is doing a thorough analysis of viability. I
>>> >>>> did a quick glance and the top one (Vector) seems like it would
>>> >>>> conflict with an Actian product. That may be fine. Let's do a
>>> >>>> second phase vote.
>>> >>>>
>>> >>>> I'm assuming you mean Vectorwise?
>>> >>>>
>>> >>>> Before we do anything else, could we have a lawyer look into this?
>>> >>>> Last time around that I remember (Parquet), Twitter's lawyers did
>>> >>>> a good job of weeding out the potential trademark violations.
>>> >>>>
>>> >>>> Alex, could Twitter get involved this time around as well?
>>> >>>>
>>> >>>> Pick your top 3 (1,2,3 with 3 being top preference)
>>> >>>>
>>> >>>> Let's get this done by Friday and then we can do a podling name
>>> >>>> search starting with the top one.
>>> >>>>
>>> >>>> Link again:
>>> >>>>
>>> >>>> https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=304381532&vpid=A1
>>> >>>>
>>> >>>> thanks
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Fri, Nov 20, 2015 at 9:24 AM, Jacques Nadeau <jacq...@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Ok, it looks like we have a candidate list (we actually got 11
>>> >>>> since there was a three-way tie for ninth place):
>>> >>>>
>>> >>>> Vector
>>> >>>> Arrow
>>> >>>> honeycomb
>>> >>>> Herringbone
>>> >>>> joist
>>> >>>> V2
>>> >>>> Piet
>>> >>>> colbuf
>>> >>>> baton
>>> >>>> impulse
>>> >>>> victor
>>> >>>>
>>> >>>> Next we need to do trademark searches on each of these to see
>>> >>>> whether we're likely to have success.
>>> >>>> I've moved candidates to a second tab:
>>> >>>>
>>> >>>> https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=304381532
>>> >>>>
>>> >>>> Anybody want to give a hand in analyzing potential conflicts?
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 16, 2015 at 12:10 PM, Jacques Nadeau <jacq...@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Everybody should pick their ten favorites using the numbers 1 to 10.
>>> >>>>
>>> >>>> 10 is most preferred
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 16, 2015 at 10:17 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Single vote for most preferred?
>>> >>>>
>>> >>>> Single transferable vote?
>>> >>>>
>>> >>>> On Tue, Nov 17, 2015 at 2:50 AM, Jacques Nadeau <jacq...@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Given that a bunch of people added names to the sheet, I'll take
>>> >>>> that as tacit agreement to the proposed process.
>>> >>>>
>>> >>>> Let's move to the first vote phase. I've added a column for
>>> >>>> everybody's votes. Let's try to wrap up the vote by 10am on
>>> >>>> Wednesday.
>>> >>>>
>>> >>>> thanks!
>>> >>>> Jacques
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Thu, Nov 12, 2015 at 12:03 PM, Jacques Nadeau <jacq...@apache.org>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Hey Guys,
>>> >>>>
>>> >>>> It sounds like we need to do a little more work on the Vector
>>> >>>> proposal before the board would like to consider it. The main
>>> >>>> point of contention right now is the name of the project. We need
>>> >>>> to decide on a name and get it signed off through
>>> >>>> PODLINGNAMESEARCH.
>>> >>>>
>>> >>>> Naming is extremely subjective so I'd like to propose a process for
>>> >>>> selection that minimizes pain. This is an initial proposal and
>>> >>>>
>>> >>>> We do the naming in the following steps:
>>> >>>> - 1: Collect a set of names to be considered
>>> >>>> - 2: Run a vote for 2 days where each member ranks their top 10
>>> >>>>   options 1..10
>>> >>>> - 3: Take the top ten vote getters and do a basic analysis of
>>> >>>>   whether we think that any have legal issues. Keep dropping names
>>> >>>>   that have them until we are left with 10 reasonably solid
>>> >>>>   candidate names
>>> >>>> - 5: Take the top ten names and give people 48 hours to rank their
>>> >>>>   top 3 names
>>> >>>> - 6: Start a PODLINGNAMESEARCH on the top-ranked one; if that
>>> >>>>   doesn't work, try the second and third options.
>>> >>>>
>>> >>>> I suggest we take name suggestions for step 1 from everyone but
>>> >>>> then constrain the voting to the members of the newly proposed
>>> >>>> project [1]. We could just do this in a private email thread but I
>>> >>>> think doing it on Drill dev is better in the interest of
>>> >>>> transparency. This isn't the perfect place for that but I'm not
>>> >>>> sure a better place exists.
>>> >>>>
>>> >>>> I'm up for changing any or all of this depending on what others
>>> >>>> think. Just wanted to get the ball rolling on a proposed process.
>>> >>>>
>>> >>>> If this works, I've posted a doc at [2] that we can use for step 1.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Jacques
>>> >>>>
>>> >>>> [1] List of proposed new project members/voters: Todd Lipcon, Ted
>>> >>>> Dunning, Michael Stack, P.
>>> >>>> Taylor Goetz, Julian Hyde, Julien Le Dem, Jacques Nadeau, James
>>> >>>> Taylor, Jake Luciani, Parth Chandra, Alex Levenson, Marcel
>>> >>>> Kornacker, Steven Phillips, Hanifi Gunes, Wes McKinney, Jason
>>> >>>> Altekruse, David Alves, Zain Asgar, Ippokratis Pandis, Abdel Hakim
>>> >>>> Deneche, Reynold Xin.
>>> >>>> [2]
>>> >>>> https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=0
>>> >>>>
>>> >>>
>>> >>> --
>>> >>> Alex Levenson
>>> >>> @THISWILLWORK
>>> >>
>>