Hi Shane, Great point.
There are good public datasets available for testing and development purposes via AWS Public Data Sets (https://aws.amazon.com/datasets/), the US Open Data Initiative (http://catalog.data.gov/dataset), Kaggle's public datasets (https://www.kaggle.com/datasets), etc. Thanks! Ellison Anne On Tue, Jun 7, 2016 at 11:55 AM, Shane Curcuru <a...@shanecurcuru.org> wrote: > Certainly sounds like an interesting project. One thing to think about > will be ensuring you can find sufficient datasets and testsets under > appropriate licenses so any project participant can run tests against a > realistic scenario. > > Joe Witt wrote on 6/7/16 11:25 AM: > > Benjamin, > > > > The correct way to refer to any Apache project is 'Apache Foo' > > True, but in practice - especially outside the project - it is often > shortened to drop the Apache. That's fine in some cases; unfortunately > for some very popular projects it happens so often it's a serious > problem. We have a basic guide that needs more review: > > http://www.apache.org/foundation/marks/guide > > > > > I look forward to hearing people say 'Apache Pirk'. No matter how > > many times I try to say that quickly it sounds good. > > > > Was there any other concern about the naming other than transposition > > of characters? > > Oddly, my first thought was in an English accent and I heard "Birk", > which is not necessarily a polite thing. > > In any case, it's up to the PPMC of the podling itself (presuming it > gets started here) to choose its own name. Everyone else here is just > making suggestions, it's up to the people doing the work to decide > (modulo being an acceptable PODLINGNAMESEARCH, when the time comes). > > - Shane > > > > Thanks > > Joe > > > > On Tue, Jun 7, 2016 at 11:21 AM, Benjamin Young <byo...@bigbluehat.com> > wrote: > >> Say it 100 times real fast to someone over a low-grade cell connection. > Type it 50 times as fast as you can without looking at the screen. Or just > swap the middle two letters...
> >> > >> Mostly, if there's a chance it will be heard, read, or typed > incorrectly it should--if it can--be avoided. This one just seems too easy > to get wrong. > >> > >> Hope that helps! > >> Benjamin > >> > >> -----Original Message----- > >> From: Joe Witt [mailto:joe.w...@gmail.com] > >> Sent: Tuesday, June 7, 2016 11:15 AM > >> To: general@incubator.apache.org > >> Subject: Re: [DISCUSS] Pirk Incubation Proposal > >> > >> Benjamin, > >> > >> Definitely good to get solid discussion going on naming early. > >> Curious to understand more of your perspective on what could be > potentially offensive about Pirk. > >> > >> Thanks > >> Joe > >> > >> On Tue, Jun 7, 2016 at 10:50 AM, Benjamin Young <byo...@bigbluehat.com> > wrote: > >>> Looks like a great project! > >>> > >>> I'd like to propose (early!) that you consider changing the name from > Pirk, however. It's too close to things that could easily be offensive or > misunderstood. > >>> > >>> My personal recommendation would be "Piranha" > >>> > >>> http://www.morewords.com/ has several more options if you search for > `pir*` or `*pir` or even `*pir*`. > >>> > >>> Beyond that, it looks like you're off to a great start! > >>> > >>> Cheers, > >>> Benjamin > >>> > >>> -----Original Message----- > >>> From: Ellison Anne Williams [mailto:eawilliamsp...@gmail.com] > >>> Sent: Tuesday, June 7, 2016 9:02 AM > >>> To: general@incubator.apache.org > >>> Subject: [DISCUSS] Pirk Incubation Proposal > >>> > >>> Hi All, > >>> > >>> > >>> We would like to discuss the proposal of a new project to the > incubator - Pirk. > >>> > >>> > >>> Pirk is a framework for scalable Private Information Retrieval (PIR). > >>> > >>> > >>> The proposal is contained below and can also be found on the wiki at > >>> https://wiki.apache.org/incubator/PirkProposal > >>> > >>> > >>> Looking forward to the discussion - > >>> > >>> > >>> Thanks! 
> >>> > >>> > >>> Ellison Anne > >>> > >>> > >>> ____________ > >>> > >>> > >>> = Pirk Proposal = > >>> > >>> == Abstract == > >>> Pirk is a framework for scalable Private Information Retrieval (PIR). > >>> > >>> == Proposal == > >>> > >>> Pirk is a software framework for scalable Private Information > Retrieval and is meant to provide a landing place for robust, scalable, and > practical implementations of PIR algorithms. The initial scalable PIR > algorithms of Pirk were developed at the National Security Agency. > >>> > >>> == Background == > >>> > >>> Private Information Retrieval (PIR) is an area of computer science and > mathematics that enables a user/entity to privately and securely obtain > information from a dataset, to which they have been granted access, without > revealing, to the dataset owner or to an observer, any information > regarding the questions asked or the results obtained. Employing > homomorphic encryption techniques, PIR enables datasets to remain resident > in their native locations while giving the ability to query the datasets > with sensitive terms. > >>> > >>> == Rationale == > >>> > >>> Although PIR has been in existence for over twenty years, it has > largely remained an academic discipline with very little robust or scalable > implementation. Pirk not only provides implementations of novel scalable > PIR algorithms, but it provides a framework into which robust, scalable, > and practical PIR may be developed. > >>> > >>> Pirk fits well within the Apache Software Foundation (ASF) family as > it depends on numerous ASF projects and integrates with several others such > as Hadoop and Spark. We also anticipate developing extensions/adaptors for > several other ASF projects such as Kafka, Storm, HBase, and Accumulo in the > near future. 
> >>> > >>> == Initial Goals == > >>> > >>> * Ensure all dependencies are compliant with Apache License version > 2.0 and that all code and documentation artifacts have the correct Apache > licensing markings and notice. > >>> > >>> * Establish a formal release process and schedule, allowing for > dependable release cycles in a manner consistent with the Apache > development process. > >>> > >>> * Establish a process which allows different release cycles for the > core framework, extensions/adaptors, and additional algorithms. > >>> > >>> * Grow the community to establish diversity of background and > expertise. > >>> > >>> == Current Status == > >>> > >>> === Meritocracy === > >>> > >>> We will actively seek help and encourage promotion of influence in the > project through meritocracy. We will discuss the requirements in an open > forum. We will encourage and monitor community participation so that > privileges can be extended to those that contribute. > >>> > >>> === Community === > >>> > >>> Pirk currently has a community of developers within the U.S. > government. In open sourcing Pirk we plan to grow the community to a > broader base of industries and will work to align the interaction of our > existing community. > >>> > >>> === Core Developers === > >>> > >>> The initial core developers are employed by the US Government. We will > work to grow the community among a more diverse set of developers and > industries. > >>> > >>> === Alignment === > >>> > >>> Pirk was developed with an open source philosophy in mind and the > Apache way is consistent with the approach we have taken to date. Further, > Pirk depends on numerous ASF libraries and projects including Hadoop, > Spark, Commons, and Maven. We also anticipate extensions and dependencies > with several more ASF projects, including Accumulo, Avro, HBase, Storm, > Kafka, and others. This existing alignment with Apache and the desired > community makes the Apache Incubator a good fit for Pirk. 
> >>> > >>> > >>> == Known Risks == > >>> > >>> === Orphaned Products === > >>> > >>> Risk of orphaning is limited though it is important to grow the > community. > >>> The project user and developer base is growing and there is already > operational use of Pirk. > >>> > >>> === Inexperience with Open Source === > >>> > >>> The initial committers to Pirk have limited experience with true open > source software development. However, despite the project origins being > from closed source development we have modeled our behavior and community > development on The Apache Way to the greatest extent possible. We are > committed to the ideals of open source software and will eagerly seek out > mentors and sponsors who can help us quickly come up to speed. > >>> > >>> === Homogenous Developers === > >>> > >>> The initial committers of Pirk come from a limited set of entities > though we are committed to recruiting and developing additional committers > from a broad spectrum of industries and backgrounds. > >>> > >>> === Reliance on Salaried Developers === > >>> > >>> We expect Pirk development to continue on salaried time and through > volunteer time. The majority of initial committers are paid by their > employers to contribute to this project. We are committed to developing and > recruiting participation from developers both salaried and non-salaried. > >>> > >>> === Relationship with other Apache Projects === > >>> > >>> As described in the alignment section, Pirk is already heavily > dependent on other ASF projects and we anticipate further dependence and > integration with new and emerging projects in the Apache family. > >>> > >>> === An Excessive Fascination with the Apache Brand === > >>> > >>> We respect the Apache brand and desire to adopt its community building > principles. Our desire is to build and foster an open source community > around scalable, robust PIR which aligns with the Apache tenets. 
Further, > Apache is a natural home for Pirk given our existing dependencies and > alignment with ASF projects. > >>> > >>> === Documentation === > >>> > >>> At this time there is no Pirk documentation on the web. However, we > have documentation included within the application that details usage. > Using incubator infrastructure we will be rapidly expanding the available > documentation to cover things like installation, developer guide, > frequently asked questions, best practices, and more. > >>> > >>> == Initial Source == > >>> > >>> The core codebase is written in Java and includes detailed Javadocs > and feature documentation. > >>> > >>> == Source and Intellectual Property Submission == > >>> > >>> The Pirk code and documentation materials will be submitted by the > National Security Agency. Pirk has been developed by government employees. > Material developed by the government employees is in the public domain and > no U.S. > >>> copyright exists in works of the federal government. NSA has submitted a > Corporate Contributor License Agreement to the Apache Software Foundation; > the Software Grant Agreement is forthcoming. > >>> > >>> == External Dependencies == > >>> > >>> We believe all current dependencies are compatible with the ASF > guidelines. > >>> Our dependency licenses come from the Apache v 2.0 and Eclipse Public > v1. > >>> > >>> == Cryptography == > >>> > >>> Consistent with http://www.apache.org/licenses/exports/ we believe > Pirk is classified as ECCN 5D002. In the event that it becomes necessary we > will engage with appropriate Apache members to ensure we file any necessary > paperwork or clarify any cryptographic export license concerns.
> >>> > >>> == Required Resources == > >>> > >>> === Mailing Lists === > >>> > >>> * d...@pirk.incubator.apache.org > >>> * priv...@pirk.incubator.apache.org > >>> * comm...@pirk.incubator.apache.org > >>> > >>> === Source Control === > >>> > >>> Pirk requests use of Git for source control (git://git.apache.org/pirk.git). > >>> We request a writeable Git repo for Pirk with mirroring to be set up to GitHub through INFRA. > >>> > >>> === Issue Tracking === > >>> > >>> JIRA Pirk (PIRK) > >>> > >>> === Initial Committers === > >>> > >>> * Tracy Brown <tbrownpirk at gmail dot com>, CLA submitted > >>> * Christopher Harris <Chris.Harris010 at gmail dot com>, CLA > >>> submitted > >>> * Walter Ray-Dulaney <raydulany at gmail dot com>, CLA submitted > >>> * Jacob Wilder <jacobwilder.opensource at gmail dot com>, CLA > >>> submitted > >>> * Ellison Anne Williams <eawilliamsPirk at gmail dot com>, CLA > >>> confirmed > >>> * Joe Witt (Hortonworks) <joewitt at apache dot org>, CLA confirmed > >>> > >>> == Sponsors == > >>> > >>> === Champion === > >>> > >>> * Billie Rinaldi (Hortonworks) <billie at apache dot org>, IPMC > >>> Member > >>> > >>> === Nominated Mentors === > >>> > >>> * Billie Rinaldi (Hortonworks) <billie at apache dot org>, IPMC > >>> Member > >>> * Joe Witt (Hortonworks) <joewitt at apache dot org>, IPMC Member > >>> * Josh Elser (Hortonworks) <elserj at apache dot org>, IPMC Member > >>> > >>> === Sponsoring Entity === > >>> > >>> We request the Apache Incubator to sponsor this project.
> >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > >>> For additional commands, e-mail: general-h...@incubator.apache.org