As a clarification, all votes were +1.
On Mon, Oct 26, 2015 at 10:59 AM, Manoharan, Arun <armanoha...@ebay.com> wrote:
> Hello Everyone,
>
> Thanks for participating in the vote and discussions about Eagle.
>
> Binding votes = 10
> Non-binding votes = 14
> Total votes = 24
>
> Thanks,
>
> Arun
>
>
> On 10/25/15, 8:37 PM, "Don Bosco Durai" <bo...@apache.org> wrote:
>
> >+1 non binding
> >Bosco
> >
> >_____________________________
> >From: Li Yang <liy...@apache.org>
> >Sent: Sunday, October 25, 2015 8:13 PM
> >Subject: Re: [VOTE] Accept Eagle into Apache Incubation
> >To: <general@incubator.apache.org>
> >
> >+1 (non-binding)
> >
> >On Mon, Oct 26, 2015 at 10:50 AM, hongbin ma <mahong...@apache.org> wrote:
> >
> >> +1 (non binding)
> >>
> >> On Mon, Oct 26, 2015 at 12:20 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > Ralph
> >> >
> >> > > On Oct 23, 2015, at 7:11 AM, Manoharan, Arun <armanoha...@ebay.com> wrote:
> >> > >
> >> > > Hello Everyone,
> >> > >
> >> > > Thanks for all the feedback on the Eagle Proposal.
> >> > >
> >> > > I would like to call for a [VOTE] on Eagle joining the ASF as an incubation project.
> >> > >
> >> > > The vote is open for 72 hours:
> >> > >
> >> > > [ ] +1 accept Eagle in the Incubator
> >> > > [ ] ±0
> >> > > [ ] -1 (please give reason)
> >> > >
> >> > > Eagle is a monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities, and takes action in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
> >> > >
> >> > > The proposal is available on the wiki here:
> >> > > https://wiki.apache.org/incubator/EagleProposal
> >> > >
> >> > > The text of the proposal is also available at the end of this email.
> >> > >
> >> > > Thanks for your time and help.
> >> > >
> >> > > Thanks,
> >> > > Arun
> >> > >
> >> > > <COPY of the proposal in text format>
> >> > >
> >> > > Eagle
> >> > >
> >> > > Abstract
> >> > > Eagle is an open source monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities in Hadoop, and takes action.
> >> > >
> >> > > Proposal
> >> > > Eagle audits access to HDFS files and Hive and HBase tables in real time, enforces policies defined on sensitive data access, and alerts on or blocks users' access to that sensitive data in real time. Eagle also creates user profiles based on typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.
> >> > >
> >> > > Overview of Eagle
> >> > > Eagle has 3 main parts.
> >> > > 1. Data collection and storage - Eagle collects data from various Hadoop logs in real time using Kafka and the YARN API, and uses HDFS and HBase for storage.
> >> > > 2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties of HDFS, Hive and HBase data.
> >> > > 3. Eagle services - Eagle services include the policy manager, the query service and the visualization component. Eagle provides an intuitive user interface to administer Eagle and an alert dashboard to respond to real-time alerts.
> >> > >
> >> > > Data Collection and Storage:
> >> > > Eagle provides a programming API for extending Eagle to integrate any data source into the Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring collects data from Kafka, which is populated from the NameNode log4j appender or from a Logstash agent.
> >> > > Eagle Hive monitoring collects Hive query logs from running jobs through the YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase to store metadata and metrics data, and also supports relational databases through a configuration change.
> >> > >
> >> > > Data Processing and Policy Engine:
> >> > > Processing Engine: Eagle provides a stream processing API which is an abstraction over Apache Storm. It can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data joins, etc. without being physically bound to a specific streaming platform. The Eagle streaming API allows developers to easily integrate business logic with the Eagle policy engine; internally, the Eagle framework compiles the business logic execution DAG into program primitives of the underlying stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms audit logs from the NameNode into objects and joins them with sensitivity metadata and security zone metadata, which are generated by external programs or configured by the user. Eagle Hive monitoring filters running jobs to get the Hive query string, parses the query string into an object and then joins it with sensitivity metadata.
> >> > > Alerting Framework: The Eagle Alert Framework includes a stream metadata API, a scalable policy engine framework and an extensible policy engine framework. The stream metadata API allows developers to declare the event schema, including what attributes constitute an event, the type of each attribute, and how to dynamically resolve attribute values at runtime when a user configures a policy. The scalable policy engine framework allows policies to be executed on different physical nodes in parallel.
> >> > > It also lets you define your own policy partitioner class. The policy engine framework, together with the stream partitioning capability provided by all streaming platforms, makes sure policies and events can be evaluated in a fully distributed way. The extensible policy engine framework allows developers to plug in a new policy engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy engine which Eagle supports as a first-class citizen.
> >> > > Machine Learning module: Eagle provides capabilities to define user activity patterns, or user profiles, for Hadoop users based on user behaviour in the platform. These user profiles are modeled using machine learning algorithms and used to detect anomalous user activities. Eagle uses Eigenvalue Decomposition and Density Estimation algorithms to generate user profile models. The model reads data from HDFS audit logs, preprocesses and aggregates the data, and generates models using Spark programming APIs. Once models are generated, Eagle uses the stream processing engine for near real-time anomaly detection to determine whether any user's activities are suspicious.
> >> > >
> >> > > Eagle Services:
> >> > > Query Service: Eagle provides a SQL-like service API to support comprehensive computation over huge data sets on the fly, e.g. comprehensive filtering, aggregation, histograms, sorting, top, arithmetical expressions, pagination, etc. HBase is the data storage which Eagle supports as a first-class citizen; relational databases are supported as well. For HBase storage, the Eagle query framework compiles the user-provided SQL-like query into HBase native filter objects and executes it through an HBase coprocessor on the fly.
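The Machine Learning module paragraph in the proposal names Density Estimation as one of the techniques behind Eagle's user profiles. The Python sketch below only illustrates that general idea and is not Eagle's model: it assumes each user's activity is summarized as hypothetical per-window counts of HDFS commands (`open`, `delete`, `rename`), fits an independent Gaussian per feature (a diagonal density estimate), and flags a new window as anomalous when its log-density falls below an operator-chosen threshold. The feature names, function names and threshold value are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature order: per-window counts of HDFS audit commands (assumption).
FEATURES = ["open", "delete", "rename"]

# A profile stores (mean, variance) for each feature, in FEATURES order.
Profile = List[Tuple[float, float]]


def fit_profile(windows: List[Dict[str, int]], min_var: float = 1e-3) -> Profile:
    """Fit an independent (diagonal) Gaussian to a user's historical windows."""
    if not windows:
        raise ValueError("need at least one historical window")
    n = len(windows)
    profile: Profile = []
    for f in FEATURES:
        values = [w.get(f, 0) for w in windows]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        # Floor the variance so a feature that never varied cannot divide by zero.
        profile.append((mean, max(var, min_var)))
    return profile


def log_density(profile: Profile, window: Dict[str, int]) -> float:
    """Sum of per-feature Gaussian log-densities for one activity window."""
    total = 0.0
    for f, (mean, var) in zip(FEATURES, profile):
        x = window.get(f, 0)
        total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return total


def is_anomalous(profile: Profile, window: Dict[str, int], threshold: float) -> bool:
    """Flag a window whose log-density is below the (hypothetical) threshold."""
    return log_density(profile, window) < threshold


if __name__ == "__main__":
    history = [
        {"open": 40, "delete": 1, "rename": 2},
        {"open": 35, "delete": 0, "rename": 3},
        {"open": 45, "delete": 2, "rename": 1},
        {"open": 38, "delete": 1, "rename": 2},
    ]
    profile = fit_profile(history)
    typical = {"open": 41, "delete": 1, "rename": 2}
    burst = {"open": 39, "delete": 250, "rename": 2}  # sudden mass deletion
    threshold = -20.0  # hypothetical; would be tuned on historical data
    print(is_anomalous(profile, typical, threshold))  # False
    print(is_anomalous(profile, burst, threshold))  # True
```

Treating features as independent keeps the sketch short, but it ignores correlations between commands, which is the kind of structure an eigenvalue-based model could capture.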
> >> > > Policy Manager: The Eagle policy manager provides a UI and a RESTful API for users to define policies with just a few clicks. It includes the site management UI, policy editor, sensitivity metadata import, HDFS or Hive sensitive resource browsing, alert dashboards, etc.
> >> > >
> >> > > Background
> >> > > Data is one of the most important assets for today's businesses, which makes data security one of the top priorities of today's enterprises. Hadoop is widely used across different verticals as a big data repository to store this data in most modern enterprises.
> >> > > At eBay we use the Hadoop platform extensively for our data processing needs. Our data in Hadoop is becoming bigger and bigger as our user base sees exponential growth. Today there are a variety of data sets available in our Hadoop clusters for our users to consume. eBay has around 120 PB of data stored in HDFS across 6 different clusters and 1800+ active Hadoop users consuming data through Hive, HBase and MapReduce jobs every day to build applications using this data. With this astronomical growth of data there are also challenges in securing sensitive data and monitoring access to it. Today in large organizations HDFS is the de facto standard for storing big data. Data sets including, but not limited to, consumer sentiment, social media data, customer segmentation, web clicks, sensor data, geo-location and transaction data get stored in Hadoop for day-to-day business needs.
> >> > > We at eBay want to make sure the sensitive data and data platforms are completely protected from security breaches.
> >> > > So we partnered very closely with our Information Security team to understand the requirements for Eagle to monitor sensitive data access on Hadoop:
> >> > > 1. Ability to identify and stop security threats in real time
> >> > > 2. Scale for big data (support PB scale and billions of events)
> >> > > 3. Ability to create data access policies
> >> > > 4. Support for multiple data sources like HDFS, HBase and Hive
> >> > > 5. Visualize alerts in real time
> >> > > 6. Ability to block malicious access in real time
> >> > > We did not find any data access monitoring solution available today that could provide the features and functionality we need to monitor data access in the Hadoop ecosystem at our scale. Hence, with an excellent team of world-class developers and several users, we have been able to bring Eagle into production as well as open source it.
> >> > >
> >> > > Rationale
> >> > > In today's world, data is an important asset for any company. Businesses are using data extensively to create amazing experiences for users. Data has to be protected, and access to data should be secured from security breaches. Today Hadoop is not only used to store logs but also stores financial data, sensitive data sets, geographical data, user click stream data sets, etc., which makes protecting it from security breaches all the more important. To secure a data platform, multiple things need to happen. One is a strong access control mechanism, which today is provided by Apache Ranger and Apache Sentry. These tools provide fine-grained access control over data sets on Hadoop. But there is a big gap in monitoring all the data access events and activities needed to secure the Hadoop data platform.
> >> > > With strong access control, perimeter security and data access monitoring in place together, data in Hadoop clusters can be secured against breaches. We looked around and found the following:
> >> > > Existing data activity monitoring products are designed for traditional databases and data warehouses. Existing monitoring platforms cannot scale out to support fast-growing data at petabyte scale. The few products in the industry that support HDFS, Hive and HBase data access monitoring are still at a very early stage.
> >> > > As mentioned in the background, the business requirement and urgency to secure data from users with malicious intent drove eBay to invest in building a real-time data access monitoring solution from scratch, offering real-time alerts and remediation features for malicious data access.
> >> > > With the power of open source distributed systems like Hadoop, Kafka and many more, we were able to develop a data activity monitoring system that can scale, identify and stop malicious access in real time.
> >> > > Eagle allows admins to create standard access policies and rules for monitoring HDFS, Hive and HBase data. Eagle also provides out-of-the-box machine learning models for modeling user profiles based on user access behaviour and uses the models to alert on anomalies.
> >> > >
> >> > > Current Status
> >> > >
> >> > > Meritocracy
> >> > > Eagle has been deployed in production at eBay, monitoring billions of events per day from HDFS and Hive operations. From the start, the product has been built with a focus on high scalability and application extensibility, and Eagle has demonstrated great performance in responding to suspicious events instantly and great flexibility in defining policies.
> >> > >
> >> > > Community
> >> > > Eagle seeks to develop its developer and user communities during incubation.
> >> > >
> >> > > Core Developers
> >> > > Eagle is currently being designed and developed by engineers from eBay Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang, Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri and Arun Manoharan. All of these core developers have deep expertise in developing monitoring products for the Hadoop ecosystem.
> >> > >
> >> > > Alignment
> >> > > The ASF is a natural host for Eagle given that it is already the home of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data projects. Eagle leverages many Apache open-source products. Eagle was designed to offer real-time insights into sensitive data access by actively monitoring data access on various data sets in Hadoop, backed by an extensible alerting framework with a powerful policy engine. Eagle complements the existing Hadoop platform by providing a comprehensive monitoring and alerting solution for detecting sensitive data access threats based on preset policies and machine learning models for user behaviour analysis.
> >> > >
> >> > > Known Risks
> >> > >
> >> > > Orphaned Products
> >> > > The core developers of the Eagle team work full time on this project. There is no risk of Eagle getting orphaned, since eBay is using it extensively in its production Hadoop clusters and has plans to go beyond Hadoop. For example, there are currently 7 Hadoop clusters, 2 of which are being monitored using Hadoop Eagle in production. We have plans to extend it to all Hadoop clusters and eventually other data platforms. There are tens of policies onboarded and actively monitored, with plans to onboard more use cases.
> >> > > We are very confident that every Hadoop cluster in the world will be monitored using Eagle to secure the Hadoop ecosystem by actively monitoring access to sensitive data. We plan to extend and diversify this community further through Apache. We presented Eagle at Hadoop Summit in China and garnered interest from different companies who use Hadoop extensively.
> >> > >
> >> > > Inexperience with Open Source
> >> > > The core developers are all active users and followers of open source. They are already committers and contributors to the Eagle GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience to the project. Apache Kylin PMC members are also in the same eBay organization. We work very closely with Apache Ranger committers and look forward to finding meaningful integrations to improve the security of the Hadoop platform.
> >> > >
> >> > > Homogenous Developers
> >> > > The core developers are from eBay. Today, monitoring data activities to find and stop threats is a universal problem faced by all businesses. The Apache Incubation process encourages an open and diverse meritocratic community. Eagle intends to make every possible effort to build a diverse, vibrant and involved community and has already received substantial interest from various organizations.
> >> > >
> >> > > Reliance on Salaried Developers
> >> > > eBay invested in Eagle as the monitoring solution for its Hadoop clusters, and some of its key engineers are working full time on the project. In addition, since there is a growing need for a data activity monitoring solution to secure sensitive data access on Hadoop, we look forward to other Apache developers and researchers contributing to the project. Additional contributors, including Apache committers, plan to join this effort shortly. Also key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of contributors and actively lobbying for domain experts in the security space to contribute. Eagle intends to do this.
> >> > >
> >> > > Relationships with Other Apache Products
> >> > > Eagle has a strong relationship with, and dependency on, Apache Hadoop, HBase, Spark, Kafka and Storm. Being part of Apache's Incubation community could help with closer collaboration among these projects as well as others.
> >> > >
> >> > > An Excessive Fascination with the Apache Brand
> >> > > Eagle is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Eagle project is already in production use inside eBay, but is not expected to be an eBay product for external customers. As such, the Eagle project is not seeking to use the Apache brand as a marketing tool.
> >> > >
> >> > > Documentation
> >> > > Information about Eagle can be found at https://github.com/eBay/Eagle. The following link provides more information about Eagle: http://goeagle.io
> >> > >
> >> > > Initial Source
> >> > > Eagle has been under development since 2014 by a team of engineers at eBay Inc. It is currently hosted on GitHub under the Apache License 2.0 at https://github.com/eBay/Eagle. Once in incubation, we will move the code base to the Apache Git repository.
> >> > >
> >> > > External Dependencies
> >> > > Eagle has the following external dependencies.
> >> > > Basic
> >> > > •JDK 1.7+
> >> > > •Scala 2.10.4
> >> > > •Apache Maven
> >> > > •JUnit
> >> > > •Log4j
> >> > > •Slf4j
> >> > > •Apache Commons
> >> > > •Apache Commons Math3
> >> > > •Jackson
> >> > > •Siddhi CEP engine
> >> > >
> >> > > Hadoop
> >> > > •Apache Hadoop
> >> > > •Apache HBase
> >> > > •Apache Hive
> >> > > •Apache Zookeeper
> >> > > •Apache Curator
> >> > >
> >> > > Apache Spark
> >> > > •Spark Core Library
> >> > >
> >> > > REST Service
> >> > > •Jersey
> >> > >
> >> > > Query
> >> > > •Antlr
> >> > >
> >> > > Stream processing
> >> > > •Apache Storm
> >> > > •Apache Kafka
> >> > >
> >> > > Web
> >> > > •AngularJS
> >> > > •jQuery
> >> > > •Bootstrap V3
> >> > > •Moment JS
> >> > > •Admin LTE
> >> > > •html5shiv
> >> > > •respond
> >> > > •Fastclick
> >> > > •Date Range Picker
> >> > > •Flot JS
> >> > >
> >> > > Cryptography
> >> > > Eagle will eventually support encryption on the wire. This is not one of the initial goals, and we do not expect Eagle to be a controlled export item due to the use of encryption. Eagle supports, but does not require, the Kerberos authentication mechanism to access secured Hadoop services.
> >> > >
> >> > > Required Resources
> >> > >
> >> > > Mailing List
> >> > > •eagle-private for private PMC discussions
> >> > > •eagle-dev for developers
> >> > > •eagle-commits for all commits
> >> > > •eagle-users for all eagle users
> >> > >
> >> > > Subversion Directory
> >> > > •Git is the preferred source control system.
> >> > >
> >> > > Issue Tracking
> >> > > •JIRA Eagle (Eagle)
> >> > >
> >> > > Other Resources
> >> > > The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.
> >> > >
> >> > > Initial Committers
> >> > > •Seshu Adunuthula <sadunuthula at ebay dot com>
> >> > > •Arun Manoharan <armanoharan at ebay dot com>
> >> > > •Edward Zhang <yonzhang at ebay dot com>
> >> > > •Hao Chen <hchen9 at ebay dot com>
> >> > > •Chaitali Gupta <cgupta at ebay dot com>
> >> > > •Libin Sun <libsun at ebay dot com>
> >> > > •Jilin Jiang <jiljiang at ebay dot com>
> >> > > •Qingwen Zhao <qingwzhao at ebay dot com>
> >> > > •Hemanth Dendukuri <hdendukuri at ebay dot com>
> >> > > •Senthil Kumar <senthilkumar at ebay dot com>
> >> > >
> >> > > Affiliations
> >> > > The initial committers are employees of eBay Inc.
> >> > >
> >> > > Sponsors
> >> > >
> >> > > Champion
> >> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >> > >
> >> > > Nominated Mentors
> >> > > •Owen O’Malley <omalley at apache dot org> - Apache IPMC member, Hortonworks
> >> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >> > > •Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member, Hortonworks
> >> > > •Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache IPMC member
> >> > > •Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member, Hortonworks
> >> > >
> >> > > Sponsoring Entity
> >> > > We are requesting the Incubator to sponsor this project.
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> > For additional commands, e-mail: general-h...@incubator.apache.org
> >> >
> >>
> >> --
> >> Regards,
> >>
> >> *Bin Mahone | 马洪宾*
> >> Apache Kylin: http://kylin.io
> >> Github: https://github.com/binmahone
> >