On Tue, Oct 20, 2015 at 10:51 AM, Manoharan, Arun <armanoha...@ebay.com> wrote:
> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might be trademarked by someone already but I
> went thru eBay legal team to get the clearance for the name to be used. We
> will look into it again to see if there will be potential problems.

Ultimately it will be the ASF that determines the appropriateness of
the name for a podling. A few pointers:

http://incubator.apache.org/guides/names.html
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH/

> Thanks,
> Arun

- Sam Ruby

> On 10/20/15, 1:52 AM, "Greg Stein" <gst...@gmail.com> wrote:
>
>>Hey there, Arun! ... I have no commentary on the proposal itself, as it
>>looks like a great proposal. I would suggest being a bit wary of the name,
>>as "Eagle" is a *very* popular PCB design program.
>>
>>On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun <armanoha...@ebay.com>
>>wrote:
>>
>>> Hello Everyone,
>>>
>>> My name is Arun Manoharan. Currently a product manager in the Analytics
>>> platform team at eBay Inc.
>>>
>>> I would like to start a discussion on Eagle and its joining the ASF as
>>> an incubation project.
>>>
>>> Eagle is a Monitoring solution for Hadoop to instantly identify access
>>> to sensitive data, recognize attacks, malicious activities and take
>>> actions in real time. Eagle supports a wide variety of policies on HDFS
>>> data and Hive. Eagle also provides machine learning models for detecting
>>> anomalous user behavior in Hadoop.
>>>
>>> The proposal is available on the wiki here:
>>> https://wiki.apache.org/incubator/EagleProposal
>>>
>>> The text of the proposal is also available at the end of this email.
>>>
>>> Thanks for your time and help.
>>>
>>> Thanks,
>>> Arun
>>>
>>> <COPY of the proposal in text format>
>>>
>>> Eagle
>>>
>>> Abstract
>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>> identify access to sensitive data, recognize attacks, malicious
>>> activities in hadoop and take actions.
>>>
>>> Proposal
>>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>> enforces policies defined on sensitive data access and alerts or blocks
>>> user's access to that sensitive data in real time.
>>> Eagle also creates user profiles based on the typical access behaviour
>>> for HDFS and Hive and sends alerts when anomalous behaviour is detected.
>>> Eagle can also import sensitive data information classified by external
>>> classification engines to help define its policies.
>>>
>>> Overview of Eagle
>>> Eagle has 3 main parts.
>>> 1. Data collection and storage - Eagle collects data from various Hadoop
>>> logs in real time using the Kafka/YARN API and uses HDFS and HBase for
>>> storage.
>>> 2. Data processing and policy engine - Eagle allows users to create
>>> policies based on various metadata properties on HDFS, Hive and HBase
>>> data.
>>> 3. Eagle services - Eagle services include the policy manager, query
>>> service and the visualization component. Eagle provides an intuitive
>>> user interface to administer Eagle and an alert dashboard to respond to
>>> real time alerts.
>>>
>>> Data Collection and Storage:
>>> Eagle provides a programming API for extending Eagle to integrate any
>>> data source into the Eagle policy evaluation framework. For example,
>>> Eagle HDFS audit monitoring collects data from Kafka, which is populated
>>> from the NameNode log4j appender or from a Logstash agent. Eagle Hive
>>> monitoring collects Hive query logs from running jobs through the YARN
>>> API, which is designed to be scalable and fault-tolerant. Eagle uses
>>> HBase as storage for metadata and metrics data, and also supports
>>> relational databases through a configuration change.
>>>
>>> Data Processing and Policy Engine:
>>> Processing Engine: Eagle provides a stream processing API which is an
>>> abstraction of Apache Storm. It can also be extended to other streaming
>>> engines. This abstraction allows developers to assemble data
>>> transformation, filtering, external data join etc. without being
>>> physically bound to a specific streaming platform.
>>> Eagle streaming API allows developers to easily integrate business logic
>>> with the Eagle policy engine, and internally the Eagle framework
>>> compiles the business logic execution DAG into program primitives of the
>>> underlying stream infrastructure, e.g. Apache Storm. For example, Eagle
>>> HDFS monitoring transforms audit logs from the NameNode into objects and
>>> joins sensitivity metadata and security zone metadata, which are
>>> generated from external programs or configured by the user. Eagle Hive
>>> monitoring filters running jobs to get the Hive query string, parses the
>>> query string into an object and then joins sensitivity metadata.
>>> Alerting Framework: Eagle Alert Framework includes a stream metadata
>>> API, a scalable policy engine framework and an extensible policy engine
>>> framework. The stream metadata API allows developers to declare an event
>>> schema, including what attributes constitute an event, what the type of
>>> each attribute is, and how to dynamically resolve attribute values at
>>> runtime when a user configures a policy. The scalable policy engine
>>> framework allows policies to be executed on different physical nodes in
>>> parallel. It also lets you define your own policy partitioner class.
>>> The policy engine framework together with the streaming partitioning
>>> capability provided by all streaming platforms will make sure policies
>>> and events can be evaluated in a fully distributed way. The extensible
>>> policy engine framework allows developers to plug in a new policy engine
>>> with a few lines of code. WSO2 Siddhi CEP engine is the policy engine
>>> which Eagle supports as a first-class citizen.
>>> Machine Learning module: Eagle provides capabilities to define user
>>> activity patterns or user profiles for Hadoop users based on the user
>>> behaviour in the platform. These user profiles are modeled using Machine
>>> Learning algorithms and used for detection of anomalous user activities.
>>> Eagle uses Eigen Value Decomposition and Density Estimation algorithms
>>> for generating user profile models. The model reads data from HDFS audit
>>> logs, preprocesses and aggregates data, and generates models using Spark
>>> programming APIs. Once models are generated, Eagle uses the stream
>>> processing engine for near real-time anomaly detection to determine
>>> whether any user's activities are suspicious.
>>>
>>> Eagle Services:
>>> Query Service: Eagle provides a SQL-like service API to support
>>> comprehensive computation over huge sets of data on the fly, e.g.
>>> comprehensive filtering, aggregation, histogram, sorting, top,
>>> arithmetical expression, pagination etc. HBase is the data storage which
>>> Eagle supports as a first-class citizen; relational databases are
>>> supported as well. For HBase storage, the Eagle query framework compiles
>>> the user-provided SQL-like query into HBase native filter objects and
>>> executes them through an HBase coprocessor on the fly.
>>> Policy Manager: Eagle policy manager provides a UI and a RESTful API for
>>> users to define policies with just a few clicks. It includes a site
>>> management UI, policy editor, sensitivity metadata import, HDFS or Hive
>>> sensitive resource browsing, alert dashboards etc.
>>>
>>> Background
>>> Data is one of the most important assets for today's businesses, which
>>> makes data security one of the top priorities of today's enterprises.
>>> Hadoop is widely used across different verticals as a big data
>>> repository to store this data in most modern enterprises.
>>> At eBay we use the Hadoop platform extensively for our data processing
>>> needs. Our data in Hadoop is becoming bigger and bigger as our user base
>>> is seeing exponential growth. Today there is a variety of data sets
>>> available in the Hadoop cluster for our users to consume.
>>> eBay has around 120 PB of data stored in HDFS across 6 different
>>> clusters and 1800+ active Hadoop users consuming data through Hive,
>>> HBase and MapReduce jobs every day to build applications using this
>>> data. With this astronomical growth of data there are also challenges in
>>> securing sensitive data and monitoring the access to this sensitive
>>> data. Today in large organizations HDFS is the de facto standard for
>>> storing big data. Data sets including, but not limited to, consumer
>>> sentiment, social media data, customer segmentation, web clicks, sensor
>>> data, geo-location and transaction data get stored in Hadoop for day to
>>> day business needs.
>>> We at eBay want to make sure the sensitive data and data platforms are
>>> completely protected from security breaches. So we partnered very
>>> closely with our Information Security team to understand the
>>> requirements for Eagle to monitor sensitive data access on Hadoop:
>>> 1. Ability to identify and stop security threats in real time
>>> 2. Scale for big data (Support PB scale and Billions of events)
>>> 3. Ability to create data access policies
>>> 4. Support multiple data sources like HDFS, HBase, Hive
>>> 5. Visualize alerts in real time
>>> 6. Ability to block malicious access in real time
>>> We did not find any data access monitoring solution that is available
>>> today and can provide the features and functionality that we need to
>>> monitor the data access in the Hadoop ecosystem at our scale. Hence with
>>> an excellent team of world class developers and several users, we have
>>> been able to bring Eagle into production as well as open source it.
>>>
>>> Rationale
>>> In today's world, data is an important asset for any company. Businesses
>>> are using data extensively to create amazing experiences for users. Data
>>> has to be protected and access to data should be secured from security
>>> breaches.
>>> Today Hadoop is not only used to store logs but also stores financial
>>> data, sensitive data sets, geographical data, user click stream data
>>> sets etc., which makes it more important to protect it from security
>>> breaches. To secure a data platform there are multiple things that need
>>> to happen. One is having a strong access control mechanism, which today
>>> is provided by Apache Ranger and Apache Sentry. These tools provide
>>> fine-grained access control for data sets on Hadoop. But there is a big
>>> gap in terms of monitoring all the data access events and activities in
>>> order to secure the Hadoop data platform. With strong access control,
>>> perimeter security and data access monitoring together in place, data in
>>> the Hadoop clusters can be secured against breaches. We looked around
>>> and found the following:
>>> Existing data activity monitoring products are designed for traditional
>>> databases and data warehouses. Existing monitoring platforms cannot
>>> scale out to support fast growing data and petabyte scale. The few
>>> products in the industry that support HDFS, Hive and HBase data access
>>> monitoring are still at a very early stage.
>>> As mentioned in the background, the business requirement and urgency to
>>> secure the data from users with malicious intent drove eBay to invest in
>>> building a real time data access monitoring solution from scratch to
>>> offer real time alerts and remediation features for malicious data
>>> access.
>>> With the power of open source distributed systems like Hadoop, Kafka and
>>> much more we were able to develop a data activity monitoring system that
>>> can scale, identify and stop malicious access in real time.
>>> Eagle allows admins to create standard access policies and rules for
>>> monitoring HDFS, Hive and HBase data.
>>> Eagle also provides out-of-the-box machine learning models for modeling
>>> user profiles based on user access behaviour and uses the models to
>>> alert on anomalies.
>>>
>>> Current Status
>>>
>>> Meritocracy
>>> Eagle has been deployed in production at eBay for monitoring billions of
>>> events per day from HDFS and Hive operations. From the start, the
>>> product has been built with a focus on high scalability and application
>>> extensibility, and Eagle has demonstrated great performance in
>>> responding to suspicious events instantly and great flexibility in
>>> defining policy.
>>>
>>> Community
>>> Eagle seeks to develop the developer and user communities during
>>> incubation.
>>>
>>> Core Developers
>>> Eagle is currently being designed and developed by engineers from eBay
>>> Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
>>> Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of
>>> these core developers have deep expertise in developing monitoring
>>> products for the Hadoop ecosystem.
>>>
>>> Alignment
>>> The ASF is a natural host for Eagle given that it is already the home of
>>> Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
>>> projects. Eagle leverages a lot of Apache open-source products. Eagle
>>> was designed to offer real time insights into sensitive data access by
>>> actively monitoring the data access on various data sets in Hadoop and
>>> an extensible alerting framework with a powerful policy engine. Eagle
>>> complements the existing Hadoop platform area by providing a
>>> comprehensive monitoring and alerting solution for detecting sensitive
>>> data access threats based on preset policies and machine learning models
>>> for user behaviour analysis.
>>>
>>> Known Risks
>>>
>>> Orphaned Products
>>> The core developers of the Eagle team work full time on this project.
>>> There is no risk of Eagle getting orphaned since eBay is extensively
>>> using it in its production Hadoop clusters and has plans to go beyond
>>> Hadoop. For example, currently there are 7 Hadoop clusters and 2 of them
>>> are being monitored using Hadoop Eagle in production. We have plans to
>>> extend it to all Hadoop clusters and eventually other data platforms.
>>> There are tens of policies onboarded and actively monitored with plans
>>> to onboard more use cases. We are very confident that every Hadoop
>>> cluster in the world will be monitored using Eagle for securing the
>>> Hadoop ecosystem by actively monitoring for data access on sensitive
>>> data. We plan to extend and diversify this community further through
>>> Apache. We presented Eagle at the Hadoop Summit in China and garnered
>>> interest from different companies who use Hadoop extensively.
>>>
>>> Inexperience with Open Source
>>> The core developers are all active users and followers of open source.
>>> They are already committers and contributors to the Eagle GitHub
>>> project. All have been involved with the source code that has been
>>> released under an open source license, and several of them also have
>>> experience developing code in an open source environment. Though the
>>> core set of developers do not have Apache open source experience, there
>>> are plans to onboard individuals with Apache open source experience on
>>> to the project. Apache Kylin PMC members are also in the same eBay
>>> organization. We work very closely with Apache Ranger committers and are
>>> looking forward to finding meaningful integrations to improve the
>>> security of the Hadoop platform.
>>>
>>> Homogenous Developers
>>> The core developers are from eBay. Today the problem of monitoring data
>>> activities to find and stop threats is a universal problem faced by all
>>> businesses. The Apache Incubation process encourages an open and diverse
>>> meritocratic community.
>>> Eagle intends to make every possible effort to build a diverse, vibrant
>>> and involved community and has already received substantial interest
>>> from various organizations.
>>>
>>> Reliance on Salaried Developers
>>> eBay invested in Eagle as the monitoring solution for Hadoop clusters
>>> and some of its key engineers are working full time on the project. In
>>> addition, since there is a growing need to secure sensitive data access
>>> and for a data activity monitoring solution for Hadoop, we look forward
>>> to other Apache developers and researchers contributing to the project.
>>> Additional contributors, including Apache committers, have plans to join
>>> this effort shortly. Also key to addressing the risk associated with
>>> relying on salaried developers from a single entity is to increase the
>>> diversity of the contributors and actively lobby for domain experts in
>>> the security space to contribute. Eagle intends to do this.
>>>
>>> Relationships with Other Apache Products
>>> Eagle has a strong relationship and dependency with Apache Hadoop,
>>> HBase, Spark, Kafka and Storm. Being part of Apache's Incubation
>>> community could help with a closer collaboration among these projects as
>>> well as others.
>>>
>>> An Excessive Fascination with the Apache Brand
>>> Eagle is proposing to enter incubation at Apache in order to help
>>> efforts to diversify the committer-base, not so much to capitalize on
>>> the Apache brand. The Eagle project is in production use already inside
>>> eBay, but is not expected to be an eBay product for external customers.
>>> As such, the Eagle project is not seeking to use the Apache brand as a
>>> marketing tool.
>>>
>>> Documentation
>>> Information about Eagle can be found at https://github.com/eBay/Eagle.
>>> The following link provides more information about Eagle:
>>> http://goeagle.io
>>>
>>> Initial Source
>>> Eagle has been under development since 2014 by a team of engineers at
>>> eBay Inc. It is currently hosted on GitHub under the Apache License 2.0
>>> at https://github.com/eBay/Eagle. Once in incubation we will be moving
>>> the code base to the Apache Git repository.
>>>
>>> External Dependencies
>>> Eagle has the following external dependencies.
>>> Basic
>>> • JDK 1.7+
>>> • Scala 2.10.4
>>> • Apache Maven
>>> • JUnit
>>> • Log4j
>>> • Slf4j
>>> • Apache Commons
>>> • Apache Commons Math3
>>> • Jackson
>>> • Siddhi CEP engine
>>>
>>> Hadoop
>>> • Apache Hadoop
>>> • Apache HBase
>>> • Apache Hive
>>> • Apache Zookeeper
>>> • Apache Curator
>>>
>>> Apache Spark
>>> • Spark Core Library
>>>
>>> REST Service
>>> • Jersey
>>>
>>> Query
>>> • Antlr
>>>
>>> Stream processing
>>> • Apache Storm
>>> • Apache Kafka
>>>
>>> Web
>>> • AngularJS
>>> • jQuery
>>> • Bootstrap V3
>>> • Moment JS
>>> • Admin LTE
>>> • html5shiv
>>> • respond
>>> • Fastclick
>>> • Date Range Picker
>>> • Flot JS
>>>
>>> Cryptography
>>> Eagle will eventually support encryption on the wire. This is not one of
>>> the initial goals, and we do not expect Eagle to be a controlled export
>>> item due to the use of encryption. Eagle supports but does not require
>>> the Kerberos authentication mechanism to access secured Hadoop services.
>>>
>>> Required Resources
>>>
>>> Mailing List
>>> • eagle-private for private PMC discussions
>>> • eagle-dev for developers
>>> • eagle-commits for all commits
>>> • eagle-users for all eagle users
>>>
>>> Subversion Directory
>>> • Git is the preferred source control system.
>>>
>>> Issue Tracking
>>> • JIRA Eagle (Eagle)
>>>
>>> Other Resources
>>> The existing code already has unit tests so we will make use of existing
>>> Apache continuous testing infrastructure. The resulting load should not
>>> be very large.
>>>
>>> Initial Committers
>>> • Seshu Adunuthula <sadunuthula at ebay dot com>
>>> • Arun Manoharan <armanoharan at ebay dot com>
>>> • Edward Zhang <yonzhang at ebay dot com>
>>> • Hao Chen <hchen9 at ebay dot com>
>>> • Chaitali Gupta <cgupta at ebay dot com>
>>> • Libin Sun <libsun at ebay dot com>
>>> • Jilin Jiang <jiljiang at ebay dot com>
>>> • Qingwen Zhao <qingwzhao at ebay dot com>
>>> • Hemanth Dendukuri <hdendukuri at ebay dot com>
>>> • Senthil Kumar <senthilkumar at ebay dot com>
>>> • Tan Chen <tanchen at ebay dot com>
>>>
>>> Affiliations
>>> The initial committers are employees of eBay Inc.
>>>
>>> Sponsors
>>>
>>> Champion
>>> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>>
>>> Nominated Mentors
>>> • Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
>>> Hortonworks
>>> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>> • Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
>>> Hortonworks
>>>
>>> Sponsoring Entity
>>> We are requesting the Incubator to sponsor this project.
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org