+1
-Medha

On 10/23/15, 1:14 PM, "Balaji Ganesan" <bgane...@apache.org> wrote:

>+1
>
>On Fri, Oct 23, 2015 at 12:26 PM, Chris Nauroth <cnaur...@hortonworks.com>
>wrote:
>
>> +1 (binding)
>>
>> --Chris Nauroth
>>
>>
>>
>>
>> On 10/23/15, 7:11 AM, "Manoharan, Arun" <armanoha...@ebay.com> wrote:
>>
>> >Hello Everyone,
>> >
>> >Thanks for all the feedback on the Eagle Proposal.
>> >
>> >I would like to call for a [VOTE] on Eagle joining the ASF as an
>> >incubation project.
>> >
>> >The vote is open for 72 hours:
>> >
>> >[ ] +1 accept Eagle in the Incubator
>> >[ ] ±0
>> >[ ] -1 (please give reason)
>> >
>> >Eagle is a monitoring solution for Hadoop that instantly identifies
>> >access to sensitive data, recognizes attacks and malicious activities,
>> >and takes action in real time. Eagle supports a wide variety of
>> >policies on HDFS data and Hive. Eagle also provides machine learning
>> >models for detecting anomalous user behavior in Hadoop.
>> >
>> >The proposal is available on the wiki here:
>> >https://wiki.apache.org/incubator/EagleProposal
>> >
>> >The text of the proposal is also available at the end of this email.
>> >
>> >Thanks for your time and help.
>> >
>> >Thanks,
>> >Arun
>> >
>> ><COPY of the proposal in text format>
>> >
>> >Eagle
>> >
>> >Abstract
>> >Eagle is an open source monitoring solution for Hadoop that instantly
>> >identifies access to sensitive data, recognizes attacks and malicious
>> >activities in Hadoop, and takes action.
>> >
>> >Proposal
>> >Eagle audits access to HDFS files and to Hive and HBase tables in real
>> >time, enforces policies defined on sensitive data access, and alerts on
>> >or blocks a user's access to that sensitive data in real time. Eagle
>> >also creates user profiles based on typical access behaviour for HDFS
>> >and Hive and sends alerts when anomalous behaviour is detected. Eagle
>> >can also import sensitive data information classified by external
>> >classification engines to help define its policies.
>> >
>> >Overview of Eagle
>> >Eagle has 3 main parts.
>> >1. Data collection and storage - Eagle collects data from various
>> >Hadoop logs in real time using Kafka and the YARN API, and uses HDFS
>> >and HBase for storage.
>> >2. Data processing and policy engine - Eagle allows users to create
>> >policies based on various metadata properties of HDFS, Hive and HBase
>> >data.
>> >3. Eagle services - Eagle services include the policy manager, the
>> >query service and the visualization component. Eagle provides an
>> >intuitive user interface to administer Eagle and an alert dashboard to
>> >respond to real-time alerts.
>> >
>> >Data Collection and Storage:
>> >Eagle provides a programming API for extending Eagle to integrate any
>> >data source into the Eagle policy evaluation framework. For example,
>> >Eagle HDFS audit monitoring collects data from Kafka, which is
>> >populated by a NameNode log4j appender or by a Logstash agent. Eagle
>> >Hive monitoring collects Hive query logs from running jobs through the
>> >YARN API, which is designed to be scalable and fault-tolerant. Eagle
>> >uses HBase to store metadata and metrics data, and also supports
>> >relational databases through a configuration change.
>> >
>> >Data Processing and Policy Engine:
>> >Processing Engine: Eagle provides a stream processing API that is an
>> >abstraction over Apache Storm and can be extended to other streaming
>> >engines. This abstraction allows developers to assemble data
>> >transformation, filtering, external data joins, etc. without being
>> >bound to a specific streaming platform. The Eagle streaming API allows
>> >developers to easily integrate business logic with the Eagle policy
>> >engine; internally, the Eagle framework compiles the business logic
>> >execution DAG into program primitives of the underlying stream
>> >infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring
>> >transforms audit logs from the NameNode into objects and joins them
>> >with sensitivity metadata and security zone metadata, which are
>> >generated by external programs or configured by the user. Eagle Hive
>> >monitoring filters running jobs to get the Hive query string, parses
>> >the query string into an object and then joins it with sensitivity
>> >metadata.
>> >Alerting Framework: The Eagle alert framework includes a stream
>> >metadata API, a scalable policy engine framework and an extensible
>> >policy engine framework. The stream metadata API allows developers to
>> >declare an event schema, including which attributes constitute an
>> >event, the type of each attribute, and how to dynamically resolve
>> >attribute values at runtime when a user configures a policy. The
>> >scalable policy engine framework allows policies to be executed on
>> >different physical nodes in parallel. It also allows developers to
>> >define their own policy partitioner class. The policy engine
>> >framework, together with the stream partitioning capability provided
>> >by all streaming platforms, ensures that policies and events can be
>> >evaluated in a fully distributed way. The extensible policy engine
>> >framework allows developers to plug in a new policy engine with a few
>> >lines of code. The WSO2 Siddhi CEP engine is the policy engine that
>> >Eagle supports as a first-class citizen.
>> >Machine Learning module: Eagle provides capabilities to define user
>> >activity patterns, or user profiles, for Hadoop users based on their
>> >behaviour in the platform. These user profiles are modeled using
>> >machine learning algorithms and used to detect anomalous user
>> >activities. Eagle uses eigenvalue decomposition and density estimation
>> >algorithms to generate user profile models. The module reads data from
>> >HDFS audit logs, preprocesses and aggregates the data, and generates
>> >models using Spark programming APIs. Once models are generated, Eagle
>> >uses the stream processing engine for near real-time anomaly detection
>> >to determine whether a user's activities are suspicious.
>> >
>> >Eagle Services:
>> >Query Service: Eagle provides a SQL-like service API to support
>> >comprehensive computation over huge data sets on the fly, e.g.
>> >filtering, aggregation, histogram, sorting, top, arithmetic
>> >expressions, pagination, etc. HBase is the data store that Eagle
>> >supports as a first-class citizen; relational databases are supported
>> >as well. For HBase storage, the Eagle query framework compiles the
>> >user-provided SQL-like query into native HBase filter objects and
>> >executes it through an HBase coprocessor on the fly.
>> >Policy Manager: The Eagle policy manager provides a UI and a RESTful
>> >API for users to define policies with just a few clicks. It includes a
>> >site management UI, a policy editor, sensitivity metadata import, HDFS
>> >or Hive sensitive resource browsing, alert dashboards, etc.
>> >
>> >Background
>> >Data is one of the most important assets for today's businesses, which
>> >makes data security one of the top priorities of today's enterprises.
>> >Hadoop is widely used across different verticals as a big data
>> >repository to store this data in most modern enterprises.
>> >At eBay we use the Hadoop platform extensively for our data processing
>> >needs. Our data in Hadoop keeps growing as our user base grows
>> >exponentially. Today a variety of data sets are available in our
>> >Hadoop clusters for our users to consume. eBay has around 120 PB of
>> >data stored in HDFS across 6 different clusters, and 1800+ active
>> >Hadoop users consume this data through Hive, HBase and MapReduce jobs
>> >every day to build applications. This astronomical growth of data
>> >brings challenges in securing sensitive data and monitoring access to
>> >it. Today, in large organizations, HDFS is the de facto standard for
>> >storing big data. Data sets including, but not limited to, consumer
>> >sentiment, social media data, customer segmentation, web clicks,
>> >sensor data, geo-location and transaction data are stored in Hadoop
>> >for day-to-day business needs.
>> >We at eBay want to make sure that sensitive data and data platforms
>> >are completely protected from security breaches. So we partnered very
>> >closely with our Information Security team to understand the
>> >requirements for Eagle to monitor sensitive data access on Hadoop:
>> >1. Ability to identify and stop security threats in real time
>> >2. Scale for big data (support PB scale and billions of events)
>> >3. Ability to create data access policies
>> >4. Support for multiple data sources like HDFS, HBase, Hive
>> >5. Visualization of alerts in real time
>> >6. Ability to block malicious access in real time
>> >We did not find any data access monitoring solution available today
>> >that provides the features and functionality we need to monitor data
>> >access in the Hadoop ecosystem at our scale. Hence, with an excellent
>> >team of world-class developers and several users, we have been able to
>> >bring Eagle into production as well as open source it.
>> >
>> >Rationale
>> >In today's world, data is an important asset for any company.
>> >Businesses use data extensively to create amazing experiences for
>> >users. Data has to be protected, and access to data should be secured
>> >against security breaches. Today Hadoop is used to store not only logs
>> >but also financial data, sensitive data sets, geographical data, user
>> >click stream data sets, etc., which makes it even more important to
>> >protect it from security breaches. Securing a data platform requires
>> >several things. One is a strong access control mechanism, which today
>> >is provided by Apache Ranger and Apache Sentry. These tools provide
>> >fine-grained access control for data sets on Hadoop. But there is a
>> >big gap in monitoring all the data access events and activities in
>> >order to secure the Hadoop data platform. With strong access control,
>> >perimeter security and data access monitoring together in place, data
>> >in Hadoop clusters can be secured against breaches. We looked around
>> >and found the following:
>> >Existing data activity monitoring products are designed for
>> >traditional databases and data warehouses. Existing monitoring
>> >platforms cannot scale out to support fast-growing data at petabyte
>> >scale. The few products in the industry that support HDFS, Hive and
>> >HBase data access monitoring are still at a very early stage.
>> >As mentioned in the background, the business requirement and the
>> >urgency to secure the data from users with malicious intent drove eBay
>> >to invest in building a real-time data access monitoring solution from
>> >scratch that offers real-time alerts and remediation features for
>> >malicious data access. With the power of open source distributed
>> >systems like Hadoop, Kafka and many more, we were able to develop a
>> >data activity monitoring system that can scale and can identify and
>> >stop malicious access in real time.
>> >Eagle allows admins to create standard access policies and rules for
>> >monitoring HDFS, Hive and HBase data. Eagle also provides
>> >out-of-the-box machine learning models that model user profiles based
>> >on user access behaviour and uses these models to alert on anomalies.
>> >
>> >Current Status
>> >
>> >Meritocracy
>> >Eagle has been deployed in production at eBay, monitoring billions of
>> >events per day from HDFS and Hive operations. From the start, the
>> >product has been built with a focus on high scalability and
>> >application extensibility. Eagle has demonstrated great performance in
>> >responding to suspicious events instantly and great flexibility in
>> >defining policies.
>> >
>> >Community
>> >Eagle seeks to develop the developer and user communities during
>> >incubation.
>> >
>> >Core Developers
>> >Eagle is currently being designed and developed by engineers from eBay
>> >Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
>> >Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri and Arun Manoharan. All
>> >of these core developers have deep expertise in developing monitoring
>> >products for the Hadoop ecosystem.
>> >
>> >Alignment
>> >The ASF is a natural host for Eagle given that it is already the home
>> >of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big
>> >data projects. Eagle leverages many Apache open source products. Eagle
>> >was designed to offer real-time insights into sensitive data access by
>> >actively monitoring data access on various data sets in Hadoop, and to
>> >provide an extensible alerting framework with a powerful policy
>> >engine. Eagle complements the existing Hadoop platform by providing a
>> >comprehensive monitoring and alerting solution for detecting sensitive
>> >data access threats based on preset policies and machine learning
>> >models for user behaviour analysis.
>> >
>> >Known Risks
>> >
>> >Orphaned Products
>> >The core developers of the Eagle team work full time on this project.
>> >There is no risk of Eagle getting orphaned, since eBay uses it
>> >extensively in its production Hadoop clusters and has plans to go
>> >beyond Hadoop. For example, there are currently 7 Hadoop clusters, and
>> >2 of them are being monitored using Eagle in production. We plan to
>> >extend it to all Hadoop clusters and eventually to other data
>> >platforms. Tens of policies are onboarded and actively monitored, with
>> >plans to onboard more use cases. We are very confident that every
>> >Hadoop cluster in the world will be monitored using Eagle, securing
>> >the Hadoop ecosystem by actively monitoring access to sensitive data.
>> >We plan to extend and diversify this community further through Apache.
>> >We presented Eagle at the Hadoop Summit in China and garnered interest
>> >from different companies that use Hadoop extensively.
>> >
>> >Inexperience with Open Source
>> >The core developers are all active users and followers of open source.
>> >They are already committers and contributors to the Eagle GitHub
>> >project. All have been involved with source code that has been
>> >released under an open source license, and several of them also have
>> >experience developing code in an open source environment. Though the
>> >core developers do not have Apache open source experience, there are
>> >plans to onboard individuals with Apache open source experience onto
>> >the project. Apache Kylin PMC members are also in the same eBay
>> >organization. We work very closely with Apache Ranger committers and
>> >look forward to finding meaningful integrations to improve the
>> >security of the Hadoop platform.
>> >
>> >Homogenous Developers
>> >The core developers are from eBay. Monitoring data activities to find
>> >and stop threats is a universal problem faced by all businesses today.
>> >The Apache Incubation process encourages an open and diverse
>> >meritocratic community. Eagle intends to make every possible effort to
>> >build a diverse, vibrant and involved community and has already
>> >received substantial interest from various organizations.
>> >
>> >Reliance on Salaried Developers
>> >eBay invested in Eagle as the monitoring solution for its Hadoop
>> >clusters, and some of its key engineers are working full time on the
>> >project. In addition, since there is a growing need for securing
>> >sensitive data access and for a data activity monitoring solution for
>> >Hadoop, we look forward to other Apache developers and researchers
>> >contributing to the project. Additional contributors, including Apache
>> >committers, plan to join this effort shortly. Key to addressing the
>> >risk of relying on salaried developers from a single entity is to
>> >increase the diversity of contributors and to actively lobby for
>> >domain experts in the security space to contribute. Eagle intends to
>> >do this.
>> >
>> >Relationships with Other Apache Products
>> >Eagle has a strong relationship with, and dependency on, Apache
>> >Hadoop, HBase, Spark, Kafka and Storm. Being part of Apache's
>> >Incubation community could help with closer collaboration among these
>> >projects as well as others.
>> >
>> >An Excessive Fascination with the Apache Brand
>> >Eagle is proposing to enter incubation at Apache in order to help
>> >efforts to diversify the committer base, not so much to capitalize on
>> >the Apache brand. The Eagle project is already in production use
>> >inside eBay, but is not expected to be an eBay product for external
>> >customers. As such, the Eagle project is not seeking to use the Apache
>> >brand as a marketing tool.
>> >
>> >Documentation
>> >Information about Eagle can be found at https://github.com/eBay/Eagle.
>> >The following link provides more information about Eagle:
>> >http://goeagle.io/
>> >
>> >Initial Source
>> >Eagle has been under development since 2014 by a team of engineers at
>> >eBay Inc. It is currently hosted on GitHub under the Apache License
>> >2.0 at https://github.com/eBay/Eagle. Once in incubation we will move
>> >the code base to an Apache Git repository.
>> >
>> >External Dependencies
>> >Eagle has the following external dependencies.
>> >Basic
>> >* JDK 1.7+
>> >* Scala 2.10.4
>> >* Apache Maven
>> >* JUnit
>> >* Log4j
>> >* Slf4j
>> >* Apache Commons
>> >* Apache Commons Math3
>> >* Jackson
>> >* Siddhi CEP engine
>> >
>> >Hadoop
>> >* Apache Hadoop
>> >* Apache HBase
>> >* Apache Hive
>> >* Apache Zookeeper
>> >* Apache Curator
>> >
>> >Apache Spark
>> >* Spark Core Library
>> >
>> >REST Service
>> >* Jersey
>> >
>> >Query
>> >* Antlr
>> >
>> >Stream processing
>> >* Apache Storm
>> >* Apache Kafka
>> >
>> >Web
>> >* AngularJS
>> >* jQuery
>> >* Bootstrap V3
>> >* Moment JS
>> >* Admin LTE
>> >* html5shiv
>> >* respond
>> >* Fastclick
>> >* Date Range Picker
>> >* Flot JS
>> >
>> >Cryptography
>> >Eagle will eventually support encryption on the wire. This is not one
>> >of the initial goals, and we do not expect Eagle to be a controlled
>> >export item due to the use of encryption. Eagle supports, but does not
>> >require, the Kerberos authentication mechanism to access secured
>> >Hadoop services.
>> >
>> >Required Resources
>> >
>> >Mailing List
>> >* eagle-private for private PMC discussions
>> >* eagle-dev for developers
>> >* eagle-commits for all commits
>> >* eagle-users for all eagle users
>> >
>> >Subversion Directory
>> >* Git is the preferred source control system.
>> >
>> >Issue Tracking
>> >* JIRA Eagle (Eagle)
>> >
>> >Other Resources
>> >The existing code already has unit tests, so we will make use of the
>> >existing Apache continuous testing infrastructure. The resulting load
>> >should not be very large.
>> >
>> >Initial Committers
>> >* Seshu Adunuthula <sadunuthula at ebay dot com>
>> >* Arun Manoharan <armanoharan at ebay dot com>
>> >* Edward Zhang <yonzhang at ebay dot com>
>> >* Hao Chen <hchen9 at ebay dot com>
>> >* Chaitali Gupta <cgupta at ebay dot com>
>> >* Libin Sun <libsun at ebay dot com>
>> >* Jilin Jiang <jiljiang at ebay dot com>
>> >* Qingwen Zhao <qingwzhao at ebay dot com>
>> >* Hemanth Dendukuri <hdendukuri at ebay dot com>
>> >* Senthil Kumar <senthilkumar at ebay dot com>
>> >
>> >
>> >Affiliations
>> >The initial committers are employees of eBay Inc.
>> >
>> >Sponsors
>> >
>> >Champion
>> >* Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> >
>> >Nominated Mentors
>> >* Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
>> >Hortonworks
>> >* Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> >* Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
>> >Hortonworks
>> >* Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache IPMC
>> >member
>> >* Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member,
>> >Hortonworks
>> >
>> >Sponsoring Entity
>> >We are requesting the Incubator to sponsor this project.
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>
