Re: [DISCUSS] Eagle incubator proposal

2015-10-22 Thread Luke Han
Thanks Henry. I'm also happy to help Eagle through incubation; we will
share the experience we gained during Kylin's incubation with the Eagle
team and community.

Signing off reports and voting on releases are not the only things mentors
help with. Process, setup, policy, infrastructure, guidance, workflow, the
Apache Way and so on: a lot of things require mentors' efforts. Several
active mentors are really important for new podling committers to learn
from and to put those lessons into practice in their daily work.

We have received a great deal of help from our mentors, and I hope this
experience from Kylin will also benefit Eagle and other projects.


Thanks.





Best Regards!
-

Luke Han

On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein  wrote:

> On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley 
> wrote:
>
> > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning 
> > wrote:
> >
> > > I would suggest that Owen O'Malley has not had enough time to be a
> viable
> > > mentor recently and should not be on the list of mentors.
> > >
> >
> > I have been helping Kylin out and it is graduating, so I'm down to just
> > Hawq. I'd like to help Eagle out.
> >
>
> No need to be on the official list of mentors ... just help out anyways ...
> heck, then you don't have to be responsible for signing off reports ;-)
>


Re: [DISCUSS] Eagle incubator proposal

2015-10-22 Thread larry mccay
Arun -

This is a very interesting proposal and I can imagine at least a couple
ways that it and Apache Knox could work together.
I would like to be a contributor to Eagle as well - if you are interested.

I am a committer and PMC member on Knox and Ranger and have contributed to
a number of ecosystem projects on security issues.

Good luck!

--larry

On Thu, Oct 22, 2015 at 4:27 AM, Luke Han  wrote:

> Thanks Henry, I'm also happy to help Eagle for incubating, the experience
> we have
> learnt during Kylin's incubating will share to Eagle team and community.
>
> Sign off reports and vote for release is not only things mentors will help,
> for the process, setup, policy, infrastructure, guidance, workflow, the
> Apache Way and
> so on...a lots of things require mentor's efforts, several active mentors
> are really important
> for new podling committers to learn and practices in their daily work,
>
> We have got help from our mentors very much, hope such experience from
> Kylin will also
> benefits Eagle even other project.
>
>
> Thanks.
>
>
>
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein  wrote:
>
> > On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley 
> > wrote:
> >
> > > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning 
> > > wrote:
> > >
> > > > I would suggest that Owen O'Malley has not had enough time to be a
> > viable
> > > > mentor recently and should not be on the list of mentors.
> > > >
> > >
> > > I have been helping Kylin out and it is graduating, so I'm down to just
> > > Hawq. I'd like to help Eagle out.
> > >
> >
> > No need to be on the official list of mentors ... just help out anyways
> ...
> > heck, then you don't have to be responsible for signing off reports ;-)
> >
>


Re: [DISCUSS] Eagle incubator proposal

2015-10-22 Thread Manoharan, Arun
Thanks Larry. We have some ideas for integration with Ranger and would
love to get your input.


On 10/22/15, 4:55 AM, "larry mccay"  wrote:

>Arun -
>
>This is a very interesting proposal and I can imagine at least a couple
>ways that it and Apache Knox could work together.
>I would like to be a contributor to Eagle as well - if you are interested.
>
>I am a committer and PMC member on Knox and Ranger and have contributed to
>a number of ecosystem projects on security issues.
>
>Good luck!
>
>--larry
>
>On Thu, Oct 22, 2015 at 4:27 AM, Luke Han  wrote:
>
>> Thanks Henry, I'm also happy to help Eagle for incubating, the
>>experience
>> we have
>> learnt during Kylin's incubating will share to Eagle team and community.
>>
>> Sign off reports and vote for release is not only things mentors will
>>help,
>> for the process, setup, policy, infrastructure, guidance, workflow, the
>> Apache Way and
>> so on...a lots of things require mentor's efforts, several active
>>mentors
>> are really important
>> for new podling committers to learn and practices in their daily work,
>>
>> We have got help from our mentors very much, hope such experience from
>> Kylin will also
>> benefits Eagle even other project.
>>
>>
>> Thanks.
>>
>>
>>
>>
>>
>> Best Regards!
>> -
>>
>> Luke Han
>>
>> On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein  wrote:
>>
>> > On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley 
>> > wrote:
>> >
>> > > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning 
>> > > wrote:
>> > >
>> > > > I would suggest that Owen O'Malley has not had enough time to be a
>> > viable
>> > > > mentor recently and should not be on the list of mentors.
>> > > >
>> > >
>> > > I have been helping Kylin out and it is graduating, so I'm down to
>>just
>> > > Hawq. I'd like to help Eagle out.
>> > >
>> >
>> > No need to be on the official list of mentors ... just help out
>>anyways
>> ...
>> > heck, then you don't have to be responsible for signing off reports
>>;-)
>> >
>>


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Eagle incubator proposal

2015-10-22 Thread Henry Saputra
Looks like the discussion has calmed down, so unless there are more
comments we will send the VOTE thread tomorrow.

Thanks all for the feedback.

- Henry

On Mon, Oct 19, 2015 at 8:33 AM, Manoharan, Arun  wrote:
> Hello Everyone,
>
> My name is Arun Manoharan. Currently a product manager in the Analytics 
> platform team at eBay Inc.
>
> I would like to start a discussion on Eagle and its joining the ASF as an 
> incubation project.
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to 
> sensitive data, recognize attacks, malicious activities and take actions in 
> real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
> Eagle also provides machine learning models for detecting anomalous user 
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify 
> access to sensitive data, recognize attacks, malicious activities in hadoop 
> and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time, 
> enforces policies defined on sensitive data access and alerts or blocks 
> user’s access to that sensitive data in real time. Eagle also creates user 
> profiles based on the typical access behaviour for HDFS and Hive and sends 
> alerts when anomalous behaviour is detected. Eagle can also import sensitive 
> data information classified by external classification engines to help define 
> its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop logs 
> in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create policies 
> based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service and 
> the visualization component. Eagle provides intuitive user interface to 
> administer Eagle and an alert dashboard to respond to real time alerts.
>
> Data Collection and Storage:
> Eagle provides programming API for extending Eagle to integrate any data 
> source into Eagle policy evaluation framework. For example, Eagle hdfs audit 
> monitoring collects data from Kafka which is populated from namenode log4j 
> appender or from logstash agent. Eagle hive monitoring collects hive query 
> logs from running job through YARN API, which is designed to be scalable and 
> fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics 
> data, and also supports relational database through configuration change.
>
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides stream processing API which is an 
> abstraction of Apache Storm. It can also be extended to other streaming 
> engines. This abstraction allows developers to assemble data transformation, 
> filtering, external data join etc. without physically bound to a specific 
> streaming platform. Eagle streaming API allows developers to easily integrate 
> business logic with Eagle policy engine and internally Eagle framework 
> compiles business logic execution DAG into program primitives of underlying 
> stream infrastructure e.g. Apache Storm. For example, Eagle HDFS monitoring 
> transforms audit log from Namenode to object and joins sensitivity metadata, 
> security zone metadata which are generated from external programs or 
> configured by user. Eagle hive monitoring filters running jobs to get hive 
> query string and parses query string into object and then joins sensitivity 
> metadata.
> Alerting Framework: Eagle Alert Framework includes stream metadata API, 
> scalable policy engine framework, extensible policy engine framework. Stream 
> metadata API allows developers to declare event schema including what 
> attributes constitute an event, what is the type for each attribute, and how 
> to dynamically resolve attribute value in runtime when user configures 
> policy. Scalable policy engine framework allows policies to be executed on 
> different physical nodes in parallel. It is also used to define your own 
> policy partitioner class. Policy engine framework together with streaming 
> partitioning capability provided by all streaming platforms will make sure 
> policies and events can be evaluated in a fully distributed way. Extensible 
> policy engine framework allows developer to plugin a new policy engine with a 
> few lines of codes. WSO2 Siddhi CEP engine is the policy engine which Eagle 
> supports as first-class citizen.
> Machine Learning module: Eagle provides capabilities to define user activity 
> patterns or user profiles for Hadoop users based on the user behaviour in the 
> platform. These user 

Re: [DISCUSS] Eagle incubator proposal

2015-10-22 Thread P. Taylor Goetz
+1 for moving forward with a VOTE.

> On Oct 22, 2015, at 7:26 PM, Henry Saputra  wrote:
> 
> Looks like the discussion has calm down, so unless there is more
> comments we will send VOTE thread tomorrow.
> 
> Thanks all for the feedback.
> 
> - Henry
> 
>> On Mon, Oct 19, 2015 at 8:33 AM, Manoharan, Arun  
>> wrote:
>> Hello Everyone,
>> 
>> My name is Arun Manoharan. Currently a product manager in the Analytics 
>> platform team at eBay Inc.
>> 
>> I would like to start a discussion on Eagle and its joining the ASF as an 
>> incubation project.
>> 
>> Eagle is a Monitoring solution for Hadoop to instantly identify access to 
>> sensitive data, recognize attacks, malicious activities and take actions in 
>> real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
>> Eagle also provides machine learning models for detecting anomalous user 
>> behavior in Hadoop.
>> 
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>> 
>> The text of the proposal is also available at the end of this email.
>> 
>> Thanks for your time and help.
>> 
>> Thanks,
>> Arun
>> 
>> 
>> 
>> Eagle
>> 
>> Abstract
>> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify 
>> access to sensitive data, recognize attacks, malicious activities in hadoop 
>> and take actions.
>> 
>> Proposal
>> Eagle audits access to HDFS files, Hive and HBase tables in real time, 
>> enforces policies defined on sensitive data access and alerts or blocks 
>> user’s access to that sensitive data in real time. Eagle also creates user 
>> profiles based on the typical access behaviour for HDFS and Hive and sends 
>> alerts when anomalous behaviour is detected. Eagle can also import sensitive 
>> data information classified by external classification engines to help 
>> define its policies.
>> 
>> Overview of Eagle
>> Eagle has 3 main parts.
>> 1.Data collection and storage - Eagle collects data from various hadoop logs 
>> in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
>> 2.Data processing and policy engine - Eagle allows users to create policies 
>> based on various metadata properties on HDFS, Hive and HBase data.
>> 3.Eagle services - Eagle services include policy manager, query service and 
>> the visualization component. Eagle provides intuitive user interface to 
>> administer Eagle and an alert dashboard to respond to real time alerts.
>> 
>> Data Collection and Storage:
>> Eagle provides programming API for extending Eagle to integrate any data 
>> source into Eagle policy evaluation framework. For example, Eagle hdfs audit 
>> monitoring collects data from Kafka which is populated from namenode log4j 
>> appender or from logstash agent. Eagle hive monitoring collects hive query 
>> logs from running job through YARN API, which is designed to be scalable and 
>> fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics 
>> data, and also supports relational database through configuration change.
>> 
>> Data Processing and Policy Engine:
>> Processing Engine: Eagle provides stream processing API which is an 
>> abstraction of Apache Storm. It can also be extended to other streaming 
>> engines. This abstraction allows developers to assemble data transformation, 
>> filtering, external data join etc. without physically bound to a specific 
>> streaming platform. Eagle streaming API allows developers to easily 
>> integrate business logic with Eagle policy engine and internally Eagle 
>> framework compiles business logic execution DAG into program primitives of 
>> underlying stream infrastructure e.g. Apache Storm. For example, Eagle HDFS 
>> monitoring transforms audit log from Namenode to object and joins 
>> sensitivity metadata, security zone metadata which are generated from 
>> external programs or configured by user. Eagle hive monitoring filters 
>> running jobs to get hive query string and parses query string into object 
>> and then joins sensitivity metadata.
>> Alerting Framework: Eagle Alert Framework includes stream metadata API, 
>> scalable policy engine framework, extensible policy engine framework. Stream 
>> metadata API allows developers to declare event schema including what 
>> attributes constitute an event, what is the type for each attribute, and how 
>> to dynamically resolve attribute value in runtime when user configures 
>> policy. Scalable policy engine framework allows policies to be executed on 
>> different physical nodes in parallel. It is also used to define your own 
>> policy partitioner class. Policy engine framework together with streaming 
>> partitioning capability provided by all streaming platforms will make sure 
>> policies and events can be evaluated in a fully distributed way. Extensible 
>> policy engine framework allows developer to plugin a new policy engine with 
>> a few lines of codes. WSO2 Siddhi CEP engine is the policy 

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Greg Stein
On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley  wrote:

> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning 
> wrote:
>
> > I would suggest that Owen O'Malley has not had enough time to be a viable
> > mentor recently and should not be on the list of mentors.
> >
>
> I have been helping Kylin out and it is graduating, so I'm down to just
> Hawq. I'd like to help Eagle out.
>

No need to be on the official list of mentors ... just help out anyways ...
heck, then you don't have to be responsible for signing off reports ;-)


Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Henry Saputra
Hi Marvin,

You are preaching to the choir with me =)
Totally agree with your comment.

The Eagle team has already met with the proposed mentors, presented the
project, and asked about their availability and willingness to help mentor
it, and we got a positive response from them.
Now, if the question is whether these mentors will actually be active
in practice, that would be hard for me to answer.

I know Julian and Taylor have been active and helpful during Kylin's
incubation.

To be fair, I will circle back one more time to the existing mentors for
Eagle to confirm their commitment to active participation in the podling.
Would that be an acceptable solution?

- Henry

On Wed, Oct 21, 2015 at 9:24 AM, Marvin Humphrey  wrote:
> On Tue, Oct 20, 2015 at 9:16 PM, Henry Saputra  
> wrote:
>> Hi Ted,
>>
>> Thanks for your concern, but we have had discussions with all proposed
>> mentors before to ask for their availability and willingness to
>> actively mentor this project.
>>
>> I think we are good with existing proposed mentors.
>
> Henry,
>
> 4 of the 5 proposed Mentors for Eagle are also Mentors for Kylin.  Kylin's 1.1
> release candidate is still twisting in the wind awaiting IPMC votes.  It
> was offered up on dev@kylin on October 10th -- 11 days ago.  It got one Mentor
> vote immediately, and another after 9 days.  It is still waiting for a third
> IPMC vote.
>
> I think the IPMC needs to take into consideration whether Eagle will have
> enough active Mentors when voting on this proposal, since at least some of
> the proposed Mentors seem to be having difficulty with their current load.
>
> Mentors who do not actually participate should not be Mentors.
>
> Marvin Humphrey
>




Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Siddharth Wagle
Hi Arun,

This proposal looks great. I would like to be an active contributor on this 
project. I bring with me the experience of Apache Ambari and developing the 
Ambari Metrics System.

Best Regards,
Sid

From: Julian Hyde <jh...@apache.org>
Sent: Wednesday, October 21, 2015 9:10 AM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] Eagle incubator proposal

My name is already on the list of mentors. I think this project fills an 
important need. Several of the initial committers were involved with Kylin and 
therefore know the Apache process.

Julian


> On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>
> I should also have some improved bandwidth both now that Kylin is nearing 
> graduation and for other reasons. I’ve been bogged down recently, but that’s 
> starting to change.
>
> If more mentors are desired, I’d be willing to help in that respect.
>
> -Taylor
>
>> On Oct 20, 2015, at 11:49 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>>
>> Hi Ted,
>>
>> Since Kylin almost ready to graduate, I have more bandwidth to help with 
>> Eagle.
>>
>> But, you are right that current proposed mentors for Eagle seemed to
>> be very busy with other podlings, so 1 or 2 additional mentors would
>> be great.
>>
>> The good news is that the team consist some people from Kylin, for
>> example Luke, which done great job helping Kylin to understand working
>> with Apache way.
>> So we have some help from initial committers who have done the rodeo before.
>>
>> - Henry
>>
>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> I would suggest that Owen O'Malley has not had enough time to be a viable
>>> mentor recently and should not be on the list of mentors.
>>>
>>> Henry and Julian are good if their schedules permit.  Henry, I know has
>>> been mentoring a number of projects lately.
>>>
>>>
>>>
>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Arun,
>>>>
>>>> very interesting proposal. I may see some possible interaction with
>>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
>>>> kind of Change Data Capture), etc.
>>>>
>>>> So, I see a different perspective in Eagle, but Eagle could also leverage
>>>> Falcon somehow.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> My name is Arun Manoharan. Currently a product manager in the Analytics
>>>>> platform team at eBay Inc.
>>>>>
>>>>> I would like to start a discussion on Eagle and its joining the ASF as an
>>>>> incubation project.
>>>>>
>>>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to
>>>>> sensitive data, recognize attacks, malicious activities and take actions 
>>>>> in
>>>>> real time. Eagle supports a wide variety of policies on HDFS data and 
>>>>> Hive.
>>>>> Eagle also provides machine learning models for detecting anomalous user
>>>>> behavior in Hadoop.
>>>>>
>>>>> The proposal is available on the wiki here:
>>>>> https://wiki.apache.org/incubator/EagleProposal
>>>>>
>>>>> The text of the proposal is also available at the end of this email.
>>>>>
>>>>> Thanks for your time and help.
>>>>>
>>>>> Thanks,
>>>>> Arun
>>>>>
>>>>> 
>>>>>
>>>>> Eagle
>>>>>
>>>>> Abstract
>>>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>>>> identify access to sensitive data, recognize attacks, malicious activities
>>>>> in hadoop and take actions.
>>>>>
>>>>> Proposal
>>>>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>>>> enforces policies defined on sensitive data access and alerts or blocks
>>>>> user’s access to that sensitive data in real time. Eagle also creates user
>>>>> profiles based on the typical access behaviour for HDFS and Hive and sends
>>>>> alerts when anomalous behaviour

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Manoharan, Arun
Hi Sid,

Thanks for your support.

Actually, we have developed an Ambari plugin for Eagle so that someone can
use Ambari to deploy Eagle. We have this working on the sandbox. We would
like to have you as a contributor. I will reach out to you.

Thanks,
Arun

On 10/21/15, 9:22 AM, "Siddharth Wagle" <swa...@hortonworks.com> wrote:

>Hi Arun,
>
>This proposal looks great. I would like to be an active contributor on
>this project. I bring with me the experience of Apache Ambari and
>developing the Ambari Metrics System.
>
>Best Regards,
>Sid
>
>From: Julian Hyde <jh...@apache.org>
>Sent: Wednesday, October 21, 2015 9:10 AM
>To: general@incubator.apache.org
>Subject: Re: [DISCUSS] Eagle incubator proposal
>
>My name is already on the list of mentors. I think this project fills an
>important need. Several of the initial committers were involved with
>Kylin and therefore know the Apache process.
>
>Julian
>
>
>> On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>>
>> I should also have some improved bandwidth both now that Kylin is
>>nearing graduation and for other reasons. I’ve been bogged down
>>recently, but that’s starting to change.
>>
>> If more mentors are desired, I’d be willing to help in that respect.
>>
>> -Taylor
>>
>>> On Oct 20, 2015, at 11:49 AM, Henry Saputra <henry.sapu...@gmail.com>
>>>wrote:
>>>
>>> Hi Ted,
>>>
>>> Since Kylin almost ready to graduate, I have more bandwidth to help
>>>with Eagle.
>>>
>>> But, you are right that current proposed mentors for Eagle seemed to
>>> be very busy with other podlings, so 1 or 2 additional mentors would
>>> be great.
>>>
>>> The good news is that the team consist some people from Kylin, for
>>> example Luke, which done great job helping Kylin to understand working
>>> with Apache way.
>>> So we have some help from initial committers who have done the rodeo
>>>before.
>>>
>>> - Henry
>>>
>>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning <ted.dunn...@gmail.com>
>>>wrote:
>>>> I would suggest that Owen O'Malley has not had enough time to be a
>>>>viable
>>>> mentor recently and should not be on the list of mentors.
>>>>
>>>> Henry and Julian are good if their schedules permit.  Henry, I know
>>>>has
>>>> been mentoring a number of projects lately.
>>>>
>>>>
>>>>
>>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré
>>>><j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Arun,
>>>>>
>>>>> very interesting proposal. I may see some possible interaction with
>>>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring
>>>>>(with a
>>>>> kind of Change Data Capture), etc.
>>>>>
>>>>> So, I see a different perspective in Eagle, but Eagle could also
>>>>>leverage
>>>>> Falcon somehow.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> My name is Arun Manoharan. Currently a product manager in the
>>>>>>Analytics
>>>>>> platform team at eBay Inc.
>>>>>>
>>>>>> I would like to start a discussion on Eagle and its joining the ASF
>>>>>>as an
>>>>>> incubation project.
>>>>>>
>>>>>> Eagle is a Monitoring solution for Hadoop to instantly identify
>>>>>>access to
>>>>>> sensitive data, recognize attacks, malicious activities and take
>>>>>>actions in
>>>>>> real time. Eagle supports a wide variety of policies on HDFS data
>>>>>>and Hive.
>>>>>> Eagle also provides machine learning models for detecting anomalous
>>>>>>user
>>>>>> behavior in Hadoop.
>>>>>>
>>>>>> The proposal is available on the wiki here:
>>>>>> https://wiki.apache.org/incubator/EagleProposal
>>>>>>
>>>>>> The text of the proposal is also available at the end of this email.
>>>>>>
>>>>>> Thanks for your time and 

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Julian Hyde
My name is already on the list of mentors. I think this project fills an 
important need. Several of the initial committers were involved with Kylin and 
therefore know the Apache process.

Julian
 

> On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz  wrote:
> 
> I should also have some improved bandwidth both now that Kylin is nearing 
> graduation and for other reasons. I’ve been bogged down recently, but that’s 
> starting to change.
> 
> If more mentors are desired, I’d be willing to help in that respect.
> 
> -Taylor
> 
>> On Oct 20, 2015, at 11:49 AM, Henry Saputra  wrote:
>> 
>> Hi Ted,
>> 
>> Since Kylin almost ready to graduate, I have more bandwidth to help with 
>> Eagle.
>> 
>> But, you are right that current proposed mentors for Eagle seemed to
>> be very busy with other podlings, so 1 or 2 additional mentors would
>> be great.
>> 
>> The good news is that the team consist some people from Kylin, for
>> example Luke, which done great job helping Kylin to understand working
>> with Apache way.
>> So we have some help from initial committers who have done the rodeo before.
>> 
>> - Henry
>> 
>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning  wrote:
>>> I would suggest that Owen O'Malley has not had enough time to be a viable
>>> mentor recently and should not be on the list of mentors.
>>> 
>>> Henry and Julian are good if their schedules permit.  Henry, I know has
>>> been mentoring a number of projects lately.
>>> 
>>> 
>>> 
>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
>>> wrote:
>>> 
>>>> Hi Arun,
>>>>
>>>> very interesting proposal. I may see some possible interaction with
>>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
>>>> kind of Change Data Capture), etc.
>>>>
>>>> So, I see a different perspective in Eagle, but Eagle could also leverage
>>>> Falcon somehow.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
 
> Hello Everyone,
> 
> My name is Arun Manoharan. Currently a product manager in the Analytics
> platform team at eBay Inc.
> 
> I would like to start a discussion on Eagle and its joining the ASF as an
> incubation project.
> 
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
> 
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
> 
> The text of the proposal is also available at the end of this email.
> 
> Thanks for your time and help.
> 
> Thanks,
> Arun
> 
> 
> 
> Eagle
> 
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
> 
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
> 
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service
> and the visualization component. Eagle provides intuitive user interface to
> administer Eagle and an alert dashboard to respond to real time alerts.
> 
> Data Collection and Storage:
> Eagle provides programming API for extending Eagle to integrate any data
> source into Eagle policy evaluation framework. For example, Eagle hdfs
> audit monitoring collects data from Kafka which is populated from namenode
> log4j appender or from logstash agent. Eagle hive monitoring collects hive
> query logs from running job through YARN API, which is designed to be
> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> metadata and metrics data, and also supports relational database through
> configuration change.
> 
> Data Processing and Policy 

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Marvin Humphrey
On Tue, Oct 20, 2015 at 9:16 PM, Henry Saputra  wrote:
> Hi Ted,
>
> Thanks for your concern, but we have had discussions with all proposed
> mentors before to ask for their availability and willingness to
> actively mentor this project.
>
> I think we are good with existing proposed mentors.

Henry,

4 of the 5 proposed Mentors for Eagle are also Mentors for Kylin.  Kylin's 1.1
release candidate is still twisting in the wind awaiting IPMC votes.  It
was offered up on dev@kylin on October 10th -- 11 days ago.  It got one Mentor
vote immediately, and another after 9 days.  It is still waiting for a third
IPMC vote.

I think the IPMC needs to take into consideration whether Eagle will have
enough active Mentors when voting on this proposal, since at least some of
the proposed Mentors seem to be having difficulty with their current load.

Mentors who do not actually participate should not be Mentors.

Marvin Humphrey

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Manoharan, Arun
Thanks Don. Looking forward to working with the Ranger team on meaningful
integrations.

On 10/20/15, 10:19 PM, "Don Bosco Durai"  wrote:

>Hi Arun
>
>This looks really good and fills some obvious gaps in the security
>landscape.
>
>Happy to contribute any way you want.
>
>All the best!!!
>
>Bosco
>
>
>
>
>
>On 10/20/15, 8:02 AM, "Alex Karasulu" <akaras...@apache.org> wrote:
>
>>Hi Arun,
>>
>>Eagle sounds very promising. I just had a discussion with someone about
>>this exact need. I do however agree with Greg on the name. As far as I can
>>see, besides the name, your weakest point is the all eBay employed team.
>>It's not a blocker and can be fixed during incubation. Good luck to you.
>>
>>Alex
>>
>>
>>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun 
>>wrote:
>>
>>> Hi Greg,
>>>
>>> Thank you for reviewing the proposal.
>>>
>>> Originally we thought Eagle might be trademarked by someone already, but I
>>> went through the eBay legal team to get clearance for the name to be used.
>>> We will look into it again to see if there will be potential problems.
>>>
>>> Thanks,
>>> Arun
>>>
>>> On 10/20/15, 1:52 AM, "Greg Stein"  wrote:
>>>
>>> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
>>> >looks like a great proposal. I would suggest being a bit wary of the name,
>>> >as "Eagle" is a *very* popular PCB design program.
>>> >
>>> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:
>>> >
>>> >> Hello Everyone,
>>> >>
>>> >> My name is Arun Manoharan. Currently a product manager in the Analytics
>>> >> platform team at eBay Inc.
>>> >>
>>> >> I would like to start a discussion on Eagle and its joining the ASF as an
>>> >> incubation project.
>>> >>
>>> >> Eagle is a Monitoring solution for Hadoop to instantly identify access to
>>> >> sensitive data, recognize attacks, malicious activities and take actions in
>>> >> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
>>> >> Eagle also provides machine learning models for detecting anomalous user
>>> >> behavior in Hadoop.
>>> >>
>>> >> The proposal is available on the wiki here:
>>> >> https://wiki.apache.org/incubator/EagleProposal
>>> >>
>>> >> The text of the proposal is also available at the end of this email.
>>> >>
>>> >> Thanks for your time and help.
>>> >>
>>> >> Thanks,
>>> >> Arun
>>> >>
>>> >> 
>>> >>
>>> >> Eagle
>>> >>
>>> >> Abstract
>>> >> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>> >> identify access to sensitive data, recognize attacks, malicious activities
>>> >> in hadoop and take actions.
>>> >>
>>> >> Proposal
>>> >> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>> >> enforces policies defined on sensitive data access and alerts or blocks
>>> >> user's access to that sensitive data in real time. Eagle also creates user
>>> >> profiles based on the typical access behaviour for HDFS and Hive and sends
>>> >> alerts when anomalous behaviour is detected. Eagle can also import
>>> >> sensitive data information classified by external classification engines to
>>> >> help define its policies.
>>> >>
>>> >> Overview of Eagle
>>> >> Eagle has 3 main parts.
>>> >> 1.Data collection and storage - Eagle collects data from various hadoop
>>> >> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
>>> >> 2.Data processing and policy engine - Eagle allows users to create
>>> >> policies based on various metadata properties on HDFS, Hive and HBase data.
>>> >> 3.Eagle services - Eagle services include policy manager, query service
>>> >> and the visualization component. Eagle provides intuitive user interface to
>>> >> administer Eagle and an alert dashboard to respond to real time alerts.
>>> >>
>>> >> Data Collection and Storage:
>>> >> Eagle provides programming API for extending Eagle to integrate any data
>>> >> source into Eagle policy evaluation framework. For example, Eagle hdfs
>>> >> audit monitoring collects data from Kafka which is populated from namenode
>>> >> log4j appender or from logstash agent. Eagle hive monitoring collects hive
>>> >> query logs from running job through YARN API, which is designed to be
>>> >> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>>> >> metadata and metrics data, and also supports relational database through
>>> >> configuration change.
>>> >>
>>> >> Data Processing and Policy Engine:
>>> >> Processing Engine: Eagle provides stream processing API which is an
>>> >> abstraction of Apache Storm. It can also be extended to other streaming
>>> >> engines. This abstraction allows developers to assemble data
>>> >> transformation, filtering, external data join etc. without physically bound
>>> >> 

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Greg Stein
On Wed, Oct 21, 2015 at 11:53 AM, Henry Saputra 
wrote:
>...

> To be fair, I will circle one more time to existing mentors for Eagle
> to confirm their commitment for active participation in the podling.
> Would that be acceptable solution?
>

Acceptable to whom? I bet you there are enough people who find the list of
mentors to be acceptable, despite Marvin's emails otherwise... :-)

To put it another way: it is unfair to pre-judge people. Even more,
*accepting* a podling should not be subject to a preapproved list of
mentors, since we have juggled mentors many times in the past. *Should* a
mentor be absent in the *future*, then they can be replaced. ... but I
believe it is better to be reactive, than to pre-judge.

Cheers,
-g


Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Owen O'Malley
On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning  wrote:

> I would suggest that Owen O'Malley has not had enough time to be a viable
> mentor recently and should not be on the list of mentors.
>

I have been helping Kylin out and it is graduating, so I'm down to just
Hawq. I'd like to help Eagle out.

.. Owen


> Henry and Julian are good if their schedules permit.  Henry, I know has
> been mentoring a number of projects lately.
>
>
>
> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi Arun,
> >
> > very interesting proposal. I may see some possible interaction with
> > Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
> > kind of Change Data Capture), etc.
> >
> > So, I see a different perspective in Eagle, but Eagle could also leverage
> > Falcon somehow.
> >
> > Regards
> > JB
> >
> >
> > On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
> >
> >> Hello Everyone,
> >>
> >> My name is Arun Manoharan. Currently a product manager in the Analytics
> >> platform team at eBay Inc.
> >>
> >> I would like to start a discussion on Eagle and its joining the ASF as an
> >> incubation project.
> >>
> >> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> >> sensitive data, recognize attacks, malicious activities and take actions in
> >> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> >> Eagle also provides machine learning models for detecting anomalous user
> >> behavior in Hadoop.
> >>
> >> The proposal is available on the wiki here:
> >> https://wiki.apache.org/incubator/EagleProposal
> >>
> >> The text of the proposal is also available at the end of this email.
> >>
> >> Thanks for your time and help.
> >>
> >> Thanks,
> >> Arun
> >>
> >> 
> >>
> >> Eagle
> >>
> >> Abstract
> >> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> >> identify access to sensitive data, recognize attacks, malicious activities
> >> in hadoop and take actions.
> >>
> >> Proposal
> >> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> >> enforces policies defined on sensitive data access and alerts or blocks
> >> user’s access to that sensitive data in real time. Eagle also creates user
> >> profiles based on the typical access behaviour for HDFS and Hive and sends
> >> alerts when anomalous behaviour is detected. Eagle can also import
> >> sensitive data information classified by external classification engines to
> >> help define its policies.
> >>
> >> Overview of Eagle
> >> Eagle has 3 main parts.
> >> 1.Data collection and storage - Eagle collects data from various hadoop
> >> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> >> 2.Data processing and policy engine - Eagle allows users to create
> >> policies based on various metadata properties on HDFS, Hive and HBase data.
> >> 3.Eagle services - Eagle services include policy manager, query service
> >> and the visualization component. Eagle provides intuitive user interface to
> >> administer Eagle and an alert dashboard to respond to real time alerts.
> >>
> >> Data Collection and Storage:
> >> Eagle provides programming API for extending Eagle to integrate any data
> >> source into Eagle policy evaluation framework. For example, Eagle hdfs
> >> audit monitoring collects data from Kafka which is populated from namenode
> >> log4j appender or from logstash agent. Eagle hive monitoring collects hive
> >> query logs from running job through YARN API, which is designed to be
> >> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> >> metadata and metrics data, and also supports relational database through
> >> configuration change.
> >>
> >> Data Processing and Policy Engine:
> >> Processing Engine: Eagle provides stream processing API which is an
> >> abstraction of Apache Storm. It can also be extended to other streaming
> >> engines. This abstraction allows developers to assemble data
> >> transformation, filtering, external data join etc. without physically bound
> >> to a specific streaming platform. Eagle streaming API allows developers to
> >> easily integrate business logic with Eagle policy engine and internally
> >> Eagle framework compiles business logic execution DAG into program
> >> primitives of underlying stream infrastructure e.g. Apache Storm. For
> >> example, Eagle HDFS monitoring transforms audit log from Namenode to object
> >> and joins sensitivity metadata, security zone metadata which are generated
> >> from external programs or configured by user. Eagle hive monitoring filters
> >> running jobs to get hive query string and parses query string into object
> >> and then joins sensitivity metadata.
> >> Alerting Framework: Eagle Alert Framework includes stream metadata API,
> >> scalable policy engine framework, extensible policy engine framework.
> >> Stream metadata API allows developers to declare 

Re: [DISCUSS] Eagle incubator proposal

2015-10-21 Thread Henry Saputra
I think the concern was a spill-over from other discussions about
rebooting the Incubator, so new proposals are being scrutinized more.
That is OK with me; that is why we have the DISCUSS thread right now :)

To follow up on the concerns that have been raised: the Eagle team
is happy with the existing mentors and would like to move forward with
them.
I have strong faith in my fellow IPMC members as mentors for this project to
help it go through the incubation process.

- Henry

On Wed, Oct 21, 2015 at 12:02 PM, Greg Stein  wrote:
> On Wed, Oct 21, 2015 at 11:53 AM, Henry Saputra 
> wrote:
>>...
>
>> To be fair, I will circle one more time to existing mentors for Eagle
>> to confirm their commitment for active participation in the podling.
>> Would that be acceptable solution?
>>
>
> Acceptable to whom? I bet you there are enough people who find the list of
> mentors to be acceptable, despite Marvin's emails otherwise... :-)
>
> To put it another way: it is unfair to pre-judge people. Even more,
> *accepting* a podling should not be subject to a preapproved list of
> mentors, since we have juggled mentors many times in the past. *Should* a
> mentor be absent in the *future*, then they can be replaced. ... but I
> believe it is better to be reactive, than to pre-judge.
>
> Cheers,
> -g




Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Zhang, Edward (GDI Hadoop)
Eagle evaluates security policies against the event stream in real time and
in a fully distributed way, so low latency and event partitioning are the two
important factors for identifying malicious access instantly. Onboarding
data through Falcon should take these into account.
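
As a rough illustration of what event partitioning means here, the sketch
below routes audit events to partitions by a stable hash of the user name, so
that all events for one user land on the same evaluator and partitions can be
checked in parallel. It is only a sketch under stated assumptions: the function
names, the event fields and the sample "sensitive path" policy are hypothetical
and are not Eagle's actual API.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, num_partitions, key_field="user"):
    """Group events by partition so each group can be evaluated independently."""
    buckets = defaultdict(list)
    for event in events:
        buckets[partition_for(event[key_field], num_partitions)].append(event)
    return dict(buckets)


def evaluate(bucket, sensitive_paths):
    """Hypothetical policy: flag every access to a path marked as sensitive."""
    return [event for event in bucket if event["path"] in sensitive_paths]


# Hypothetical audit events; field names are assumptions for illustration.
events = [
    {"user": "alice", "cmd": "open", "path": "/data/pii/customers"},
    {"user": "bob", "cmd": "open", "path": "/tmp/scratch"},
    {"user": "alice", "cmd": "delete", "path": "/data/pii/customers"},
]

buckets = route(events, num_partitions=4)
alerts = [
    alert
    for bucket in buckets.values()
    for alert in evaluate(bucket, {"/data/pii/customers"})
]
```

Keying on the user keeps per-user state local to one partition; a different
key (for example the file path) would suit policies that aggregate per
resource instead.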

Thanks
Edward Zhang

On 10/19/15, 22:46, "Jean-Baptiste Onofré"  wrote:

>It makes sense. I will try to contribute on this ;)
>
>Regards
>JB
>
>On 10/19/2015 09:46 PM, Zhang, Edward (GDI Hadoop) wrote:
>> Hi JB,
>>
>> That is a good point. Good to know that Falcon feeds HDFS/Hive/HBase data
>> changes, so this feature would complement Eagle which today mainly focuses
>> on HDFS/Hive/HBase data access including view, change, delete etc. Eagle
>> would benefit if Eagle can instantly capture data change from Falcon.
>>
>> Thanks
>> Edward Zhang
>>
>>
>>
>> On 10/19/15, 8:40, "Jean-Baptiste Onofré"  wrote:
>>
>>> Hi Arun,
>>>
>>> very interesting proposal. I may see some possible interaction with
>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with
>>> a kind of Change Data Capture), etc.
>>>
>>> So, I see a different perspective in Eagle, but Eagle could also
>>> leverage Falcon somehow.
>>>
>>> Regards
>>> JB
>>>
>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
 Hello Everyone,

 My name is Arun Manoharan. Currently a product manager in the Analytics
 platform team at eBay Inc.

 I would like to start a discussion on Eagle and its joining the ASF as
 an incubation project.

 Eagle is a Monitoring solution for Hadoop to instantly identify access
 to sensitive data, recognize attacks, malicious activities and take
 actions in real time. Eagle supports a wide variety of policies on HDFS
 data and Hive. Eagle also provides machine learning models for detecting
 anomalous user behavior in Hadoop.

 The proposal is available on the wiki here:
 https://wiki.apache.org/incubator/EagleProposal

 The text of the proposal is also available at the end of this email.

 Thanks for your time and help.

 Thanks,
 Arun

 

 Eagle

 Abstract
 Eagle is an Open Source Monitoring solution for Hadoop to instantly
 identify access to sensitive data, recognize attacks, malicious
 activities in hadoop and take actions.

 Proposal
 Eagle audits access to HDFS files, Hive and HBase tables in real time,
 enforces policies defined on sensitive data access and alerts or blocks
 user's access to that sensitive data in real time. Eagle also creates
 user profiles based on the typical access behaviour for HDFS and Hive
 and sends alerts when anomalous behaviour is detected. Eagle can also
 import sensitive data information classified by external classification
 engines to help define its policies.

 Overview of Eagle
 Eagle has 3 main parts.
 1.Data collection and storage - Eagle collects data from various hadoop
 logs in real time using Kafka/Yarn API and uses HDFS and HBase for
 storage.
 2.Data processing and policy engine - Eagle allows users to create
 policies based on various metadata properties on HDFS, Hive and HBase
 data.
 3.Eagle services - Eagle services include policy manager, query service
 and the visualization component. Eagle provides intuitive user interface
 to administer Eagle and an alert dashboard to respond to real time
 alerts.

 Data Collection and Storage:
 Eagle provides programming API for extending Eagle to integrate any
 data source into Eagle policy evaluation framework. For example, Eagle
 hdfs audit monitoring collects data from Kafka which is populated from
 namenode log4j appender or from logstash agent. Eagle hive monitoring
 collects hive query logs from running job through YARN API, which is
 designed to be scalable and fault-tolerant. Eagle uses HBase as storage
 for storing metadata and metrics data, and also supports relational
 database through configuration change.

 Data Processing and Policy Engine:
 Processing Engine: Eagle provides stream processing API which is an
 abstraction of Apache Storm. It can also be extended to other streaming
 engines. This abstraction allows developers to assemble data
 transformation, filtering, external data join etc. without physically
 bound to a specific streaming platform. Eagle streaming API allows
 developers to easily integrate business logic with Eagle policy engine
 and internally Eagle framework compiles business logic execution DAG
 into program primitives of underlying stream infrastructure e.g. Apache
 Storm. For example, Eagle HDFS monitoring transforms audit log from
 Namenode to object and joins sensitivity metadata, security zone
 metadata which are generated from external 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Greg Stein
Hey there, Arun! ... I have no commentary on the proposal itself, as it
looks like a great proposal. I would suggest being a bit wary of the name,
as "Eagle" is a *very* popular PCB design program.

On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> My name is Arun Manoharan. Currently a product manager in the Analytics
> platform team at eBay Inc.
>
> I would like to start a discussion on Eagle and its joining the ASF as an
> incubation project.
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service
> and the visualization component. Eagle provides intuitive user interface to
> administer Eagle and an alert dashboard to respond to real time alerts.
>
> Data Collection and Storage:
> Eagle provides programming API for extending Eagle to integrate any data
> source into Eagle policy evaluation framework. For example, Eagle hdfs
> audit monitoring collects data from Kafka which is populated from namenode
> log4j appender or from logstash agent. Eagle hive monitoring collects hive
> query logs from running job through YARN API, which is designed to be
> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> metadata and metrics data, and also supports relational database through
> configuration change.
>
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides stream processing API which is an
> abstraction of Apache Storm. It can also be extended to other streaming
> engines. This abstraction allows developers to assemble data
> transformation, filtering, external data join etc. without physically bound
> to a specific streaming platform. Eagle streaming API allows developers to
> easily integrate business logic with Eagle policy engine and internally
> Eagle framework compiles business logic execution DAG into program
> primitives of underlying stream infrastructure e.g. Apache Storm. For
> example, Eagle HDFS monitoring transforms audit log from Namenode to object
> and joins sensitivity metadata, security zone metadata which are generated
> from external programs or configured by user. Eagle hive monitoring filters
> running jobs to get hive query string and parses query string into object
> and then joins sensitivity metadata.
> Alerting Framework: Eagle Alert Framework includes stream metadata API,
> scalable policy engine framework, extensible policy engine framework.
> Stream metadata API allows developers to declare event schema including
> what attributes constitute an event, what is the type for each attribute,
> and how to dynamically resolve attribute value in runtime when user
> configures policy. Scalable policy engine framework allows policies to be
> executed on different physical nodes in parallel. It is also used to define
> your own policy partitioner class. Policy engine framework together with
> streaming partitioning capability provided by all streaming platforms will
> make sure policies and events can be evaluated in a fully distributed way.
> Extensible policy engine framework allows developer to plug in a new policy
> engine with a few lines of code. WSO2 Siddhi CEP engine is the policy
> engine which Eagle supports as first-class citizen.
> Machine Learning module: Eagle provides capabilities to define user
> activity patterns or user profiles for Hadoop users based on the user
> behaviour in the platform. These 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Amareshwari Sriramdasu
I would like to volunteer as mentor and help the project, if you are
looking for more mentors.

Thanks
Amareshwari

On Mon, Oct 19, 2015 at 9:03 PM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> My name is Arun Manoharan. Currently a product manager in the Analytics
> platform team at eBay Inc.
>
> I would like to start a discussion on Eagle and its joining the ASF as an
> incubation project.
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service
> and the visualization component. Eagle provides intuitive user interface to
> administer Eagle and an alert dashboard to respond to real time alerts.
>
> Data Collection and Storage:
> Eagle provides programming API for extending Eagle to integrate any data
> source into Eagle policy evaluation framework. For example, Eagle hdfs
> audit monitoring collects data from Kafka which is populated from namenode
> log4j appender or from logstash agent. Eagle hive monitoring collects hive
> query logs from running job through YARN API, which is designed to be
> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> metadata and metrics data, and also supports relational database through
> configuration change.
>
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides stream processing API which is an
> abstraction of Apache Storm. It can also be extended to other streaming
> engines. This abstraction allows developers to assemble data
> transformation, filtering, external data join etc. without physically bound
> to a specific streaming platform. Eagle streaming API allows developers to
> easily integrate business logic with Eagle policy engine and internally
> Eagle framework compiles business logic execution DAG into program
> primitives of underlying stream infrastructure e.g. Apache Storm. For
> example, Eagle HDFS monitoring transforms audit log from Namenode to object
> and joins sensitivity metadata, security zone metadata which are generated
> from external programs or configured by user. Eagle hive monitoring filters
> running jobs to get hive query string and parses query string into object
> and then joins sensitivity metadata.
> Alerting Framework: Eagle Alert Framework includes stream metadata API,
> scalable policy engine framework, extensible policy engine framework.
> Stream metadata API allows developers to declare event schema including
> what attributes constitute an event, what is the type for each attribute,
> and how to dynamically resolve attribute value in runtime when user
> configures policy. Scalable policy engine framework allows policies to be
> executed on different physical nodes in parallel. It is also used to define
> your own policy partitioner class. Policy engine framework together with
> streaming partitioning capability provided by all streaming platforms will
> make sure policies and events can be evaluated in a fully distributed way.
> Extensible policy engine framework allows developer to plug in a new policy
> engine with a few lines of code. WSO2 Siddhi CEP engine is the policy
> engine which Eagle supports as first-class citizen.
> Machine Learning module: Eagle provides capabilities to define user
> activity patterns or user profiles for Hadoop users based on the user
> behaviour in the platform. These user profiles are modeled using Machine
> Learning algorithms and used for detection 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Luke Han
So glad to see one more project coming from eBay:-)




Best Regards!
-

Luke Han

On Tue, Oct 20, 2015 at 4:52 PM, Greg Stein  wrote:

> Hey there, Arun! ... I have no commentary on the proposal itself, as it
> looks like a great proposal. I would suggest being a bit wary of the name,
> as "Eagle" is a *very* popular PCB design program.
>
> On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun 
> wrote:
>
> > Hello Everyone,
> >
> > My name is Arun Manoharan. Currently a product manager in the Analytics
> > platform team at eBay Inc.
> >
> > I would like to start a discussion on Eagle and its joining the ASF as an
> > incubation project.
> >
> > Eagle is a Monitoring solution for Hadoop to instantly identify access to
> > sensitive data, recognize attacks, malicious activities and take actions
> in
> > real time. Eagle supports a wide variety of policies on HDFS data and
> Hive.
> > Eagle also provides machine learning models for detecting anomalous user
> > behavior in Hadoop.
> >
> > The proposal is available on the wiki here:
> > https://wiki.apache.org/incubator/EagleProposal
> >
> > The text of the proposal is also available at the end of this email.
> >
> > Thanks for your time and help.
> >
> > Thanks,
> > Arun
> >
> > 
> >
> > Eagle
> >
> > Abstract
> > Eagle is an Open Source Monitoring solution for Hadoop to instantly
> > identify access to sensitive data, recognize attacks, malicious
> activities
> > in hadoop and take actions.
> >
> > Proposal
> > Eagle audits access to HDFS files, Hive and HBase tables in real time,
> > enforces policies defined on sensitive data access and alerts or blocks
> > user’s access to that sensitive data in real time. Eagle also creates
> user
> > profiles based on the typical access behaviour for HDFS and Hive and
> sends
> > alerts when anomalous behaviour is detected. Eagle can also import
> > sensitive data information classified by external classification engines
> to
> > help define its policies.
> >
> > Overview of Eagle
> > Eagle has 3 main parts.
> > 1.Data collection and storage - Eagle collects data from various hadoop
> > logs in real time using Kafka/Yarn API and uses HDFS and HBase for
> storage.
> > 2.Data processing and policy engine - Eagle allows users to create
> > policies based on various metadata properties on HDFS, Hive and HBase
> data.
> > 3.Eagle services - Eagle services include policy manager, query service
> > and the visualization component. Eagle provides intuitive user interface
> to
> > administer Eagle and an alert dashboard to respond to real time alerts.
> >
> > Data Collection and Storage:
> > Eagle provides programming API for extending Eagle to integrate any data
> > source into Eagle policy evaluation framework. For example, Eagle hdfs
> > audit monitoring collects data from Kafka which is populated from
> namenode
> > log4j appender or from logstash agent. Eagle hive monitoring collects
> hive
> > query logs from running job through YARN API, which is designed to be
> > scalable and fault-tolerant. Eagle uses HBase as storage for storing
> > metadata and metrics data, and also supports relational database through
> > configuration change.
> >
> > Data Processing and Policy Engine:
> > Processing Engine: Eagle provides stream processing API which is an
> > abstraction of Apache Storm. It can also be extended to other streaming
> > engines. This abstraction allows developers to assemble data
> > transformation, filtering, external data join etc. without physically
> bound
> > to a specific streaming platform. Eagle streaming API allows developers
> to
> > easily integrate business logic with Eagle policy engine and internally
> > Eagle framework compiles business logic execution DAG into program
> > primitives of underlying stream infrastructure e.g. Apache Storm. For
> > example, Eagle HDFS monitoring transforms audit log from Namenode to
> object
> > and joins sensitivity metadata, security zone metadata which are
> generated
> > from external programs or configured by user. Eagle hive monitoring
> filters
> > running jobs to get hive query string and parses query string into object
> > and then joins sensitivity metadata.
> > Alerting Framework: Eagle Alert Framework includes stream metadata API,
> > scalable policy engine framework, extensible policy engine framework.
> > Stream metadata API allows developers to declare event schema including
> > what attributes constitute an event, what is the type for each attribute,
> > and how to dynamically resolve attribute value in runtime when user
> > configures policy. Scalable policy engine framework allows policies to be
> > executed on different physical nodes in parallel. It is also used to
> define
> > your own policy partitioner class. Policy engine framework together with
> > streaming partitioning capability provided by all streaming platforms
> will
> > make sure policies and events can be evaluated in a fully 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Henry Saputra
Hi Ted,

Since Kylin is almost ready to graduate, I have more bandwidth to help with Eagle.

But you are right that the currently proposed mentors for Eagle seem to
be very busy with other podlings, so 1 or 2 additional mentors would
be great.

The good news is that the team includes some people from Kylin, for
example Luke, who has done a great job helping Kylin understand how to
work in the Apache Way.
So we have some help from initial committers who have done the rodeo before.

- Henry

On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning  wrote:
> I would suggest that Owen O'Malley has not had enough time to be a viable
> mentor recently and should not be on the list of mentors.
>
> Henry and Julian are good if their schedules permit.  Henry, I know has
> been mentoring a number of projects lately.
>
>
>
> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Arun,
>>
>> very interesting proposal. I may see some possible interaction with
>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
>> kind of Change Data Capture), etc.
>>
>> So, I see a different perspective in Eagle, but Eagle could also leverage
>> Falcon somehow.
>>
>> Regards
>> JB
>>
>>
>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>
>>> Hello Everyone,
>>>
>>> My name is Arun Manoharan. Currently a product manager in the Analytics
>>> platform team at eBay Inc.
>>>
>>> I would like to start a discussion on Eagle and its joining the ASF as an
>>> incubation project.
>>>
>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to
>>> sensitive data, recognize attacks, malicious activities and take actions in
>>> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
>>> Eagle also provides machine learning models for detecting anomalous user
>>> behavior in Hadoop.
>>>
>>> The proposal is available on the wiki here:
>>> https://wiki.apache.org/incubator/EagleProposal
>>>
>>> The text of the proposal is also available at the end of this email.
>>>
>>> Thanks for your time and help.
>>>
>>> Thanks,
>>> Arun
>>>
>>> 
>>>
>>> Eagle
>>>
>>> Abstract
>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>> identify access to sensitive data, recognize attacks, malicious activities
>>> in hadoop and take actions.
>>>
>>> Proposal
>>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>> enforces policies defined on sensitive data access and alerts or blocks
>>> user’s access to that sensitive data in real time. Eagle also creates user
>>> profiles based on the typical access behaviour for HDFS and Hive and sends
>>> alerts when anomalous behaviour is detected. Eagle can also import
>>> sensitive data information classified by external classification engines to
>>> help define its policies.
>>>
>>> Overview of Eagle
>>> Eagle has 3 main parts.
>>> 1.Data collection and storage - Eagle collects data from various hadoop
>>> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
>>> 2.Data processing and policy engine - Eagle allows users to create
>>> policies based on various metadata properties on HDFS, Hive and HBase data.
>>> 3.Eagle services - Eagle services include policy manager, query service
>>> and the visualization component. Eagle provides intuitive user interface to
>>> administer Eagle and an alert dashboard to respond to real time alerts.
>>>
>>> Data Collection and Storage:
>>> Eagle provides programming API for extending Eagle to integrate any data
>>> source into Eagle policy evaluation framework. For example, Eagle hdfs
>>> audit monitoring collects data from Kafka which is populated from namenode
>>> log4j appender or from logstash agent. Eagle hive monitoring collects hive
>>> query logs from running job through YARN API, which is designed to be
>>> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>>> metadata and metrics data, and also supports relational database through
>>> configuration change.
>>>
>>> Data Processing and Policy Engine:
>>> Processing Engine: Eagle provides stream processing API which is an
>>> abstraction of Apache Storm. It can also be extended to other streaming
>>> engines. This abstraction allows developers to assemble data
>>> transformation, filtering, external data join etc. without physically bound
>>> to a specific streaming platform. Eagle streaming API allows developers to
>>> easily integrate business logic with Eagle policy engine and internally
>>> Eagle framework compiles business logic execution DAG into program
>>> primitives of underlying stream infrastructure e.g. Apache Storm. For
>>> example, Eagle HDFS monitoring transforms audit log from Namenode to object
>>> and joins sensitivity metadata, security zone metadata which are generated
>>> from external programs or configured by user. Eagle hive monitoring filters
>>> running jobs to get hive query string and parses query string into object
>>> and then 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Sam Ruby
On Tue, Oct 20, 2015 at 10:51 AM, Manoharan, Arun  wrote:
> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might be trademarked by someone already but I
> went thru eBay legal team to get the clearance for the name to be used. We
> will look into it again to see if there will be potential problems.

Ultimately it will be the ASF that determines the appropriateness of
the name for a podling.  A few pointers:

http://incubator.apache.org/guides/names.html
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH/

> Thanks,
> Arun

- Sam Ruby

> On 10/20/15, 1:52 AM, "Greg Stein"  wrote:
>
>>Hey there, Arun! ... I have no commentary on the proposal itself, as it
>>looks like a great proposal. I would suggest being a bit wary of the name,
>>as "Eagle" is a *very* popular PCB design program.
>>
>>On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun 
>>wrote:
>>
>>> Hello Everyone,
>>>
>>> My name is Arun Manoharan. Currently a product manager in the Analytics
>>> platform team at eBay Inc.
>>>
>>> I would like to start a discussion on Eagle and its joining the ASF as
>>>an
>>> incubation project.
>>>
>>> Eagle is a Monitoring solution for Hadoop to instantly identify access
>>>to
>>> sensitive data, recognize attacks, malicious activities and take
>>>actions in
>>> real time. Eagle supports a wide variety of policies on HDFS data and
>>>Hive.
>>> Eagle also provides machine learning models for detecting anomalous user
>>> behavior in Hadoop.
>>>
>>> The proposal is available on the wiki here:
>>> https://wiki.apache.org/incubator/EagleProposal
>>>
>>> The text of the proposal is also available at the end of this email.
>>>
>>> Thanks for your time and help.
>>>
>>> Thanks,
>>> Arun
>>>
>>> 
>>>
>>> Eagle
>>>
>>> Abstract
>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>> identify access to sensitive data, recognize attacks, malicious
>>>activities
>>> in hadoop and take actions.
>>>
>>> Proposal
>>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>> enforces policies defined on sensitive data access and alerts or blocks
>>> user's access to that sensitive data in real time. Eagle also creates
>>>user
>>> profiles based on the typical access behaviour for HDFS and Hive and
>>>sends
>>> alerts when anomalous behaviour is detected. Eagle can also import
>>> sensitive data information classified by external classification
>>>engines to
>>> help define its policies.
>>>
>>> Overview of Eagle
>>> Eagle has 3 main parts.
>>> 1.Data collection and storage - Eagle collects data from various hadoop
>>> logs in real time using Kafka/Yarn API and uses HDFS and HBase for
>>>storage.
>>> 2.Data processing and policy engine - Eagle allows users to create
>>> policies based on various metadata properties on HDFS, Hive and HBase
>>>data.
>>> 3.Eagle services - Eagle services include policy manager, query service
>>> and the visualization component. Eagle provides intuitive user
>>>interface to
>>> administer Eagle and an alert dashboard to respond to real time alerts.
>>>
>>> Data Collection and Storage:
>>> Eagle provides programming API for extending Eagle to integrate any data
>>> source into Eagle policy evaluation framework. For example, Eagle hdfs
>>> audit monitoring collects data from Kafka which is populated from
>>>namenode
>>> log4j appender or from logstash agent. Eagle hive monitoring collects
>>>hive
>>> query logs from running job through YARN API, which is designed to be
>>> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>>> metadata and metrics data, and also supports relational database through
>>> configuration change.
>>>
>>> Data Processing and Policy Engine:
>>> Processing Engine: Eagle provides stream processing API which is an
>>> abstraction of Apache Storm. It can also be extended to other streaming
>>> engines. This abstraction allows developers to assemble data
>>> transformation, filtering, external data join etc. without physically
>>>bound
>>> to a specific streaming platform. Eagle streaming API allows developers
>>>to
>>> easily integrate business logic with Eagle policy engine and internally
>>> Eagle framework compiles business logic execution DAG into program
>>> primitives of underlying stream infrastructure e.g. Apache Storm. For
>>> example, Eagle HDFS monitoring transforms audit log from Namenode to
>>>object
>>> and joins sensitivity metadata, security zone metadata which are
>>>generated
>>> from external programs or configured by user. Eagle hive monitoring
>>>filters
>>> running jobs to get hive query string and parses query string into
>>>object
>>> and then joins sensitivity metadata.
>>> Alerting Framework: Eagle Alert Framework includes stream metadata API,
>>> scalable policy engine framework, extensible policy engine framework.
>>> Stream metadata API allows developers to declare event schema including
>>> what 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread P. Taylor Goetz
I should also have some improved bandwidth both now that Kylin is nearing 
graduation and for other reasons. I’ve been bogged down recently, but that’s 
starting to change.

If more mentors are desired, I’d be willing to help in that respect.

-Taylor

> On Oct 20, 2015, at 11:49 AM, Henry Saputra  wrote:
> 
> Hi Ted,
> 
> Since Kylin almost ready to graduate, I have more bandwidth to help with 
> Eagle.
> 
> But, you are right that current proposed mentors for Eagle seemed to
> be very busy with other podlings, so 1 or 2 additional mentors would
> be great.
> 
> The good news is that the team consist some people from Kylin, for
> example Luke, which done great job helping Kylin to understand working
> with Apache way.
> So we have some help from initial committers who have done the rodeo before.
> 
> - Henry
> 
> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning  wrote:
>> I would suggest that Owen O'Malley has not had enough time to be a viable
>> mentor recently and should not be on the list of mentors.
>> 
>> Henry and Julian are good if their schedules permit.  Henry, I know has
>> been mentoring a number of projects lately.
>> 
>> 
>> 
>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
>> wrote:
>> 
>>> Hi Arun,
>>> 
>>> very interesting proposal. I may see some possible interaction with
>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
>>> kind of Change Data Capture), etc.
>>> 
>>> So, I see a different perspective in Eagle, but Eagle could also leverage
>>> Falcon somehow.
>>> 
>>> Regards
>>> JB
>>> 
>>> 
>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>> 
>>>> Hello Everyone,
>>>>
>>>> My name is Arun Manoharan. Currently a product manager in the Analytics
>>>> platform team at eBay Inc.
>>>>
>>>> I would like to start a discussion on Eagle and its joining the ASF as an
>>>> incubation project.
>>>>
>>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to
>>>> sensitive data, recognize attacks, malicious activities and take actions in
>>>> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
>>>> Eagle also provides machine learning models for detecting anomalous user
>>>> behavior in Hadoop.
>>>>
>>>> The proposal is available on the wiki here:
>>>> https://wiki.apache.org/incubator/EagleProposal
>>>>
>>>> The text of the proposal is also available at the end of this email.
>>>>
>>>> Thanks for your time and help.
>>>>
>>>> Thanks,
>>>> Arun
>>>>
>>>>
>>>>
>>>> Eagle
>>>>
>>>> Abstract
>>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>>> identify access to sensitive data, recognize attacks, malicious activities
>>>> in hadoop and take actions.
>>>>
>>>> Proposal
>>>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>>>> enforces policies defined on sensitive data access and alerts or blocks
>>>> user’s access to that sensitive data in real time. Eagle also creates user
>>>> profiles based on the typical access behaviour for HDFS and Hive and sends
>>>> alerts when anomalous behaviour is detected. Eagle can also import
>>>> sensitive data information classified by external classification engines to
>>>> help define its policies.
>>>>
>>>> Overview of Eagle
>>>> Eagle has 3 main parts.
>>>> 1.Data collection and storage - Eagle collects data from various hadoop
>>>> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
>>>> 2.Data processing and policy engine - Eagle allows users to create
>>>> policies based on various metadata properties on HDFS, Hive and HBase data.
>>>> 3.Eagle services - Eagle services include policy manager, query service
>>>> and the visualization component. Eagle provides intuitive user interface to
>>>> administer Eagle and an alert dashboard to respond to real time alerts.
>>>>
>>>> Data Collection and Storage:
>>>> Eagle provides programming API for extending Eagle to integrate any data
>>>> source into Eagle policy evaluation framework. For example, Eagle hdfs
>>>> audit monitoring collects data from Kafka which is populated from namenode
>>>> log4j appender or from logstash agent. Eagle hive monitoring collects hive
>>>> query logs from running job through YARN API, which is designed to be
>>>> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>>>> metadata and metrics data, and also supports relational database through
>>>> configuration change.
>>>>
>>>> Data Processing and Policy Engine:
>>>> Processing Engine: Eagle provides stream processing API which is an
>>>> abstraction of Apache Storm. It can also be extended to other streaming
>>>> engines. This abstraction allows developers to assemble data
>>>> transformation, filtering, external data join etc. without physically bound
>>>> to a specific streaming platform. Eagle streaming API allows developers to
>>>> easily integrate business logic with Eagle policy engine

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Manoharan, Arun
Hi Greg,

Thank you for reviewing the proposal.

Originally we thought Eagle might already be trademarked by someone, but I
went through the eBay legal team to get clearance to use the name. We
will look into it again to see whether there are potential problems.

Thanks,
Arun

On 10/20/15, 1:52 AM, "Greg Stein"  wrote:

>Hey there, Arun! ... I have no commentary on the proposal itself, as it
>looks like a great proposal. I would suggest being a bit wary of the name,
>as "Eagle" is a *very* popular PCB design program.
>
>On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun 
>wrote:
>
>> Hello Everyone,
>>
>> My name is Arun Manoharan. Currently a product manager in the Analytics
>> platform team at eBay Inc.
>>
>> I would like to start a discussion on Eagle and its joining the ASF as
>>an
>> incubation project.
>>
>> Eagle is a Monitoring solution for Hadoop to instantly identify access
>>to
>> sensitive data, recognize attacks, malicious activities and take
>>actions in
>> real time. Eagle supports a wide variety of policies on HDFS data and
>>Hive.
>> Eagle also provides machine learning models for detecting anomalous user
>> behavior in Hadoop.
>>
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>>
>> The text of the proposal is also available at the end of this email.
>>
>> Thanks for your time and help.
>>
>> Thanks,
>> Arun
>>
>> 
>>
>> Eagle
>>
>> Abstract
>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>> identify access to sensitive data, recognize attacks, malicious
>>activities
>> in hadoop and take actions.
>>
>> Proposal
>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>> enforces policies defined on sensitive data access and alerts or blocks
>> user's access to that sensitive data in real time. Eagle also creates
>>user
>> profiles based on the typical access behaviour for HDFS and Hive and
>>sends
>> alerts when anomalous behaviour is detected. Eagle can also import
>> sensitive data information classified by external classification
>>engines to
>> help define its policies.
>>
>> Overview of Eagle
>> Eagle has 3 main parts.
>> 1.Data collection and storage - Eagle collects data from various hadoop
>> logs in real time using Kafka/Yarn API and uses HDFS and HBase for
>>storage.
>> 2.Data processing and policy engine - Eagle allows users to create
>> policies based on various metadata properties on HDFS, Hive and HBase
>>data.
>> 3.Eagle services - Eagle services include policy manager, query service
>> and the visualization component. Eagle provides intuitive user
>>interface to
>> administer Eagle and an alert dashboard to respond to real time alerts.
>>
>> Data Collection and Storage:
>> Eagle provides programming API for extending Eagle to integrate any data
>> source into Eagle policy evaluation framework. For example, Eagle hdfs
>> audit monitoring collects data from Kafka which is populated from
>>namenode
>> log4j appender or from logstash agent. Eagle hive monitoring collects
>>hive
>> query logs from running job through YARN API, which is designed to be
>> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>> metadata and metrics data, and also supports relational database through
>> configuration change.
>>
>> Data Processing and Policy Engine:
>> Processing Engine: Eagle provides stream processing API which is an
>> abstraction of Apache Storm. It can also be extended to other streaming
>> engines. This abstraction allows developers to assemble data
>> transformation, filtering, external data join etc. without physically
>>bound
>> to a specific streaming platform. Eagle streaming API allows developers
>>to
>> easily integrate business logic with Eagle policy engine and internally
>> Eagle framework compiles business logic execution DAG into program
>> primitives of underlying stream infrastructure e.g. Apache Storm. For
>> example, Eagle HDFS monitoring transforms audit log from Namenode to
>>object
>> and joins sensitivity metadata, security zone metadata which are
>>generated
>> from external programs or configured by user. Eagle hive monitoring
>>filters
>> running jobs to get hive query string and parses query string into
>>object
>> and then joins sensitivity metadata.
>> Alerting Framework: Eagle Alert Framework includes stream metadata API,
>> scalable policy engine framework, extensible policy engine framework.
>> Stream metadata API allows developers to declare event schema including
>> what attributes constitute an event, what is the type for each
>>attribute,
>> and how to dynamically resolve attribute value in runtime when user
>> configures policy. Scalable policy engine framework allows policies to
>>be
>> executed on different physical nodes in parallel. It is also used to
>>define
>> your own policy partitioner class. Policy engine framework together with
>> streaming partitioning capability provided by all 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Alex Karasulu
Hi Arun,

Eagle sounds very promising. I just had a discussion with someone about
this exact need. I do, however, agree with Greg on the name. As far as I can
see, besides the name, your weakest point is that the team is made up
entirely of eBay employees. It's not a blocker and can be fixed during
incubation. Good luck to you.

Alex


On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun 
wrote:

> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might be trademarked by someone already but I
> went thru eBay legal team to get the clearance for the name to be used. We
> will look into it again to see if there will be potential problems.
>
> Thanks,
> Arun
>
> On 10/20/15, 1:52 AM, "Greg Stein"  wrote:
>
> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
> >looks like a great proposal. I would suggest being a bit wary of the name,
> >as "Eagle" is a *very* popular PCB design program.
> >
> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun 
> >wrote:
> >
> >> Hello Everyone,
> >>
> >> My name is Arun Manoharan. Currently a product manager in the Analytics
> >> platform team at eBay Inc.
> >>
> >> I would like to start a discussion on Eagle and its joining the ASF as
> >>an
> >> incubation project.
> >>
> >> Eagle is a Monitoring solution for Hadoop to instantly identify access
> >>to
> >> sensitive data, recognize attacks, malicious activities and take
> >>actions in
> >> real time. Eagle supports a wide variety of policies on HDFS data and
> >>Hive.
> >> Eagle also provides machine learning models for detecting anomalous user
> >> behavior in Hadoop.
> >>
> >> The proposal is available on the wiki here:
> >> https://wiki.apache.org/incubator/EagleProposal
> >>
> >> The text of the proposal is also available at the end of this email.
> >>
> >> Thanks for your time and help.
> >>
> >> Thanks,
> >> Arun
> >>
> >> 
> >>
> >> Eagle
> >>
> >> Abstract
> >> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> >> identify access to sensitive data, recognize attacks, malicious
> >>activities
> >> in hadoop and take actions.
> >>
> >> Proposal
> >> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> >> enforces policies defined on sensitive data access and alerts or blocks
> >> user's access to that sensitive data in real time. Eagle also creates
> >>user
> >> profiles based on the typical access behaviour for HDFS and Hive and
> >>sends
> >> alerts when anomalous behaviour is detected. Eagle can also import
> >> sensitive data information classified by external classification
> >>engines to
> >> help define its policies.
> >>
> >> Overview of Eagle
> >> Eagle has 3 main parts.
> >> 1.Data collection and storage - Eagle collects data from various hadoop
> >> logs in real time using Kafka/Yarn API and uses HDFS and HBase for
> >>storage.
> >> 2.Data processing and policy engine - Eagle allows users to create
> >> policies based on various metadata properties on HDFS, Hive and HBase
> >>data.
> >> 3.Eagle services - Eagle services include policy manager, query service
> >> and the visualization component. Eagle provides intuitive user
> >>interface to
> >> administer Eagle and an alert dashboard to respond to real time alerts.
> >>
> >> Data Collection and Storage:
> >> Eagle provides programming API for extending Eagle to integrate any data
> >> source into Eagle policy evaluation framework. For example, Eagle hdfs
> >> audit monitoring collects data from Kafka which is populated from
> >>namenode
> >> log4j appender or from logstash agent. Eagle hive monitoring collects
> >>hive
> >> query logs from running job through YARN API, which is designed to be
> >> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> >> metadata and metrics data, and also supports relational database through
> >> configuration change.
> >>
> >> Data Processing and Policy Engine:
> >> Processing Engine: Eagle provides stream processing API which is an
> >> abstraction of Apache Storm. It can also be extended to other streaming
> >> engines. This abstraction allows developers to assemble data
> >> transformation, filtering, external data join etc. without physically
> >>bound
> >> to a specific streaming platform. Eagle streaming API allows developers
> >>to
> >> easily integrate business logic with Eagle policy engine and internally
> >> Eagle framework compiles business logic execution DAG into program
> >> primitives of underlying stream infrastructure e.g. Apache Storm. For
> >> example, Eagle HDFS monitoring transforms audit log from Namenode to
> >>object
> >> and joins sensitivity metadata, security zone metadata which are
> >>generated
> >> from external programs or configured by user. Eagle hive monitoring
> >>filters
> >> running jobs to get hive query string and parses query string into
> >>object
> >> and then joins sensitivity metadata.
> >> Alerting Framework: Eagle Alert Framework 

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Manoharan, Arun
Thanks Taylor. I will add you to the mentor list.

On 10/20/15, 11:58 AM, "P. Taylor Goetz"  wrote:

>I should also have some improved bandwidth both now that Kylin is nearing
>graduation and for other reasons. I've been bogged down recently, but
>that's starting to change.
>
>If more mentors are desired, I'd be willing to help in that respect.
>
>-Taylor
>
>> On Oct 20, 2015, at 11:49 AM, Henry Saputra 
>>wrote:
>> 
>> Hi Ted,
>> 
>> Since Kylin almost ready to graduate, I have more bandwidth to help
>>with Eagle.
>> 
>> But, you are right that current proposed mentors for Eagle seemed to
>> be very busy with other podlings, so 1 or 2 additional mentors would
>> be great.
>> 
>> The good news is that the team consist some people from Kylin, for
>> example Luke, which done great job helping Kylin to understand working
>> with Apache way.
>> So we have some help from initial committers who have done the rodeo
>>before.
>> 
>> - Henry
>> 
>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning 
>>wrote:
>>> I would suggest that Owen O'Malley has not had enough time to be a
>>>viable
>>> mentor recently and should not be on the list of mentors.
>>> 
>>> Henry and Julian are good if their schedules permit.  Henry, I know has
>>> been mentoring a number of projects lately.
>>> 
>>> 
>>> 
>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
>>> wrote:
>>> 
>>>> Hi Arun,
>>>>
>>>> very interesting proposal. I may see some possible interaction with
>>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring
>>>>(with a
>>>> kind of Change Data Capture), etc.
>>>>
>>>> So, I see a different perspective in Eagle, but Eagle could also
>>>>leverage
>>>> Falcon somehow.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> My name is Arun Manoharan. Currently a product manager in the
>>>>>Analytics
>>>>> platform team at eBay Inc.
>>>>>
>>>>> I would like to start a discussion on Eagle and its joining the ASF
>>>>>as an
>>>>> incubation project.
>>>>>
>>>>> Eagle is a Monitoring solution for Hadoop to instantly identify
>>>>>access to
>>>>> sensitive data, recognize attacks, malicious activities and take
>>>>>actions in
>>>>> real time. Eagle supports a wide variety of policies on HDFS data
>>>>>and Hive.
>>>>> Eagle also provides machine learning models for detecting anomalous
>>>>>user
>>>>> behavior in Hadoop.
>>>>>
>>>>> The proposal is available on the wiki here:
>>>>> https://wiki.apache.org/incubator/EagleProposal
>>>>>
>>>>> The text of the proposal is also available at the end of this email.
>>>>>
>>>>> Thanks for your time and help.
>>>>>
>>>>> Thanks,
>>>>> Arun
>>>>>
>>>>>
>>>>>
>>>>> Eagle
>>>>>
>>>>> Abstract
>>>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>>>>> identify access to sensitive data, recognize attacks, malicious
>>>>>activities
>>>>> in hadoop and take actions.
>>>>>
>>>>> Proposal
>>>>> Eagle audits access to HDFS files, Hive and HBase tables in real
>>>>>time,
>>>>> enforces policies defined on sensitive data access and alerts or
>>>>>blocks
>>>>> user's access to that sensitive data in real time. Eagle also
>>>>>creates user
>>>>> profiles based on the typical access behaviour for HDFS and Hive and
>>>>>sends
>>>>> alerts when anomalous behaviour is detected. Eagle can also import
>>>>> sensitive data information classified by external classification
>>>>>engines to
>>>>> help define its policies.
>>>>>
>>>>> Overview of Eagle
>>>>> Eagle has 3 main parts.
>>>>> 1.Data collection and storage - Eagle collects data from various
>>>>>hadoop
>>>>> logs in real time using Kafka/Yarn API and uses HDFS and HBase for
>>>>>storage.
>>>>> 2.Data processing and policy engine - Eagle allows users to create
>>>>> policies based on various metadata properties on HDFS, Hive and
>>>>>HBase data.
>>>>> 3.Eagle services - Eagle services include policy manager, query
>>>>>service
>>>>> and the visualization component. Eagle provides intuitive user
>>>>>interface to
>>>>> administer Eagle and an alert dashboard to respond to real time
>>>>>alerts.
>>>>>
>>>>> Data Collection and Storage:
>>>>> Eagle provides programming API for extending Eagle to integrate any
>>>>>data
>>>>> source into Eagle policy evaluation framework. For example, Eagle
>>>>>hdfs
>>>>> audit monitoring collects data from Kafka which is populated from
>>>>>namenode
>>>>> log4j appender or from logstash agent. Eagle hive monitoring
>>>>>collects hive
>>>>> query logs from running job through YARN API, which is designed to be
>>>>> scalable and fault-tolerant. Eagle uses HBase as storage for storing
>>>>> metadata and metrics data, and also supports relational database
>>>>>through
>>>>> configuration change.
>>>>>
>>>>> Data Processing and Policy Engine:
>>>>> Processing Engine: Eagle provides stream processing API which is an

Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Henry Saputra
Hi Ted,

Thanks for your concern, but we have had discussions with all proposed
mentors before to ask for their availability and willingness to
actively mentor this project.

I think we are good with existing proposed mentors.


- Henry

On Tue, Oct 20, 2015 at 9:10 PM, Ted Dunning  wrote:
> On Tue, Oct 20, 2015 at 4:14 PM, Manoharan, Arun 
> wrote:
>
>> Thanks Taylor. I will add you to the mentor list.
>>
>
>
> Arun,
>
> Can you also do a scrub of the mentor list by asking each of the mentors
> whether they have been able to support other groups that they are
> mentoring. If they don't answer, or if they can't say that they have been
> supportive (at least to the extent of signing off project reports), then
> please remove them from your list.

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Ted Dunning
On Tue, Oct 20, 2015 at 4:14 PM, Manoharan, Arun 
wrote:

> Thanks Taylor. I will add you to the mentor list.
>


Arun,

Can you also do a scrub of the mentor list by asking each of the mentors
whether they have been able to support other groups that they are
mentoring. If they don't answer, or if they can't say that they have been
supportive (at least to the extent of signing off project reports), then
please remove them from your list.


Re: [DISCUSS] Eagle incubator proposal

2015-10-20 Thread Don Bosco Durai
Hi Arun

This looks really good and fills some obvious gaps in the security landscape.

Happy to contribute any way you want.

All the best!!!

Bosco





On 10/20/15, 8:02 AM, "Alex Karasulu"  wrote:

>Hi Arun,
>
>Eagle sounds very promising. I just had a discussion with someone about
>this exact need. I do however agree with Greg on the name. As far as I can
>see, besides the name, your weakest point is the all eBay employed team.
>It's not a blocker and can be fixed during incubation. Good luck to you.
>
>Alex
>
>
>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun 
>wrote:
>
>> Hi Greg,
>>
>> Thank you for reviewing the proposal.
>>
>> Originally we thought Eagle might be trademarked by someone already but I
>> went thru eBay legal team to get the clearance for the name to be used. We
>> will look into it again to see if there will be potential problems.
>>
>> Thanks,
>> Arun
>>
>> On 10/20/15, 1:52 AM, "Greg Stein"  wrote:
>>
>> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
>> >looks like a great proposal. I would suggest being a bit wary of the name,
>> >as "Eagle" is a *very* popular PCB design program.
>> >

[DISCUSS] Eagle incubator proposal

2015-10-19 Thread Manoharan, Arun
Hello Everyone,

My name is Arun Manoharan. I am currently a product manager on the Analytics
platform team at eBay Inc.

I would like to start a discussion on Eagle and its joining the ASF as an 
incubation project.

Eagle is a monitoring solution for Hadoop that instantly identifies access to
sensitive data, recognizes attacks and malicious activities, and takes action
in real time. Eagle supports a wide variety of policies on HDFS data and Hive.
Eagle also provides machine learning models for detecting anomalous user
behavior in Hadoop.

The proposal is available on the wiki here:
https://wiki.apache.org/incubator/EagleProposal

The text of the proposal is also available at the end of this email.

Thanks for your time and help.

Thanks,
Arun



Eagle

Abstract
Eagle is an open source monitoring solution for Hadoop that instantly
identifies access to sensitive data, recognizes attacks and malicious
activities in Hadoop, and takes action.

Proposal
Eagle audits access to HDFS files and to Hive and HBase tables in real time,
enforces policies defined on sensitive data access, and alerts on or blocks a
user's access to that sensitive data in real time. Eagle also creates user
profiles based on the typical access behaviour for HDFS and Hive and sends
alerts when anomalous behaviour is detected. Eagle can also import sensitive
data information classified by external classification engines to help define
its policies.

Overview of Eagle
Eagle has 3 main parts.
1. Data collection and storage - Eagle collects data from various Hadoop logs
in real time using Kafka and the YARN API, and uses HDFS and HBase for storage.
2. Data processing and policy engine - Eagle allows users to create policies
based on various metadata properties of HDFS, Hive and HBase data.
3. Eagle services - Eagle services include the policy manager, the query
service and the visualization component. Eagle provides an intuitive user
interface to administer Eagle and an alert dashboard to respond to real-time
alerts.

Data Collection and Storage:
Eagle provides a programming API for extending Eagle to integrate any data
source into the Eagle policy evaluation framework. For example, Eagle HDFS
audit monitoring collects data from Kafka, which is populated from a NameNode
log4j appender or from a Logstash agent. Eagle Hive monitoring collects Hive
query logs from running jobs through the YARN API, which is designed to be
scalable and fault-tolerant. Eagle uses HBase for storing metadata and metrics
data, and also supports a relational database through a configuration change.
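
To make the HDFS audit path above a little more concrete, here is a minimal
sketch of turning one NameNode audit log line into a flat event, which is
roughly what a collector would do with each Kafka message. It is only an
illustration, not Eagle's actual API: the function name parse_audit_line, the
event field names and the sample line layout (tab-separated key=value pairs
after "FSNamesystem.audit: ") are assumptions, and the Kafka consumer itself is
left out.

```python
from typing import Dict


def parse_audit_line(line: str) -> Dict[str, str]:
    """Parse one HDFS audit log line into a flat event dict (hypothetical helper).

    Assumes the usual layout: a timestamp and log level, then
    "FSNamesystem.audit: ", then tab-separated key=value pairs.
    """
    header, _, payload = line.partition("FSNamesystem.audit: ")
    # The first 23 characters are assumed to be "YYYY-MM-DD HH:MM:SS,mmm".
    event = {"timestamp": header[:23]}
    for field in payload.strip().split("\t"):
        key, sep, value = field.partition("=")
        if sep:  # skip fragments that are not key=value
            event[key] = value
    # ugi looks like "alice (auth:SIMPLE)"; keep only the user name.
    event["user"] = event.get("ugi", "").split(" ")[0]
    return event


SAMPLE_LINE = (
    "2015-10-19 09:00:00,123 INFO FSNamesystem.audit: "
    "allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\t"
    "cmd=open\tsrc=/data/secret.csv\tdst=null\tperm=null"
)
```

Calling parse_audit_line(SAMPLE_LINE) gives a dict with keys such as "user",
"cmd" and "src", which is the kind of object the later policy steps would work
on.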

Data Processing and Policy Engine:
Processing Engine: Eagle provides a stream processing API which is an
abstraction over Apache Storm. It can also be extended to other streaming
engines. This abstraction allows developers to assemble data transformation,
filtering, external data joins, etc. without being physically bound to a
specific streaming platform. The Eagle streaming API allows developers to
easily integrate business logic with the Eagle policy engine; internally, the
Eagle framework compiles the business logic execution DAG into program
primitives of the underlying stream infrastructure, e.g. Apache Storm. For
example, Eagle HDFS monitoring transforms audit logs from the NameNode into
objects and joins them with sensitivity metadata and security zone metadata,
which are generated by external programs or configured by the user. Eagle Hive
monitoring filters running jobs to get the Hive query string, parses the query
string into an object, and then joins it with sensitivity metadata.
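
As a rough picture of the transform / join / filter chain described above, the
sketch below chains steps on a tiny engine-agnostic stream. It is a toy under
stated assumptions, not the Eagle streaming API: the Stream class, its map and
filter methods, the SENSITIVITY lookup table and the event field names are all
hypothetical, and a real implementation would compile the steps to Storm
primitives rather than run them in a local loop.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]

# Hypothetical sensitivity metadata, keyed by HDFS path.
SENSITIVITY = {"/data/secret.csv": "PII"}


class Stream:
    """Toy pipeline: records steps, then applies them to each event in order."""

    def __init__(self, source: Iterable[Event]) -> None:
        self._source = source
        self._steps: List[Tuple[str, Callable]] = []

    def map(self, fn: Callable[[Event], Event]) -> "Stream":
        self._steps.append(("map", fn))
        return self

    def filter(self, predicate: Callable[[Event], bool]) -> "Stream":
        self._steps.append(("filter", predicate))
        return self

    def run(self) -> Iterator[Event]:
        for event in self._source:
            dropped = False
            for kind, fn in self._steps:
                if kind == "map":
                    event = fn(event)
                elif not fn(event):  # a filter step rejected the event
                    dropped = True
                    break
            if not dropped:
                yield event


def join_sensitivity(event: Event) -> Event:
    """Attach the sensitivity label for the event's path (None if unknown)."""
    return {**event, "sensitivity": SENSITIVITY.get(event.get("src"))}


def is_sensitive(event: Event) -> bool:
    return event["sensitivity"] is not None
```

Usage would look like
Stream(events).map(join_sensitivity).filter(is_sensitive).run(), which yields
only the events whose path has a sensitivity label.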
Alerting Framework: The Eagle Alert Framework includes a stream metadata API,
a scalable policy engine framework, and an extensible policy engine framework.
The stream metadata API allows developers to declare an event schema, including
which attributes constitute an event, the type of each attribute, and how to
dynamically resolve attribute values at runtime when a user configures a
policy. The scalable policy engine framework allows policies to be executed on
different physical nodes in parallel. It also lets you define your own policy
partitioner class. The policy engine framework, together with the stream
partitioning capability provided by all streaming platforms, makes sure
policies and events can be evaluated in a fully distributed way. The extensible
policy engine framework allows developers to plug in a new policy engine with a
few lines of code. The WSO2 Siddhi CEP engine is the policy engine that Eagle
supports as a first-class citizen.
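
The idea of spreading policies across nodes with a partitioner can be sketched
as below. This is only a simplified illustration, not Eagle's policy engine or
Siddhi: the Policy class, the partition, assign and evaluate functions, and the
choice of a CRC32 hash of the policy id are all assumptions made for the
example.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: an id plus a predicate over an event dict."""
    policy_id: str
    matches: Callable[[dict], bool]


def partition(policy_id: str, num_nodes: int) -> int:
    """Pick a node for a policy using a stable hash of its id."""
    return zlib.crc32(policy_id.encode("utf-8")) % num_nodes


def assign(policies: List[Policy], num_nodes: int) -> Dict[int, List[Policy]]:
    """Group policies by the node that should evaluate them."""
    by_node: Dict[int, List[Policy]] = defaultdict(list)
    for policy in policies:
        by_node[partition(policy.policy_id, num_nodes)].append(policy)
    return dict(by_node)


def evaluate(event: dict, policies: List[Policy]) -> List[str]:
    """Return the ids of the policies that the event triggers."""
    return [p.policy_id for p in policies if p.matches(event)]
```

Each node would then call evaluate with only its own share of the policies, so
adding nodes spreads the policy load without changing the policies themselves.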
Machine Learning module: Eagle provides capabilities to define user activity
patterns or user profiles for Hadoop users based on user behaviour in the
platform. These user profiles are modeled using machine learning algorithms and
used for detecting anomalous user activities. Eagle uses eigenvalue
decomposition and density estimation algorithms for generating user profile
models. The model reads data from HDFS audit logs, preprocesses and aggregates
the data, and generates models using the Spark programming APIs. Once models
are generated, Eagle uses stream 

Re: [DISCUSS] Eagle incubator proposal

2015-10-19 Thread Jean-Baptiste Onofré

Hi Arun,

Very interesting proposal. I can see some possible interaction with
Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with
a kind of Change Data Capture), etc.


So, I see a different perspective in Eagle, but Eagle could also 
leverage Falcon somehow.


Regards
JB


Re: [DISCUSS] Eagle incubator proposal

2015-10-19 Thread Ted Dunning
I would suggest that Owen O'Malley has not had enough time to be a viable
mentor recently and should not be on the list of mentors.

Henry and Julian are good if their schedules permit.  Henry, I know, has
been mentoring a number of projects lately.



On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré 
wrote:

> Hi Arun,
>
> very interesting proposal. I may see some possible interaction with
> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
> kind of Change Data Capture), etc.
>
> So, I see a different perspective in Eagle, but Eagle could also leverage
> Falcon somehow.
>
> Regards
> JB
>
>

Re: [DISCUSS] Eagle incubator proposal

2015-10-19 Thread Zhang, Edward (GDI Hadoop)
Hi JB,

That is a good point. Good to know that Falcon feeds HDFS/Hive/HBase data
changes, so this feature would complement Eagle, which today mainly focuses
on HDFS/Hive/HBase data access, including view, change, delete, etc. Eagle
would benefit if it could instantly capture data changes from Falcon.

Thanks
Edward Zhang



On 10/19/15, 8:40, "Jean-Baptiste Onofré"  wrote:

>Hi Arun,
>
>very interesting proposal. I may see some possible interaction with
>Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with
>a kind of Change Data Capture), etc.
>
>So, I see a different perspective in Eagle, but Eagle could also
>leverage Falcon somehow.
>
>Regards
>JB
>

Re: [DISCUSS] Eagle incubator proposal

2015-10-19 Thread Jean-Baptiste Onofré

It makes sense. I will try to contribute on this ;)

Regards
JB

On 10/19/2015 09:46 PM, Zhang, Edward (GDI Hadoop) wrote:

Hi JB,

That is a good Point. Good to know that Falcon feeds HDFS/Hive/HBase data
changes, so this feature would complement Eagle which today mainly focuses
on HDFS/Hive/HBase data access including view, change, delete etc. Eagle
would benefit if Eagle can instantly capture data change from Falcon.

Thanks
Edward Zhang



On 10/19/15, 8:40, "Jean-Baptiste Onofré"  wrote:


Hi Arun,

very interesting proposal. I may see some possible interaction with
Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with
a kind of Change Data Capture), etc.

So, I see a different perspective in Eagle, but Eagle could also
leverage Falcon somehow.

Regards
JB

On 10/19/2015 05:33 PM, Manoharan, Arun wrote:

Hello Everyone,

My name is Arun Manoharan. Currently a product manager in the Analytics
platform team at eBay Inc.

I would like to start a discussion on Eagle and its joining the ASF as
an incubation project.

Eagle is a Monitoring solution for Hadoop to instantly identify access
to sensitive data, recognize attacks, malicious activities and take
actions in real time. Eagle supports a wide variety of policies on HDFS
data and Hive. Eagle also provides machine learning models for detecting
anomalous user behavior in Hadoop.

The proposal is available on the wiki here:
https://wiki.apache.org/incubator/EagleProposal

The text of the proposal is also available at the end of this email.

Thanks for your time and help.

Thanks,
Arun



Eagle

Abstract
Eagle is an Open Source Monitoring solution for Hadoop to instantly
identify access to sensitive data, recognize attacks, malicious
activities in hadoop and take actions.

Proposal
Eagle audits access to HDFS files, Hive and HBase tables in real time,
enforces policies defined on sensitive data access and alerts or blocks
user¹s access to that sensitive data in real time. Eagle also creates
user profiles based on the typical access behaviour for HDFS and Hive
and sends alerts when anomalous behaviour is detected. Eagle can also
import sensitive data information classified by external classification
engines to help define its policies.

Overview of Eagle
Eagle has 3 main parts.
1.Data collection and storage - Eagle collects data from various hadoop
logs in real time using Kafka/Yarn API and uses HDFS and HBase for
storage.
2.Data processing and policy engine - Eagle allows users to create
policies based on various metadata properties on HDFS, Hive and HBase
data.
3.Eagle services - Eagle services include policy manager, query service
and the visualization component. Eagle provides intuitive user interface
to administer Eagle and an alert dashboard to respond to real time
alerts.

Data Collection and Storage:
Eagle provides programming API for extending Eagle to integrate any
data source into Eagle policy evaluation framework. For example, Eagle
hdfs audit monitoring collects data from Kafka which is populated from
namenode log4j appender or from logstash agent. Eagle hive monitoring
collects hive query logs from running job through YARN API, which is
designed to be scalable and fault-tolerant. Eagle uses HBase as storage
for storing metadata and metrics data, and also supports relational
database through configuration change.

Data Processing and Policy Engine:
Processing Engine: Eagle provides stream processing API which is an
abstraction of Apache Storm. It can also be extended to other streaming
engines. This abstraction allows developers to assemble data
transformation, filtering, external data join etc. without physically
bound to a specific streaming platform. Eagle streaming API allows
developers to easily integrate business logic with Eagle policy engine
and internally Eagle framework compiles business logic execution DAG
into program primitives of underlying stream infrastructure e.g. Apache
Storm. For example, Eagle HDFS monitoring transforms audit log from
Namenode to object and joins sensitivity metadata, security zone
metadata which are generated from external programs or configured by
user. Eagle hive monitoring filters running jobs to get hive query
string and parses query string into object and then joins sensitivity
metadata.
Alerting Framework: Eagle Alert Framework includes stream metadata API,
scalable policy engine framework, extensible policy engine framework.
Stream metadata API allows developers to declare event schema including
what attributes constitute an event, what is the type for each
attribute, and how to dynamically resolve attribute value in runtime
when user configures policy. Scalable policy engine framework allows
policies to be executed on different physical nodes in parallel. It is
also used to define your own policy partitioner class. Policy engine
framework together with streaming partitioning capability provided by
all streaming platforms will make sure policies and events can be