Re: [DISCUSS] Eagle incubator proposal
Thanks Henry. I'm also happy to help Eagle through incubation; we will share the experience we gained during Kylin's incubation with the Eagle team and community. Signing off reports and voting on releases are not the only things mentors help with. Process, setup, policy, infrastructure, guidance, workflow, the Apache Way and so on: a lot of things require mentors' effort, and several active mentors are really important for new podling committers to learn from and put into practice in their daily work. We have received a great deal of help from our mentors, and I hope that experience from Kylin will also benefit Eagle and other projects. Thanks. Best Regards! - Luke Han On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein wrote: > On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley > wrote: > > > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning > > wrote: > > > > > I would suggest that Owen O'Malley has not had enough time to be a > viable > > > mentor recently and should not be on the list of mentors. > > > > > > > I have been helping Kylin out and it is graduating, so I'm down to just > > Hawq. I'd like to help Eagle out. > > > > No need to be on the official list of mentors ... just help out anyways ... > heck, then you don't have to be responsible for signing off reports ;-) >
Re: [DISCUSS] Eagle incubator proposal
Arun - This is a very interesting proposal and I can imagine at least a couple of ways that it and Apache Knox could work together. I would like to be a contributor to Eagle as well - if you are interested. I am a committer and PMC member on Knox and Ranger and have contributed to a number of ecosystem projects on security issues. Good luck! --larry On Thu, Oct 22, 2015 at 4:27 AM, Luke Han wrote: > Thanks Henry, I'm also happy to help Eagle for incubating, the experience > we have > learnt during Kylin's incubating will share to Eagle team and community. > > Sign off reports and vote for release is not only things mentors will help, > for the process, setup, policy, infrastructure, guidance, workflow, the > Apache Way and > so on...a lots of things require mentor's efforts, several active mentors > are really important > for new podling committers to learn and practices in their daily work, > > We have got help from our mentors very much, hope such experience from > Kylin will also > benefits Eagle even other project. > > > Thanks. > > > > > > Best Regards! > - > > Luke Han > > On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein wrote: > > > On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley > > wrote: > > > > > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning > > > wrote: > > > > > > > I would suggest that Owen O'Malley has not had enough time to be a > > viable > > > > mentor recently and should not be on the list of mentors. > > > > > > > > > > I have been helping Kylin out and it is graduating, so I'm down to just > > > Hawq. I'd like to help Eagle out. > > > > > > > No need to be on the official list of mentors ... just help out anyways > ... > > heck, then you don't have to be responsible for signing off reports ;-) > > >
Re: [DISCUSS] Eagle incubator proposal
Thanks Larry. We have some ideas for integration with Ranger and would love to get your input. On 10/22/15, 4:55 AM, "larry mccay" wrote: >Arun - > >This is a very interesting proposal and I can imagine at least a couple >ways that it and Apache Knox could work together. >I would like to be a contributor to Eagle as well - if you are interested. > >I am a committer and PMC member on Knox and Ranger and have contributed to >a number of ecosystem projects on security issues. > >Good luck! > >--larry > >On Thu, Oct 22, 2015 at 4:27 AM, Luke Han wrote: > >> Thanks Henry, I'm also happy to help Eagle for incubating, the >>experience >> we have >> learnt during Kylin's incubating will share to Eagle team and community. >> >> Sign off reports and vote for release is not only things mentors will >>help, >> for the process, setup, policy, infrastructure, guidance, workflow, the >> Apache Way and >> so on...a lots of things require mentor's efforts, several active >>mentors >> are really important >> for new podling committers to learn and practices in their daily work, >> >> We have got help from our mentors very much, hope such experience from >> Kylin will also >> benefits Eagle even other project. >> >> >> Thanks. >> >> >> >> >> >> Best Regards! >> - >> >> Luke Han >> >> On Thu, Oct 22, 2015 at 1:16 PM, Greg Stein wrote: >> >> > On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley >> > wrote: >> > >> > > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning >> > > wrote: >> > > >> > > > I would suggest that Owen O'Malley has not had enough time to be a >> > viable >> > > > mentor recently and should not be on the list of mentors. >> > > > >> > > >> > > I have been helping Kylin out and it is graduating, so I'm down to >>just >> > > Hawq. I'd like to help Eagle out. >> > > >> > >> > No need to be on the official list of mentors ... just help out >>anyways >> ... 
>> > heck, then you don't have to be responsible for signing off reports >>;-) >> > >> - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [DISCUSS] Eagle incubator proposal
Looks like the discussion has calmed down, so unless there are more comments we will send the VOTE thread tomorrow. Thanks all for the feedback. - Henry On Mon, Oct 19, 2015 at 8:33 AM, Manoharan, Arun wrote: > Hello Everyone, > > My name is Arun Manoharan. Currently a product manager in the Analytics > platform team at eBay Inc. > > I would like to start a discussion on Eagle and its joining the ASF as an > incubation project. > > Eagle is a Monitoring solution for Hadoop to instantly identify access to > sensitive data, recognize attacks, malicious activities and take actions in > real time. Eagle supports a wide variety of policies on HDFS data and Hive. > Eagle also provides machine learning models for detecting anomalous user > behavior in Hadoop. > > The proposal is available on the wiki here: > https://wiki.apache.org/incubator/EagleProposal > > The text of the proposal is also available at the end of this email. > > Thanks for your time and help. > > Thanks, > Arun > > > > Eagle > > Abstract > Eagle is an Open Source Monitoring solution for Hadoop to instantly identify > access to sensitive data, recognize attacks, malicious activities in hadoop > and take actions. > > Proposal > Eagle audits access to HDFS files, Hive and HBase tables in real time, > enforces policies defined on sensitive data access and alerts or blocks > user’s access to that sensitive data in real time. Eagle also creates user > profiles based on the typical access behaviour for HDFS and Hive and sends > alerts when anomalous behaviour is detected. Eagle can also import sensitive > data information classified by external classification engines to help define > its policies. > > Overview of Eagle > Eagle has 3 main parts. > 1.Data collection and storage - Eagle collects data from various hadoop logs > in real time using Kafka/Yarn API and uses HDFS and HBase for storage. 
> 2.Data processing and policy engine - Eagle allows users to create policies > based on various metadata properties on HDFS, Hive and HBase data. > 3.Eagle services - Eagle services include policy manager, query service and > the visualization component. Eagle provides intuitive user interface to > administer Eagle and an alert dashboard to respond to real time alerts. > > Data Collection and Storage: > Eagle provides programming API for extending Eagle to integrate any data > source into Eagle policy evaluation framework. For example, Eagle hdfs audit > monitoring collects data from Kafka which is populated from namenode log4j > appender or from logstash agent. Eagle hive monitoring collects hive query > logs from running job through YARN API, which is designed to be scalable and > fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics > data, and also supports relational database through configuration change. > > Data Processing and Policy Engine: > Processing Engine: Eagle provides stream processing API which is an > abstraction of Apache Storm. It can also be extended to other streaming > engines. This abstraction allows developers to assemble data transformation, > filtering, external data join etc. without physically bound to a specific > streaming platform. Eagle streaming API allows developers to easily integrate > business logic with Eagle policy engine and internally Eagle framework > compiles business logic execution DAG into program primitives of underlying > stream infrastructure e.g. Apache Storm. For example, Eagle HDFS monitoring > transforms audit log from Namenode to object and joins sensitivity metadata, > security zone metadata which are generated from external programs or > configured by user. Eagle hive monitoring filters running jobs to get hive > query string and parses query string into object and then joins sensitivity > metadata. 
> Alerting Framework: Eagle Alert Framework includes stream metadata API, > scalable policy engine framework, extensible policy engine framework. Stream > metadata API allows developers to declare event schema including what > attributes constitute an event, what is the type for each attribute, and how > to dynamically resolve attribute value in runtime when user configures > policy. Scalable policy engine framework allows policies to be executed on > different physical nodes in parallel. It is also used to define your own > policy partitioner class. Policy engine framework together with streaming > partitioning capability provided by all streaming platforms will make sure > policies and events can be evaluated in a fully distributed way. Extensible > policy engine framework allows developer to plugin a new policy engine with a > few lines of codes. WSO2 Siddhi CEP engine is the policy engine which Eagle > supports as first-class citizen. > Machine Learning module: Eagle provides capabilities to define user activity > patterns or user profiles for Hadoop users based on the user behaviour in the > platform. These user
Re: [DISCUSS] Eagle incubator proposal
+1 for moving forward with a VOTE. > On Oct 22, 2015, at 7:26 PM, Henry Saputra wrote: > > Looks like the discussion has calm down, so unless there is more > comments we will send VOTE thread tomorrow. > > Thanks all for the feedback. > > - Henry > >> On Mon, Oct 19, 2015 at 8:33 AM, Manoharan, Arun >> wrote: >> Hello Everyone, >> >> My name is Arun Manoharan. Currently a product manager in the Analytics >> platform team at eBay Inc. >> >> I would like to start a discussion on Eagle and its joining the ASF as an >> incubation project. >> >> Eagle is a Monitoring solution for Hadoop to instantly identify access to >> sensitive data, recognize attacks, malicious activities and take actions in >> real time. Eagle supports a wide variety of policies on HDFS data and Hive. >> Eagle also provides machine learning models for detecting anomalous user >> behavior in Hadoop. >> >> The proposal is available on the wiki here: >> https://wiki.apache.org/incubator/EagleProposal >> >> The text of the proposal is also available at the end of this email. >> >> Thanks for your time and help. >> >> Thanks, >> Arun >> >> >> >> Eagle >> >> Abstract >> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify >> access to sensitive data, recognize attacks, malicious activities in hadoop >> and take actions. >> >> Proposal >> Eagle audits access to HDFS files, Hive and HBase tables in real time, >> enforces policies defined on sensitive data access and alerts or blocks >> user’s access to that sensitive data in real time. Eagle also creates user >> profiles based on the typical access behaviour for HDFS and Hive and sends >> alerts when anomalous behaviour is detected. Eagle can also import sensitive >> data information classified by external classification engines to help >> define its policies. >> >> Overview of Eagle >> Eagle has 3 main parts. 
>> 1.Data collection and storage - Eagle collects data from various hadoop logs >> in real time using Kafka/Yarn API and uses HDFS and HBase for storage. >> 2.Data processing and policy engine - Eagle allows users to create policies >> based on various metadata properties on HDFS, Hive and HBase data. >> 3.Eagle services - Eagle services include policy manager, query service and >> the visualization component. Eagle provides intuitive user interface to >> administer Eagle and an alert dashboard to respond to real time alerts. >> >> Data Collection and Storage: >> Eagle provides programming API for extending Eagle to integrate any data >> source into Eagle policy evaluation framework. For example, Eagle hdfs audit >> monitoring collects data from Kafka which is populated from namenode log4j >> appender or from logstash agent. Eagle hive monitoring collects hive query >> logs from running job through YARN API, which is designed to be scalable and >> fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics >> data, and also supports relational database through configuration change. >> >> Data Processing and Policy Engine: >> Processing Engine: Eagle provides stream processing API which is an >> abstraction of Apache Storm. It can also be extended to other streaming >> engines. This abstraction allows developers to assemble data transformation, >> filtering, external data join etc. without physically bound to a specific >> streaming platform. Eagle streaming API allows developers to easily >> integrate business logic with Eagle policy engine and internally Eagle >> framework compiles business logic execution DAG into program primitives of >> underlying stream infrastructure e.g. Apache Storm. For example, Eagle HDFS >> monitoring transforms audit log from Namenode to object and joins >> sensitivity metadata, security zone metadata which are generated from >> external programs or configured by user. 
Eagle hive monitoring filters >> running jobs to get hive query string and parses query string into object >> and then joins sensitivity metadata. >> Alerting Framework: Eagle Alert Framework includes stream metadata API, >> scalable policy engine framework, extensible policy engine framework. Stream >> metadata API allows developers to declare event schema including what >> attributes constitute an event, what is the type for each attribute, and how >> to dynamically resolve attribute value in runtime when user configures >> policy. Scalable policy engine framework allows policies to be executed on >> different physical nodes in parallel. It is also used to define your own >> policy partitioner class. Policy engine framework together with streaming >> partitioning capability provided by all streaming platforms will make sure >> policies and events can be evaluated in a fully distributed way. Extensible >> policy engine framework allows developer to plugin a new policy engine with >> a few lines of codes. WSO2 Siddhi CEP engine is the policy
Re: [DISCUSS] Eagle incubator proposal
On Thu, Oct 22, 2015 at 12:09 AM, Owen O'Malley wrote: > On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning > wrote: > > > I would suggest that Owen O'Malley has not had enough time to be a viable > > mentor recently and should not be on the list of mentors. > > > > I have been helping Kylin out and it is graduating, so I'm down to just > Hawq. I'd like to help Eagle out. > No need to be on the official list of mentors ... just help out anyways ... heck, then you don't have to be responsible for signing off reports ;-)
Re: [DISCUSS] Eagle incubator proposal
Hi Marvin, You are preaching to the choir with me =) Totally agree with your comment. The Eagle team has met with the proposed mentors before, presented the project, and asked about their availability and willingness to help mentor it, and we got a positive response from them. Now, if the question is whether these mentors will actually be active in practice, that would be hard for me to answer. I know Julian and Taylor have been active and helpful during Kylin's incubation. To be fair, I will circle back one more time with the existing mentors for Eagle to confirm their commitment to active participation in the podling. Would that be an acceptable solution? - Henry On Wed, Oct 21, 2015 at 9:24 AM, Marvin Humphrey wrote: > On Tue, Oct 20, 2015 at 9:16 PM, Henry Saputra > wrote: >> Hi Ted, >> >> Thanks for your concern, but we have had discussions with all proposed >> mentors before to ask for their availability and willingness to >> actively mentor this project. >> >> I think we are good with existing proposed mentors. > > Henry, > > 4 of the 5 proposed Mentors for Eagle are also Mentors for Kylin. Kylin's 1.1 > release candidate is still twisting in the wind awaiting IPMC votes. It was > was offered up on dev@kylin on October 10th -- 11 days ago. It got one Mentor > vote immediately, and another after 9 days. It is still waiting for a third > IPMC vote. > > I think the IPMC needs to take into consideration whether Eagle will have > enough active Mentors when voting on this proposal, since at least some of > the proposed Mentors seem to be having difficulty with their current load. > > Mentors who do not actually participate should not be Mentors. > > Marvin Humphrey
Re: [DISCUSS] Eagle incubator proposal
Hi Arun, This proposal looks great. I would like to be an active contributor on this project. I bring with me the experience of Apache Ambari and developing the Ambari Metrics System. Best Regards, Sid From: Julian Hyde <jh...@apache.org> Sent: Wednesday, October 21, 2015 9:10 AM To: general@incubator.apache.org Subject: Re: [DISCUSS] Eagle incubator proposal My name is already on the list of mentors. I think this project fills an important need. Several of the initial committers were involved with Kylin and therefore know the Apache process. Julian > On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote: > > I should also have some improved bandwidth both now that Kylin is nearing > graduation and for other reasons. I’ve been bogged down recently, but that’s > starting to change. > > If more mentors are desired, I’d be willing to help in that respect. > > -Taylor > >> On Oct 20, 2015, at 11:49 AM, Henry Saputra <henry.sapu...@gmail.com> wrote: >> >> Hi Ted, >> >> Since Kylin almost ready to graduate, I have more bandwidth to help with >> Eagle. >> >> But, you are right that current proposed mentors for Eagle seemed to >> be very busy with other podlings, so 1 or 2 additional mentors would >> be great. >> >> The good news is that the team consist some people from Kylin, for >> example Luke, which done great job helping Kylin to understand working >> with Apache way. >> So we have some help from initial committers who have done the rodeo before. >> >> - Henry >> >> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning <ted.dunn...@gmail.com> wrote: >>> I would suggest that Owen O'Malley has not had enough time to be a viable >>> mentor recently and should not be on the list of mentors. >>> >>> Henry and Julian are good if their schedules permit. Henry, I know has >>> been mentoring a number of projects lately. 
>>> >>> >>> >>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré <j...@nanthrax.net> >>> wrote: >>> >>>> Hi Arun, >>>> >>>> very interesting proposal. I may see some possible interaction with >>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a >>>> kind of Change Data Capture), etc. >>>> >>>> So, I see a different perspective in Eagle, but Eagle could also leverage >>>> Falcon somehow. >>>> >>>> Regards >>>> JB >>>> >>>> >>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote: >>>> >>>>> Hello Everyone, >>>>> >>>>> My name is Arun Manoharan. Currently a product manager in the Analytics >>>>> platform team at eBay Inc. >>>>> >>>>> I would like to start a discussion on Eagle and its joining the ASF as an >>>>> incubation project. >>>>> >>>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to >>>>> sensitive data, recognize attacks, malicious activities and take actions >>>>> in >>>>> real time. Eagle supports a wide variety of policies on HDFS data and >>>>> Hive. >>>>> Eagle also provides machine learning models for detecting anomalous user >>>>> behavior in Hadoop. >>>>> >>>>> The proposal is available on the wiki here: >>>>> https://wiki.apache.org/incubator/EagleProposal >>>>> >>>>> The text of the proposal is also available at the end of this email. >>>>> >>>>> Thanks for your time and help. >>>>> >>>>> Thanks, >>>>> Arun >>>>> >>>>> >>>>> >>>>> Eagle >>>>> >>>>> Abstract >>>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly >>>>> identify access to sensitive data, recognize attacks, malicious activities >>>>> in hadoop and take actions. >>>>> >>>>> Proposal >>>>> Eagle audits access to HDFS files, Hive and HBase tables in real time, >>>>> enforces policies defined on sensitive data access and alerts or blocks >>>>> user’s access to that sensitive data in real time. 
Eagle also creates user >>>>> profiles based on the typical access behaviour for HDFS and Hive and sends >>>>> alerts when anomalous behaviour
Re: [DISCUSS] Eagle incubator proposal
Hi Sid, Thanks for your support. We have actually developed an Ambari plugin for Eagle that lets someone use Ambari to deploy Eagle. We have this working on the sandbox. We would like to have you as a contributor. I will reach out to you. Thanks, Arun On 10/21/15, 9:22 AM, "Siddharth Wagle" <swa...@hortonworks.com> wrote: >Hi Arun, > >This proposal looks great. I would like to be an active contributor on >this project. I bring with me the experience of Apache Ambari and >developing the Ambari Metrics System. > >Best Regards, >Sid > >From: Julian Hyde <jh...@apache.org> >Sent: Wednesday, October 21, 2015 9:10 AM >To: general@incubator.apache.org >Subject: Re: [DISCUSS] Eagle incubator proposal > >My name is already on the list of mentors. I think this project fills an >important need. Several of the initial committers were involved with >Kylin and therefore know the Apache process. > >Julian > > >> On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote: >> >> I should also have some improved bandwidth both now that Kylin is >>nearing graduation and for other reasons. I've been bogged down >>recently, but that's starting to change. >> >> If more mentors are desired, I'd be willing to help in that respect. >> >> -Taylor >> >>> On Oct 20, 2015, at 11:49 AM, Henry Saputra <henry.sapu...@gmail.com> >>>wrote: >>> >>> Hi Ted, >>> >>> Since Kylin almost ready to graduate, I have more bandwidth to help >>>with Eagle. >>> >>> But, you are right that current proposed mentors for Eagle seemed to >>> be very busy with other podlings, so 1 or 2 additional mentors would >>> be great. >>> >>> The good news is that the team consist some people from Kylin, for >>> example Luke, which done great job helping Kylin to understand working >>> with Apache way. >>> So we have some help from initial committers who have done the rodeo >>>before. 
>>> >>> - Henry >>> >>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning <ted.dunn...@gmail.com> >>>wrote: >>>> I would suggest that Owen O'Malley has not had enough time to be a >>>>viable >>>> mentor recently and should not be on the list of mentors. >>>> >>>> Henry and Julian are good if their schedules permit. Henry, I know >>>>has >>>> been mentoring a number of projects lately. >>>> >>>> >>>> >>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré >>>><j...@nanthrax.net> >>>> wrote: >>>> >>>>> Hi Arun, >>>>> >>>>> very interesting proposal. I may see some possible interaction with >>>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring >>>>>(with a >>>>> kind of Change Data Capture), etc. >>>>> >>>>> So, I see a different perspective in Eagle, but Eagle could also >>>>>leverage >>>>> Falcon somehow. >>>>> >>>>> Regards >>>>> JB >>>>> >>>>> >>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote: >>>>> >>>>>> Hello Everyone, >>>>>> >>>>>> My name is Arun Manoharan. Currently a product manager in the >>>>>>Analytics >>>>>> platform team at eBay Inc. >>>>>> >>>>>> I would like to start a discussion on Eagle and its joining the ASF >>>>>>as an >>>>>> incubation project. >>>>>> >>>>>> Eagle is a Monitoring solution for Hadoop to instantly identify >>>>>>access to >>>>>> sensitive data, recognize attacks, malicious activities and take >>>>>>actions in >>>>>> real time. Eagle supports a wide variety of policies on HDFS data >>>>>>and Hive. >>>>>> Eagle also provides machine learning models for detecting anomalous >>>>>>user >>>>>> behavior in Hadoop. >>>>>> >>>>>> The proposal is available on the wiki here: >>>>>> https://wiki.apache.org/incubator/EagleProposal >>>>>> >>>>>> The text of the proposal is also available at the end of this email. >>>>>> >>>>>> Thanks for your time and
Re: [DISCUSS] Eagle incubator proposal
My name is already on the list of mentors. I think this project fills an important need. Several of the initial committers were involved with Kylin and therefore know the Apache process. Julian > On Oct 20, 2015, at 11:58 AM, P. Taylor Goetz wrote: > > I should also have some improved bandwidth both now that Kylin is nearing > graduation and for other reasons. I’ve been bogged down recently, but that’s > starting to change. > > If more mentors are desired, I’d be willing to help in that respect. > > -Taylor > >> On Oct 20, 2015, at 11:49 AM, Henry Saputra wrote: >> >> Hi Ted, >> >> Since Kylin almost ready to graduate, I have more bandwidth to help with >> Eagle. >> >> But, you are right that current proposed mentors for Eagle seemed to >> be very busy with other podlings, so 1 or 2 additional mentors would >> be great. >> >> The good news is that the team consist some people from Kylin, for >> example Luke, which done great job helping Kylin to understand working >> with Apache way. >> So we have some help from initial committers who have done the rodeo before. >> >> - Henry >> >> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning wrote: >>> I would suggest that Owen O'Malley has not had enough time to be a viable >>> mentor recently and should not be on the list of mentors. >>> >>> Henry and Julian are good if their schedules permit. Henry, I know has >>> been mentoring a number of projects lately. >>> >>> >>> >>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré >>> wrote: >>> Hi Arun, very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc. So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow. Regards JB On 10/19/2015 05:33 PM, Manoharan, Arun wrote: > Hello Everyone, > > My name is Arun Manoharan. Currently a product manager in the Analytics > platform team at eBay Inc. 
> > I would like to start a discussion on Eagle and its joining the ASF as an > incubation project. > > Eagle is a Monitoring solution for Hadoop to instantly identify access to > sensitive data, recognize attacks, malicious activities and take actions > in > real time. Eagle supports a wide variety of policies on HDFS data and > Hive. > Eagle also provides machine learning models for detecting anomalous user > behavior in Hadoop. > > The proposal is available on the wiki here: > https://wiki.apache.org/incubator/EagleProposal > > The text of the proposal is also available at the end of this email. > > Thanks for your time and help. > > Thanks, > Arun > > > > Eagle > > Abstract > Eagle is an Open Source Monitoring solution for Hadoop to instantly > identify access to sensitive data, recognize attacks, malicious activities > in hadoop and take actions. > > Proposal > Eagle audits access to HDFS files, Hive and HBase tables in real time, > enforces policies defined on sensitive data access and alerts or blocks > user’s access to that sensitive data in real time. Eagle also creates user > profiles based on the typical access behaviour for HDFS and Hive and sends > alerts when anomalous behaviour is detected. Eagle can also import > sensitive data information classified by external classification engines > to > help define its policies. > > Overview of Eagle > Eagle has 3 main parts. > 1.Data collection and storage - Eagle collects data from various hadoop > logs in real time using Kafka/Yarn API and uses HDFS and HBase for > storage. > 2.Data processing and policy engine - Eagle allows users to create > policies based on various metadata properties on HDFS, Hive and HBase > data. > 3.Eagle services - Eagle services include policy manager, query service > and the visualization component. Eagle provides intuitive user interface > to > administer Eagle and an alert dashboard to respond to real time alerts. 
> > Data Collection and Storage: > Eagle provides programming API for extending Eagle to integrate any data > source into Eagle policy evaluation framework. For example, Eagle hdfs > audit monitoring collects data from Kafka which is populated from namenode > log4j appender or from logstash agent. Eagle hive monitoring collects hive > query logs from running job through YARN API, which is designed to be > scalable and fault-tolerant. Eagle uses HBase as storage for storing > metadata and metrics data, and also supports relational database through > configuration change. > > Data Processing and Policy
Re: [DISCUSS] Eagle incubator proposal
On Tue, Oct 20, 2015 at 9:16 PM, Henry Saputra wrote:

> Hi Ted,
>
> Thanks for your concern, but we have had discussions with all proposed
> mentors before to ask for their availability and willingness to
> actively mentor this project.
>
> I think we are good with existing proposed mentors.

Henry, 4 of the 5 proposed Mentors for Eagle are also Mentors for Kylin. Kylin's 1.1 release candidate is still twisting in the wind awaiting IPMC votes. It was offered up on dev@kylin on October 10th -- 11 days ago. It got one Mentor vote immediately, and another after 9 days. It is still waiting for a third IPMC vote.

I think the IPMC needs to take into consideration whether Eagle will have enough active Mentors when voting on this proposal, since at least some of the proposed Mentors seem to be having difficulty with their current load.

Mentors who do not actually participate should not be Mentors.

Marvin Humphrey

To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
Re: [DISCUSS] Eagle incubator proposal
Thanks Don. Looking forward to working with the Ranger team on meaningful integrations.

On 10/20/15, 10:19 PM, "Don Bosco Durai" wrote:

>Hi Arun
>
>This looks really good and fills some obvious gaps in the security
>landscape.
>
>Happy to contribute anyway you want.
>
>All the best!!!
>
>Bosco
>
>On 10/20/15, 8:02 AM, "Alex Karasulu" <akaras...@apache.org> wrote:
>
>>Hi Arun,
>>
>>Eagle sounds very promising. I just had a discussion with someone about
>>this exact need. I do however agree with Greg on the name. As far as I can
>>see, besides the name, your weakest point is the all eBay employed team.
>>It's not a blocker and can be fixed during incubation. Good luck to you.
>>
>>Alex
>>
>>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun wrote:
>>
>>> Hi Greg,
>>>
>>> Thank you for reviewing the proposal.
>>>
>>> Originally we thought Eagle might be trademarked by someone already but I
>>> went thru eBay legal team to get the clearance for the name to be used. We
>>> will look into it again to see if there will be potential problems.
>>>
>>> Thanks,
>>> Arun
>>>
>>> On 10/20/15, 1:52 AM, "Greg Stein" wrote:
>>>
>>> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
>>> >looks like a great proposal. I would suggest being a bit wary of the name,
>>> >as "Eagle" is a *very* popular PCB design program.
Re: [DISCUSS] Eagle incubator proposal
On Wed, Oct 21, 2015 at 11:53 AM, Henry Saputra wrote:

>...
> To be fair, I will circle one more time to existing mentors for Eagle
> to confirm their commitment for active participation in the podling.
> Would that be acceptable solution?
>

Acceptable to whom? I bet you there are enough people who find the list of mentors to be acceptable, despite Marvin's emails otherwise... :-)

To put it another way: it is unfair to pre-judge people. Even more, *accepting* a podling should not be subject to a preapproved list of mentors, since we have juggled mentors many times in the past. *Should* a mentor be absent in the *future*, then they can be replaced. ... but I believe it is better to be reactive, than to pre-judge.

Cheers,
-g
Re: [DISCUSS] Eagle incubator proposal
On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning wrote:

> I would suggest that Owen O'Malley has not had enough time to be a viable
> mentor recently and should not be on the list of mentors.
>

I have been helping Kylin out and it is graduating, so I'm down to just Hawq. I'd like to help Eagle out.

.. Owen

> Henry and Julian are good if their schedules permit. Henry, I know has
> been mentoring a number of projects lately.
>
> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré wrote:
>
> > Hi Arun,
> >
> > very interesting proposal. I may see some possible interaction with
> > Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a
> > kind of Change Data Capture), etc.
> >
> > So, I see a different perspective in Eagle, but Eagle could also leverage
> > Falcon somehow.
> >
> > Regards
> > JB
Re: [DISCUSS] Eagle incubator proposal
I think the concern was just kind of a spill-over from other discussions about rebooting the Incubator, so new proposals are being scrutinized more. Which is OK for me; that is why we have a DISCUSS thread right now :)

To follow up on the concerns that have been raised: the Eagle team is happy with the existing mentors and would like to move forward with them.

I have strong faith in my fellow IPMC members as mentors for this project to help it go through the incubation process.

- Henry

On Wed, Oct 21, 2015 at 12:02 PM, Greg Stein wrote:
> On Wed, Oct 21, 2015 at 11:53 AM, Henry Saputra wrote:
>>...
>> To be fair, I will circle one more time to existing mentors for Eagle
>> to confirm their commitment for active participation in the podling.
>> Would that be acceptable solution?
>>
>
> Acceptable to whom? I bet you there are enough people who find the list of
> mentors to be acceptable, despite Marvin's emails otherwise... :-)
>
> To put it another way: it is unfair to pre-judge people. Even more,
> *accepting* a podling should not be subject to a preapproved list of
> mentors, since we have juggled mentors many times in the past. *Should* a
> mentor be absent in the *future*, then they can be replaced. ... but I
> believe it is better to be reactive, than to pre-judge.
>
> Cheers,
> -g
Re: [DISCUSS] Eagle incubator proposal
Eagle evaluates security policies against the event stream in real time and in a fully distributed way, so low latency and event partitioning are the two important factors for identifying malicious access instantly. Onboarding data through Falcon should take both into account.

Thanks
Edward Zhang

On 10/19/15, 22:46, "Jean-Baptiste Onofré" wrote:

>It makes sense. I will try to contribute on this ;)
>
>Regards
>JB
>
>On 10/19/2015 09:46 PM, Zhang, Edward (GDI Hadoop) wrote:
>> Hi JB,
>>
>> That is a good Point. Good to know that Falcon feeds HDFS/Hive/HBase data
>> changes, so this feature would complement Eagle which today mainly focuses
>> on HDFS/Hive/HBase data access including view, change, delete etc. Eagle
>> would benefit if Eagle can instantly capture data change from Falcon.
>>
>> Thanks
>> Edward Zhang
>>
>> On 10/19/15, 8:40, "Jean-Baptiste Onofré" wrote:
>>
>>> Hi Arun,
>>>
>>> very interesting proposal. I may see some possible interaction with
>>> Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with
>>> a kind of Change Data Capture), etc.
>>>
>>> So, I see a different perspective in Eagle, but Eagle could also
>>> leverage Falcon somehow.
>>>
>>> Regards
>>> JB
Re: [DISCUSS] Eagle incubator proposal
Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.

On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:

> Hello Everyone,
>
> My name is Arun Manoharan. Currently a product manager in the Analytics
> platform team at eBay Inc.
>
> I would like to start a discussion on Eagle and its joining the ASF as an
> incubation project.
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
>
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create > policies based on various metadata properties on HDFS, Hive and HBase data. > 3.Eagle services - Eagle services include policy manager, query service > and the visualization component. Eagle provides intuitive user interface to > administer Eagle and an alert dashboard to respond to real time alerts. > > Data Collection and Storage: > Eagle provides programming API for extending Eagle to integrate any data > source into Eagle policy evaluation framework. For example, Eagle hdfs > audit monitoring collects data from Kafka which is populated from namenode > log4j appender or from logstash agent. Eagle hive monitoring collects hive > query logs from running job through YARN API, which is designed to be > scalable and fault-tolerant. Eagle uses HBase as storage for storing > metadata and metrics data, and also supports relational database through > configuration change. > > Data Processing and Policy Engine: > Processing Engine: Eagle provides stream processing API which is an > abstraction of Apache Storm. It can also be extended to other streaming > engines. This abstraction allows developers to assemble data > transformation, filtering, external data join etc. without physically bound > to a specific streaming platform. Eagle streaming API allows developers to > easily integrate business logic with Eagle policy engine and internally > Eagle framework compiles business logic execution DAG into program > primitives of underlying stream infrastructure e.g. Apache Storm. For > example, Eagle HDFS monitoring transforms audit log from Namenode to object > and joins sensitivity metadata, security zone metadata which are generated > from external programs or configured by user. Eagle hive monitoring filters > running jobs to get hive query string and parses query string into object > and then joins sensitivity metadata. 
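The HDFS monitoring flow described in the Processing Engine paragraph above (turn an audit log line into an object, then join sensitivity metadata) can be illustrated with a small Python sketch. This is not Eagle's API. The names `AuditEvent`, `SENSITIVITY`, `parse`, `join_sensitivity` and `pipeline` are hypothetical, the sample log lines are made up in the general `key=value` shape of a namenode audit entry, and an in-memory list stands in for the Kafka/Storm stream.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical sensitivity metadata: HDFS path prefix -> sensitivity tag.
# In the proposal this comes from external programs or user configuration.
SENSITIVITY: Dict[str, str] = {
    "/data/pii": "PII",
    "/data/finance": "FINANCIAL",
}

# Audit entries are treated here as whitespace-separated key=value pairs.
FIELD = re.compile(r"(\w+)=(\S+)")


@dataclass
class AuditEvent:
    user: str
    cmd: str
    src: str
    allowed: bool
    sensitivity: Optional[str] = None


def parse(line: str) -> Optional[AuditEvent]:
    """Transform one raw audit line into an object; None if fields are missing."""
    fields = dict(FIELD.findall(line))
    if not {"ugi", "cmd", "src"} <= fields.keys():
        return None
    return AuditEvent(
        user=fields["ugi"],
        cmd=fields["cmd"],
        src=fields["src"],
        allowed=fields.get("allowed") == "true",
    )


def join_sensitivity(event: AuditEvent) -> AuditEvent:
    """Join the event with sensitivity metadata by path prefix."""
    for prefix, tag in SENSITIVITY.items():
        if event.src == prefix or event.src.startswith(prefix + "/"):
            event.sensitivity = tag
            break
    return event


def pipeline(lines: Iterable[str]) -> Iterator[AuditEvent]:
    """Parse, join, and keep only events that touch sensitive data."""
    for line in lines:
        event = parse(line)
        if event is None:
            continue
        event = join_sensitivity(event)
        if event.sensitivity is not None:
            yield event


if __name__ == "__main__":
    sample = [
        "2015-10-19 09:00:00,123 INFO FSNamesystem.audit: allowed=true "
        "ugi=alice ip=/10.0.0.1 cmd=open src=/data/pii/users.csv dst=null perm=null",
        "2015-10-19 09:00:01,456 INFO FSNamesystem.audit: allowed=true "
        "ugi=bob ip=/10.0.0.2 cmd=open src=/tmp/scratch.txt dst=null perm=null",
    ]
    for hit in pipeline(sample):
        print(hit)
```

Writing the stages as plain functions over an iterable mirrors the point of the abstraction in the proposal: the same transform and join logic could, in principle, be mapped onto a streaming engine without being tied to one.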
> Alerting Framework: Eagle Alert Framework includes stream metadata API, > scalable policy engine framework, extensible policy engine framework. > Stream metadata API allows developers to declare event schema including > what attributes constitute an event, what is the type for each attribute, > and how to dynamically resolve attribute value in runtime when user > configures policy. Scalable policy engine framework allows policies to be > executed on different physical nodes in parallel. It is also used to define > your own policy partitioner class. Policy engine framework together with > streaming partitioning capability provided by all streaming platforms will > make sure policies and events can be evaluated in a fully distributed way. > Extensible policy engine framework allows developer to plugin a new policy > engine with a few lines of codes. WSO2 Siddhi CEP engine is the policy > engine which Eagle supports as first-class citizen. > Machine Learning module: Eagle provides capabilities to define user > activity patterns or user profiles for Hadoop users based on the user > behaviour in the platform. These
Re: [DISCUSS] Eagle incubator proposal
I would like to volunteer as mentor and help the project, if you are looking for more mentors.

Thanks
Amareshwari

On Mon, Oct 19, 2015 at 9:03 PM, Manoharan, Arun wrote:

> Hello Everyone,
>
> My name is Arun Manoharan. Currently a product manager in the Analytics
> platform team at eBay Inc.
>
> I would like to start a discussion on Eagle and its joining the ASF as an
> incubation project.
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
>
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service > and the visualization component. Eagle provides intuitive user interface to > administer Eagle and an alert dashboard to respond to real time alerts. > > Data Collection and Storage: > Eagle provides programming API for extending Eagle to integrate any data > source into Eagle policy evaluation framework. For example, Eagle hdfs > audit monitoring collects data from Kafka which is populated from namenode > log4j appender or from logstash agent. Eagle hive monitoring collects hive > query logs from running job through YARN API, which is designed to be > scalable and fault-tolerant. Eagle uses HBase as storage for storing > metadata and metrics data, and also supports relational database through > configuration change. > > Data Processing and Policy Engine: > Processing Engine: Eagle provides stream processing API which is an > abstraction of Apache Storm. It can also be extended to other streaming > engines. This abstraction allows developers to assemble data > transformation, filtering, external data join etc. without physically bound > to a specific streaming platform. Eagle streaming API allows developers to > easily integrate business logic with Eagle policy engine and internally > Eagle framework compiles business logic execution DAG into program > primitives of underlying stream infrastructure e.g. Apache Storm. For > example, Eagle HDFS monitoring transforms audit log from Namenode to object > and joins sensitivity metadata, security zone metadata which are generated > from external programs or configured by user. Eagle hive monitoring filters > running jobs to get hive query string and parses query string into object > and then joins sensitivity metadata. > Alerting Framework: Eagle Alert Framework includes stream metadata API, > scalable policy engine framework, extensible policy engine framework. 
> Stream metadata API allows developers to declare event schema including > what attributes constitute an event, what is the type for each attribute, > and how to dynamically resolve attribute value in runtime when user > configures policy. Scalable policy engine framework allows policies to be > executed on different physical nodes in parallel. It is also used to define > your own policy partitioner class. Policy engine framework together with > streaming partitioning capability provided by all streaming platforms will > make sure policies and events can be evaluated in a fully distributed way. > Extensible policy engine framework allows developer to plugin a new policy > engine with a few lines of codes. WSO2 Siddhi CEP engine is the policy > engine which Eagle supports as first-class citizen. > Machine Learning module: Eagle provides capabilities to define user > activity patterns or user profiles for Hadoop users based on the user > behaviour in the platform. These user profiles are modeled using Machine > Learning algorithms and used for detection
Re: [DISCUSS] Eagle incubator proposal
So glad to see one more project coming from eBay:-)

Best Regards!
Luke Han

On Tue, Oct 20, 2015 at 4:52 PM, Greg Stein wrote:

> Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
>
> On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:
>
> > Hello Everyone,
> >
> > My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
> >
> > I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
> >
> > Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
> >
> > The proposal is available on the wiki here:
> > https://wiki.apache.org/incubator/EagleProposal
> >
> > The text of the proposal is also available at the end of this email.
> >
> > Thanks for your time and help.
> >
> > Thanks,
> > Arun
> >
> > Eagle
> >
> > Abstract
> > Eagle is an Open Source Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities in hadoop and take actions.
> >
> > Proposal
> > Eagle audits access to HDFS files, Hive and HBase tables in real time, enforces policies defined on sensitive data access and alerts or blocks user’s access to that sensitive data in real time. Eagle also creates user profiles based on the typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.
> >
> > Overview of Eagle
> > Eagle has 3 main parts.
> > 1. Data collection and storage - Eagle collects data from various hadoop logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> > 2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties on HDFS, Hive and HBase data.
> > 3. Eagle services - Eagle services include policy manager, query service and the visualization component. Eagle provides intuitive user interface to administer Eagle and an alert dashboard to respond to real time alerts.
> >
> > Data Collection and Storage:
> > Eagle provides programming API for extending Eagle to integrate any data source into Eagle policy evaluation framework. For example, Eagle hdfs audit monitoring collects data from Kafka which is populated from namenode log4j appender or from logstash agent. Eagle hive monitoring collects hive query logs from running job through YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics data, and also supports relational database through configuration change.
> >
> > Data Processing and Policy Engine:
> > Processing Engine: Eagle provides stream processing API which is an abstraction of Apache Storm. It can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data join etc. without physically bound to a specific streaming platform. Eagle streaming API allows developers to easily integrate business logic with Eagle policy engine and internally Eagle framework compiles business logic execution DAG into program primitives of underlying stream infrastructure e.g. Apache Storm. For example, Eagle HDFS monitoring transforms audit log from Namenode to object and joins sensitivity metadata, security zone metadata which are generated from external programs or configured by user. Eagle hive monitoring filters running jobs to get hive query string and parses query string into object and then joins sensitivity metadata.
> > Alerting Framework: Eagle Alert Framework includes stream metadata API, scalable policy engine framework, extensible policy engine framework. Stream metadata API allows developers to declare event schema including what attributes constitute an event, what is the type for each attribute, and how to dynamically resolve attribute value in runtime when user configures policy. Scalable policy engine framework allows policies to be executed on different physical nodes in parallel. It is also used to define your own policy partitioner class. Policy engine framework together with streaming partitioning capability provided by all streaming platforms will make sure policies and events can be evaluated in a fully
Re: [DISCUSS] Eagle incubator proposal
Hi Ted,

Since Kylin is almost ready to graduate, I have more bandwidth to help with Eagle.

But, you are right that current proposed mentors for Eagle seemed to be very busy with other podlings, so 1 or 2 additional mentors would be great.

The good news is that the team includes some people from Kylin, for example Luke, who has done a great job helping Kylin understand how to work the Apache way. So we have some help from initial committers who have done the rodeo before.

- Henry

On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning wrote:

> I would suggest that Owen O'Malley has not had enough time to be a viable mentor recently and should not be on the list of mentors.
>
> Henry and Julian are good if their schedules permit. Henry, I know has been mentoring a number of projects lately.
>
> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré wrote:
>
>> Hi Arun,
>>
>> very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.
>>
>> So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.
>>
>> Regards
>> JB
>>
>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>
>>> Hello Everyone,
>>>
>>> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>>
>>> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
>>>
>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>>
>>> The proposal is available on the wiki here:
>>> https://wiki.apache.org/incubator/EagleProposal
>>>
>>> The text of the proposal is also available at the end of this email.
Re: [DISCUSS] Eagle incubator proposal
On Tue, Oct 20, 2015 at 10:51 AM, Manoharan, Arun wrote:

> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might be trademarked by someone already but I went thru eBay legal team to get the clearance for the name to be used. We will look into it again to see if there will be potential problems.

Ultimately it will be the ASF that determines the appropriateness of the name for a podling. A few pointers:

http://incubator.apache.org/guides/names.html
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH/

> Thanks,
> Arun

- Sam Ruby

> On 10/20/15, 1:52 AM, "Greg Stein" wrote:
>
>> Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
>>
>> On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:
>>
>>> Hello Everyone,
>>>
>>> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>>
>>> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
>>>
>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>>
>>> The proposal is available on the wiki here:
>>> https://wiki.apache.org/incubator/EagleProposal
>>>
>>> The text of the proposal is also available at the end of this email.
>>>
>>> Thanks for your time and help.
>>>
>>> Thanks,
>>> Arun
>>>
>>> Eagle
>>>
>>> Abstract
>>> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities in hadoop and take actions.
Re: [DISCUSS] Eagle incubator proposal
I should also have some improved bandwidth both now that Kylin is nearing graduation and for other reasons. I’ve been bogged down recently, but that’s starting to change.

If more mentors are desired, I’d be willing to help in that respect.

-Taylor

> On Oct 20, 2015, at 11:49 AM, Henry Saputra wrote:
>
> Hi Ted,
>
> Since Kylin is almost ready to graduate, I have more bandwidth to help with Eagle.
>
> But, you are right that current proposed mentors for Eagle seemed to be very busy with other podlings, so 1 or 2 additional mentors would be great.
>
> The good news is that the team includes some people from Kylin, for example Luke, who has done a great job helping Kylin understand how to work the Apache way. So we have some help from initial committers who have done the rodeo before.
>
> - Henry
>
> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning wrote:
>> I would suggest that Owen O'Malley has not had enough time to be a viable mentor recently and should not be on the list of mentors.
>>
>> Henry and Julian are good if their schedules permit. Henry, I know has been mentoring a number of projects lately.
>>
>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré wrote:
>>
>>> Hi Arun,
>>>
>>> very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.
>>>
>>> So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.
>>>
>>> Regards
>>> JB
>>>
>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>>>
>>>> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
>>>>
>>>> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time.
Re: [DISCUSS] Eagle incubator proposal
Hi Greg,

Thank you for reviewing the proposal.

Originally we thought Eagle might be trademarked by someone already but I went thru eBay legal team to get the clearance for the name to be used. We will look into it again to see if there will be potential problems.

Thanks,
Arun

On 10/20/15, 1:52 AM, "Greg Stein" wrote:

> Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
>
> On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:
>
>> Hello Everyone,
>>
>> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>
>> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
>>
>> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>>
>> The text of the proposal is also available at the end of this email.
>>
>> Thanks for your time and help.
>>
>> Thanks,
>> Arun
>>
>> Eagle
>>
>> Abstract
>> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities in hadoop and take actions.
>>
>> Proposal
>> Eagle audits access to HDFS files, Hive and HBase tables in real time, enforces policies defined on sensitive data access and alerts or blocks user’s access to that sensitive data in real time. Eagle also creates user profiles based on the typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected.
Re: [DISCUSS] Eagle incubator proposal
Hi Arun,

Eagle sounds very promising. I just had a discussion with someone about this exact need. I do however agree with Greg on the name. As far as I can see, besides the name, your weakest point is the all eBay employed team. It's not a blocker and can be fixed during incubation. Good luck to you.

Alex

On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun wrote:

> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might be trademarked by someone already but I went thru eBay legal team to get the clearance for the name to be used. We will look into it again to see if there will be potential problems.
>
> Thanks,
> Arun
>
> On 10/20/15, 1:52 AM, "Greg Stein" wrote:
>
> > Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
> >
> > On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun wrote:
> >
> >> Hello Everyone,
> >>
> >> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
> >>
> >> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
> >>
> >> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
> >>
> >> The proposal is available on the wiki here:
> >> https://wiki.apache.org/incubator/EagleProposal
> >>
> >> The text of the proposal is also available at the end of this email.
> >>
> >> Thanks for your time and help.
Re: [DISCUSS] Eagle incubator proposal
Thanks Taylor. I will add you to the mentor list.

On 10/20/15, 11:58 AM, "P. Taylor Goetz" wrote:

> I should also have some improved bandwidth both now that Kylin is nearing graduation and for other reasons. I’ve been bogged down recently, but that’s starting to change.
>
> If more mentors are desired, I’d be willing to help in that respect.
>
> -Taylor
>
>> On Oct 20, 2015, at 11:49 AM, Henry Saputra wrote:
>>
>> Hi Ted,
>>
>> Since Kylin is almost ready to graduate, I have more bandwidth to help with Eagle.
>>
>> But, you are right that current proposed mentors for Eagle seemed to be very busy with other podlings, so 1 or 2 additional mentors would be great.
>>
>> The good news is that the team includes some people from Kylin, for example Luke, who has done a great job helping Kylin understand how to work the Apache way. So we have some help from initial committers who have done the rodeo before.
>>
>> - Henry
>>
>> On Mon, Oct 19, 2015 at 9:00 AM, Ted Dunning wrote:
>>> I would suggest that Owen O'Malley has not had enough time to be a viable mentor recently and should not be on the list of mentors.
>>>
>>> Henry and Julian are good if their schedules permit. Henry, I know has been mentoring a number of projects lately.
>>>
>>> On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré wrote:
>>>
>>>> Hi Arun,
>>>>
>>>> very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.
>>>>
>>>> So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
>>>>
>>>> > Hello Everyone,
>>>> >
>>>> > My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>>> >
>>>> > I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
Re: [DISCUSS] Eagle incubator proposal
Hi Ted,

Thanks for your concern, but we have had discussions with all proposed mentors before to ask for their availability and willingness to actively mentor this project.

I think we are good with existing proposed mentors.

- Henry

On Tue, Oct 20, 2015 at 9:10 PM, Ted Dunning wrote:

> On Tue, Oct 20, 2015 at 4:14 PM, Manoharan, Arun wrote:
>
>> Thanks Taylor. I will add you to the mentor list.
>
> Arun,
>
> Can you also do a scrub of the mentor list by asking each of the mentors whether they have been able to support other groups that they are mentoring. If they don't answer, or if they can't say that they have been supportive (at least to the extent of signing off project reports), then please remove them from your list.
Re: [DISCUSS] Eagle incubator proposal
On Tue, Oct 20, 2015 at 4:14 PM, Manoharan, Arun wrote:

> Thanks Taylor. I will add you to the mentor list.

Arun,

Can you also do a scrub of the mentor list by asking each of the mentors whether they have been able to support other groups that they are mentoring. If they don't answer, or if they can't say that they have been supportive (at least to the extent of signing off project reports), then please remove them from your list.
Re: [DISCUSS] Eagle incubator proposal
Hi Arun,

This looks really good and fills some obvious gaps in the security landscape. Happy to contribute any way you want.

All the best!!!

Bosco

On 10/20/15, 8:02 AM, "Alex Karasulu" wrote:
> Hi Arun,
>
> Eagle sounds very promising. I just had a discussion with someone about this exact need. I do however agree with Greg on the name. As far as I can see, besides the name, your weakest point is the all eBay employed team. It's not a blocker and can be fixed during incubation. Good luck to you.
>
> Alex
>
> On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun wrote:
>
>> Hi Greg,
>>
>> Thank you for reviewing the proposal.
>>
>> Originally we thought Eagle might be trademarked by someone already but I went thru eBay legal team to get the clearance for the name to be used. We will look into it again to see if there will be potential problems.
>>
>> Thanks,
>> Arun
>>
>> On 10/20/15, 1:52 AM, "Greg Stein" wrote:
>>
>>> Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
[DISCUSS] Eagle incubator proposal
Hello Everyone,

My name is Arun Manoharan. I am currently a product manager in the Analytics platform team at eBay Inc.

I would like to start a discussion on Eagle and its joining the ASF as an incubation project.

Eagle is a monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks and malicious activities, and take actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.

The proposal is available on the wiki here:
https://wiki.apache.org/incubator/EagleProposal

The text of the proposal is also available at the end of this email.

Thanks for your time and help.

Thanks,
Arun

Eagle

Abstract
Eagle is an Open Source Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks and malicious activities in Hadoop, and take actions.

Proposal
Eagle audits access to HDFS files, Hive and HBase tables in real time, enforces policies defined on sensitive data access, and alerts on or blocks a user's access to that sensitive data in real time. Eagle also creates user profiles based on the typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.

Overview of Eagle
Eagle has 3 main parts.
1. Data collection and storage - Eagle collects data from various Hadoop logs in real time using the Kafka/YARN API and uses HDFS and HBase for storage.
2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties on HDFS, Hive and HBase data.
3. Eagle services - Eagle services include the policy manager, the query service and the visualization component. Eagle provides an intuitive user interface to administer Eagle and an alert dashboard to respond to real-time alerts.

Data Collection and Storage:
Eagle provides a programming API for extending Eagle to integrate any data source into the Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring collects data from Kafka, which is populated from the namenode log4j appender or from a logstash agent. Eagle Hive monitoring collects Hive query logs from running jobs through the YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase as storage for metadata and metrics data, and also supports a relational database through a configuration change.

Data Processing and Policy Engine:
Processing Engine: Eagle provides a stream processing API which is an abstraction of Apache Storm. It can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data join etc. without being physically bound to a specific streaming platform. The Eagle streaming API allows developers to easily integrate business logic with the Eagle policy engine, and internally the Eagle framework compiles the business logic execution DAG into program primitives of the underlying stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms the audit log from the Namenode into an object and joins sensitivity metadata and security zone metadata, which are generated by external programs or configured by the user. Eagle Hive monitoring filters running jobs to get the Hive query string, parses the query string into an object and then joins sensitivity metadata.

Alerting Framework: The Eagle Alert Framework includes a stream metadata API, a scalable policy engine framework and an extensible policy engine framework. The stream metadata API allows developers to declare an event schema, including what attributes constitute an event, what the type of each attribute is, and how to dynamically resolve attribute values at runtime when the user configures a policy. The scalable policy engine framework allows policies to be executed on different physical nodes in parallel. It also lets you define your own policy partitioner class. The policy engine framework, together with the streaming partitioning capability provided by all streaming platforms, makes sure policies and events can be evaluated in a fully distributed way. The extensible policy engine framework allows developers to plug in a new policy engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy engine which Eagle supports as a first-class citizen.

Machine Learning module: Eagle provides capabilities to define user activity patterns or user profiles for Hadoop users based on the user behaviour in the platform. These user profiles are modeled using Machine Learning algorithms and used for detection of anomalous user activities. Eagle uses Eigenvalue Decomposition and Density Estimation algorithms for generating user profile models. The model reads data from HDFS audit logs, preprocesses and aggregates data, and generates models using Spark programming APIs. Once models are generated, Eagle uses stream
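
[Illustration added to the archived text, not part of the original email.] The density-estimation idea in the Machine Learning module paragraph can be sketched roughly as follows. This is a minimal sketch and not Eagle's implementation: it assumes a user profile is an independent Gaussian per feature fitted to that user's historical activity counts, and that an observation is flagged when its log-density falls below a threshold. The feature names (`read`, `delete`), the function names and the threshold value are all hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_profile(history):
    """Fit one Gaussian (mean, std) per feature from past activity counts.

    `history` is a non-empty list of dicts that all share the same keys.
    """
    profile = {}
    for name in history[0].keys():
        values = [row[name] for row in history]
        # Floor the standard deviation so a constant feature cannot
        # cause a division by zero later on.
        profile[name] = (mean(values), max(pstdev(values), 1e-6))
    return profile


def log_density(profile, observation):
    """Sum of per-feature Gaussian log-densities (features assumed independent)."""
    total = 0.0
    for name, (mu, sigma) in profile.items():
        z = (observation[name] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


def is_anomalous(profile, observation, threshold):
    """Flag the observation when it is very unlikely under the user's profile."""
    return log_density(profile, observation) < threshold


# Hypothetical per-interval operation counts for one user.
history = [
    {"read": 10, "delete": 0},
    {"read": 12, "delete": 1},
    {"read": 9, "delete": 0},
    {"read": 11, "delete": 1},
]
profile = fit_profile(history)
typical = {"read": 10, "delete": 1}
burst = {"read": 11, "delete": 40}  # sudden spike in deletes
```

With a hypothetical threshold such as -20.0, `is_anomalous(profile, typical, -20.0)` is false and `is_anomalous(profile, burst, -20.0)` is true. A real deployment would have to choose the threshold from labeled or historical data, and the proposal does not say how Eagle does this.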
Re: [DISCUSS] Eagle incubator proposal
Hi Arun,

very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.

So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.

Regards
JB

On 10/19/2015 05:33 PM, Manoharan, Arun wrote:
> Hello Everyone,
> [...]
Re: [DISCUSS] Eagle incubator proposal
I would suggest that Owen O'Malley has not had enough time to be a viable mentor recently and should not be on the list of mentors.

Henry and Julian are good if their schedules permit. Henry, I know, has been mentoring a number of projects lately.

On Mon, Oct 19, 2015 at 8:40 AM, Jean-Baptiste Onofré wrote:
> Hi Arun,
>
> very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.
>
> So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.
>
> Regards
> JB
Re: [DISCUSS] Eagle incubator proposal
Hi JB,

That is a good point. Good to know that Falcon feeds HDFS/Hive/HBase data changes, so this feature would complement Eagle, which today mainly focuses on HDFS/Hive/HBase data access including view, change, delete etc. Eagle would benefit if Eagle can instantly capture data change from Falcon.

Thanks
Edward Zhang

On 10/19/15, 8:40, "Jean-Baptiste Onofré" wrote:
> Hi Arun,
>
> very interesting proposal. I may see some possible interaction with Falcon. In Falcon, we have HDFS files (and Hive/HBase) monitoring (with a kind of Change Data Capture), etc.
>
> So, I see a different perspective in Eagle, but Eagle could also leverage Falcon somehow.
>
> Regards
> JB
Re: [DISCUSS] Eagle incubator proposal
It makes sense. I will try to contribute on this ;)

Regards
JB

On 10/19/2015 09:46 PM, Zhang, Edward (GDI Hadoop) wrote:
> Hi JB,
>
> That is a good Point. Good to know that Falcon feeds HDFS/Hive/HBase data changes, so this feature would complement Eagle which today mainly focuses on HDFS/Hive/HBase data access including view, change, delete etc. Eagle would benefit if Eagle can instantly capture data change from Falcon.
>
> Thanks
> Edward Zhang