Re: Not just yet

2015-10-23 Thread Sam Ruby
On Fri, Oct 23, 2015 at 2:32 AM, Greg Stein  wrote:
> On Oct 23, 2015 1:01 AM, "Sam Ruby"  wrote:
>>
>> On Thu, Oct 22, 2015 at 10:17 PM, Greg Stein  wrote:
>> > On Oct 22, 2015 9:57 AM, "Sam Ruby"  wrote:
>> >>...
>> >> I've also opened another issue that I would appreciate feedback on:
>> >>
>> >> https://issues.apache.org/jira/browse/WHIMSY-28
>> >
>> > Cross-check is nice.
>> >
>> > I'd suggest another possibility: a web tool to *add* a template-based
>> > resolution in the first place.
>>
>> A.K.A.  A wizard or assistant[1].  Good idea, and easy peasy to
>> implement.
>
> Yup. With the agenda editing bits you've developed, I figured you could do
> this with your morning coffee :-)
>
>> I can start with these:
>>
>> https://svn.apache.org/repos/private/committers/board/templates/
>
> Yes.
>
>> The only thing that looks moderately challenging from a UI perspective
>> is allowing the entry of a list of users.  I may just go with a
>> textarea as people generally have a resolution that they have voted on
>> prior to posting a board resolution.
>
> Good point. But in that whimsy issue, you mentioned checking the list of
> names/emails, so you've already got some processing for that textarea.

Indeed.

I'm a bit bothered by the fact that this would mean that the community
voted on one resolution and then a different resolution is constructed
anew for the board to approve.  But at the moment, I'm not sure how to
solve that problem.
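The textarea processing mentioned above can be sketched simply. The helper below is a hypothetical illustration, not Whimsy's actual code: it assumes one person per line, in either "Full Name <email>" form or a bare name/id, and cross-checks emails against a known-committers set.

```python
import re

def parse_entries(text):
    """Parse a textarea of one person per line: 'Name <email>' or a bare id."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = re.match(r'^(.*?)\s*<([^>]+)>$', line)
        if m:
            entries.append({'name': m.group(1), 'email': m.group(2)})
        else:
            entries.append({'name': line, 'email': None})
    return entries

def unknown_entries(entries, committers):
    """Return entries whose email is not in the known committers set."""
    return [e for e in entries if e['email'] not in committers]
```

A tool built this way can accept a pasted, already-voted-on list verbatim and still flag entries it cannot match.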

> Cheers,
> -g

- Sam Ruby

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Short form IP clearance

2015-10-23 Thread Jim Jagielski

> On Oct 22, 2015, at 7:13 PM, P. Taylor Goetz  wrote:
> 
> Again my apologies for polluting this thread with tangential thoughts.
> 
> Maybe I should start a new thread: "Is IP Clearance Optional?"
> 

That would be a short one. My response would be No. :)





Re: Short form IP clearance

2015-10-23 Thread John D. Ament
I think probably the better question is "which contributions require IP
Clearance"?

On Fri, Oct 23, 2015 at 7:19 AM Jim Jagielski  wrote:

>
> > On Oct 22, 2015, at 7:13 PM, P. Taylor Goetz  wrote:
> >
> > Again my apologies for polluting this thread with tangential thoughts.
> >
> > Maybe I should start a new thread: "Is IP Clearance Optional?"
> >
>
> That would be a short one. My response would be No. :)
>
>
>
>


Re: Short form IP clearance

2015-10-23 Thread Sam Ruby
On Fri, Oct 23, 2015 at 7:41 AM, John D. Ament  wrote:
> I think probably the better question is "which contributions require IP
> Clearance"?

"Any code that was developed outside of the ASF SVN repository and our
public mailing lists must be processed like this, even if the external
developer is already an ASF committer."

Source: http://incubator.apache.org/ip-clearance/

At a minimum, the word "SVN" should be removed.  Any other changes
people feel are necessary?

- Sam Ruby




Re: Short form IP clearance

2015-10-23 Thread John D. Ament
So basically if someone attaches a patch to a JIRA, which becomes part
of our public mailing lists, we're good?

Would GitHub PRs fall under the same premise, since the contents of those
mails become public record?

On Fri, Oct 23, 2015 at 7:51 AM Sam Ruby  wrote:

> On Fri, Oct 23, 2015 at 7:41 AM, John D. Ament 
> wrote:
> > I think probably the better question is "which contributions require IP
> > Clearance"?
>
> "Any code that was developed outside of the ASF SVN repository and our
> public mailing lists must be processed like this, even if the external
> developer is already an ASF committer."
>
> Source: http://incubator.apache.org/ip-clearance/
>
> At a minimum, the word "SVN" should be removed.  Any other changes
> people feel are necessary?
>
> - Sam Ruby
>
>
>


Re: Short form IP clearance

2015-10-23 Thread Sam Ruby
On Fri, Oct 23, 2015 at 7:55 AM, John D. Ament  wrote:
> So basically if someone attaches a patch to a JIRA, which becomes part
> of our public mailing lists, we're good?

If that code was entirely the poster's original work and not previously
published elsewhere (particularly under a different license), then
that's fine.

If that code was previously developed elsewhere, had multiple
contributors, and was made available (particularly under a different
license), then no.

I'll also note that the goal of IP Clearance isn't to disallow such
contributions, it is merely to make sure that we capture all of the
relevant history (IP provenance) for posterity.

> Would github PR's fall under the same premise, since the contents of those
> mails become public record?

See above.

- Sam Ruby

> On Fri, Oct 23, 2015 at 7:51 AM Sam Ruby  wrote:
>
>> On Fri, Oct 23, 2015 at 7:41 AM, John D. Ament 
>> wrote:
>> > I think probably the better question is "which contributions require IP
>> > Clearance"?
>>
>> "Any code that was developed outside of the ASF SVN repository and our
>> public mailing lists must be processed like this, even if the external
>> developer is already an ASF committer."
>>
>> Source: http://incubator.apache.org/ip-clearance/
>>
>> At a minimum, the word "SVN" should be removed.  Any other changes
>> people feel are necessary?
>>
>> - Sam Ruby
>>
>>
>>




Re: Not just yet

2015-10-23 Thread Greg Stein
On Fri, Oct 23, 2015 at 2:15 AM, Sam Ruby  wrote:
>...

> I'm a bit bothered by the fact that this would mean that the community
> voted on one resolution and then a different resolution is constructed
> anew for the board to approve.  But at the moment, I'm not sure how to
> solve that problem.
>

New workflow:

1. $somebody uses whimsy to post a resolution
2. commit message is forwarded to $list
3. community sees $resolution

I believe the question is "does the commit message reach the appropriate
audience?"

Cheers,
-g


[VOTE] TinkerPop 3.0.2-incubating Release

2015-10-23 Thread Stephen Mallette
Hello,
We are happy to announce that TinkerPop 3.0.2-incubating is ready for release.

The release artifacts can be found at this location:
https://dist.apache.org/repos/dist/dev/incubator/tinkerpop/3.0.2-incubating/

The source distribution is provided by:
apache-tinkerpop-3.0.2-incubating-source-release.zip

Two binary distributions are provided for user convenience:
apache-gremlin-console-3.0.2-incubating-distribution.zip
apache-gremlin-server-3.0.2-incubating-distribution.zip

The GPG key used to sign the release artifacts is available at:
https://dist.apache.org/repos/dist/dev/incubator/tinkerpop/KEYS

The online docs can be found here:
http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/ (user docs)

http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/upgrade.html#_tinkerpop_3_0_2
(upgrade docs)
http://tinkerpop.incubator.apache.org/javadocs/3.0.2-incubating/core/
(core javadoc)
http://tinkerpop.incubator.apache.org/javadocs/3.0.2-incubating/full/
(full javadoc)

The tag in Apache Git can be found here:

https://git-wip-us.apache.org/repos/asf?p=incubator-tinkerpop.git;a=tag;h=8e9af13d6beb184a137067caa0445157351435ab

The release notes are available here:

https://github.com/apache/incubator-tinkerpop/blob/3.0.2-incubating/CHANGELOG.asciidoc#tinkerpop-302-release-date-october-19-2015

Finally, the dev@tinkerpop [VOTE] thread can be found at this location:


http://mail-archives.apache.org/mod_mbox/incubator-tinkerpop-dev/201510.mbox/%3CCAA-H439qBNzu1gO7P0m%2BeUQ4OZyPU9Ya2D1icTEftd3fpzNvrA%40mail.gmail.com%3E

Result summary: +14 (4 binding, 10 non-binding), 0 (0), -1 (0)

The [VOTE] will be open for the next 72 hours --- closing Monday
(October 26, 2015) at 8am EST.

Thanks,

Stephen

P.S. Hopefully we were able to get LICENSE/NOTICE solid on this
release.  Justin Mclean, I hope you get a chance to verify and vote as
you usually do.


Re: Short form IP clearance

2015-10-23 Thread John D. Ament
On Fri, Oct 23, 2015 at 8:03 AM Sam Ruby  wrote:

> On Fri, Oct 23, 2015 at 7:55 AM, John D. Ament 
> wrote:
> > So basically if someone attaches a patch to a JIRA, which becomes
> part
> > of our public mailing lists, we're good?
>
> If that code was all that poster's original work, not previously
> published elsewhere (particularly under a different license), then
> that's fine.
>
> If that code was previously developed elsewhere, had multiple
> contributors, and made available (particularly under a different
> license), then no.
>
> I'll also note that the goal of IP Clearance isn't to disallow such
> contributions, it is merely to make sure that we capture all of the
> relevant history (IP provenance) for posterity.
>


So then probably the more concise form is "the work was first developed as
a part of the ASF and not elsewhere."

A cursory read makes it sound as though any external contribution requires
IP Clearance, but having this note makes it clearer.


>
> > Would github PR's fall under the same premise, since the contents of
> those
> > mails become public record?
>
> See above.
>
> - Sam Ruby
>
> > On Fri, Oct 23, 2015 at 7:51 AM Sam Ruby  wrote:
> >
> >> On Fri, Oct 23, 2015 at 7:41 AM, John D. Ament 
> >> wrote:
> >> > I think probably the better question is "which contributions require
> IP
> >> > Clearance"?
> >>
> >> "Any code that was developed outside of the ASF SVN repository and our
> >> public mailing lists must be processed like this, even if the external
> >> developer is already an ASF committer."
> >>
> >> Source: http://incubator.apache.org/ip-clearance/
> >>
> >> At a minimum, the word "SVN" should be removed.  Any other changes
> >> people feel are necessary?
> >>
> >> - Sam Ruby
> >>
> >>
> >>
>
>
>


Re: Not just yet

2015-10-23 Thread John D. Ament
On Fri, Oct 23, 2015 at 8:07 AM Greg Stein  wrote:

> On Fri, Oct 23, 2015 at 2:15 AM, Sam Ruby  wrote:
> >...
>
> > I'm a bit bothered by the fact that this would mean that the community
> > voted on one resolution and then a different resolution is constructed
> > anew for the board to approve.  But at the moment, I'm not sure how to
> > solve that problem.
> >
>
> New workflow:
>
> 1. $somebody uses whimsy to post a resolution
> 2. commit message is forwarded to $list
> 3. community sees $resolution
>
> I believe the question is "does the commit message reach the appropriate
> audience?"
>

Maybe it stays in a waiting area until approved by the community?


>
> Cheers,
> -g
>


Re: Not just yet

2015-10-23 Thread Sam Ruby
On Fri, Oct 23, 2015 at 8:09 AM, John D. Ament  wrote:
> On Fri, Oct 23, 2015 at 8:07 AM Greg Stein  wrote:
>
>> On Fri, Oct 23, 2015 at 2:15 AM, Sam Ruby  wrote:
>> >...
>>
>> > I'm a bit bothered by the fact that this would mean that the community
>> > voted on one resolution and then a different resolution is constructed
>> > anew for the board to approve.  But at the moment, I'm not sure how to
>> > solve that problem.
>> >
>>
>> New workflow:
>>
>> 1. $somebody uses whimsy to post a resolution
>> 2. commit message is forwarded to $list
>> 3. community sees $resolution
>>
>> I believe the question is "does the commit message reach the appropriate
>> audience?"
>>
>
> Maybe it stays in a waiting area until approved by the community?

There's an idea.

I prefer to focus on defining a process with whimsy in mind, and then
add support for making it easier to do something that could also be
done outside of whimsy.

Applied here, there could be a simple directory of draft resolutions in
https://svn.apache.org/repos/private/committers.
The board agenda tool could have one or more forms that assist
with drafting resolutions and placing them in that directory.  People
could update those resolutions directly, and perhaps the tool could
also be used for updates.

The board agenda tool could add support for posting a draft resolution
found in this directory to this month's agenda.
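As a rough sketch of that flow (the directory layout, file names, and agenda format below are hypothetical assumptions for illustration, not Whimsy's actual structure):

```python
from pathlib import Path

def list_drafts(drafts_dir):
    """Return draft resolution files in a directory, newest first."""
    paths = Path(drafts_dir).glob("*.txt")
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)

def post_to_agenda(draft_path, agenda_path):
    """Append a draft resolution's text to this month's agenda file."""
    text = Path(draft_path).read_text()
    with open(agenda_path, "a") as agenda:
        agenda.write("\n" + text.rstrip() + "\n")
```

Because the drafts live in plain files under svn, the same directory could also be edited entirely outside the tool, which matches the "could also be done outside of whimsy" goal.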

>> Cheers,
>> -g

- Sam Ruby

P.S.  Perhaps it is time for a different subject line and/or move to
dev@whimsical?




[VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Manoharan, Arun
Hello Everyone,

Thanks for all the feedback on the Eagle Proposal.

I would like to call for a [VOTE] on Eagle joining the ASF as an incubation 
project.

The vote is open for 72 hours:

[ ] +1 accept Eagle in the Incubator
[ ] ±0
[ ] -1 (please give reason)

Eagle is a monitoring solution for Hadoop that instantly identifies access to 
sensitive data, recognizes attacks and malicious activities, and takes action in 
real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
Eagle also provides machine learning models for detecting anomalous user 
behavior in Hadoop.

The proposal is available on the wiki here:
https://wiki.apache.org/incubator/EagleProposal

The text of the proposal is also available at the end of this email.

Thanks for your time and help.

Thanks,
Arun



Eagle

Abstract
Eagle is an open-source monitoring solution for Hadoop that instantly identifies 
access to sensitive data, recognizes attacks and malicious activities in Hadoop, 
and takes action.

Proposal
Eagle audits access to HDFS files, Hive and HBase tables in real time, enforces 
policies defined on sensitive data access and alerts or blocks user’s access to 
that sensitive data in real time. Eagle also creates user profiles based on the 
typical access behaviour for HDFS and Hive and sends alerts when anomalous 
behaviour is detected. Eagle can also import sensitive data information 
classified by external classification engines to help define its policies.

Overview of Eagle
Eagle has 3 main parts.
1. Data collection and storage - Eagle collects data from various Hadoop logs in 
real time using the Kafka/YARN APIs and uses HDFS and HBase for storage.
2. Data processing and policy engine - Eagle allows users to create policies 
based on various metadata properties of HDFS, Hive and HBase data.
3. Eagle services - Eagle services include the policy manager, query service and 
the visualization component. Eagle provides an intuitive user interface to 
administer Eagle and an alert dashboard for responding to real-time alerts.

Data Collection and Storage:
Eagle provides a programming API for extending Eagle to integrate any data 
source into the Eagle policy evaluation framework. For example, Eagle HDFS audit 
monitoring collects data from Kafka, which is populated from a NameNode log4j 
appender or from a logstash agent. Eagle Hive monitoring collects Hive query 
logs from running jobs through the YARN API, and is designed to be scalable and 
fault-tolerant. Eagle uses HBase as storage for metadata and metrics data, and 
also supports relational databases through a configuration change.

Data Processing and Policy Engine:
Processing Engine: Eagle provides a stream processing API that is an abstraction 
over Apache Storm and can also be extended to other streaming engines. This 
abstraction allows developers to assemble data transformation, filtering, 
external data joins, etc. without being physically bound to a specific streaming 
platform. The Eagle streaming API allows developers to easily integrate business 
logic with the Eagle policy engine; internally, the Eagle framework compiles the 
business logic execution DAG into program primitives of the underlying stream 
infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms 
audit logs from the NameNode into objects and joins them with sensitivity 
metadata and security zone metadata, which are generated from external programs 
or configured by the user. Eagle Hive monitoring filters running jobs to get the 
Hive query string, parses it into an object, and then joins it with sensitivity 
metadata.
Alerting Framework: The Eagle Alert Framework includes a stream metadata API, a 
scalable policy engine framework, and an extensible policy engine framework. The 
stream metadata API allows developers to declare the event schema, including 
what attributes constitute an event, the type of each attribute, and how to 
dynamically resolve attribute values at runtime when a user configures a policy. 
The scalable policy engine framework allows policies to be executed on different 
physical nodes in parallel; it can also be used to define your own policy 
partitioner class. The policy engine framework, together with the stream 
partitioning capability provided by all streaming platforms, ensures that 
policies and events can be evaluated in a fully distributed way. The extensible 
policy engine framework allows developers to plug in a new policy engine with a 
few lines of code. The WSO2 Siddhi CEP engine is the policy engine that Eagle 
supports as a first-class citizen.
Machine Learning module: Eagle provides capabilities to define user activity 
patterns or user profiles for Hadoop users based on their behaviour in the 
platform. These user profiles are modeled using machine learning algorithms and 
used for detecting anomalous user activities. Eagle uses Eigenvalue 
Decomposition and Density Estimation algorithms for generating user profile 
models. The model reads data from HDFS audit logs, preprocesses and aggregates 
the data, and generates models using Spark programming APIs. Onc
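The policy-over-events model the proposal describes can be illustrated with a tiny sketch. This is not Eagle's actual API; the event field names, the policy shape, and the in-memory evaluation loop are all assumptions made purely for illustration of the idea (named predicates evaluated against a stream of audit events).

```python
def make_policy(name, predicate):
    """A policy here is just a named predicate over event dicts (illustrative)."""
    return {"name": name, "match": predicate}

def evaluate(events, policies):
    """Yield (policy name, event) alerts for every event a policy matches."""
    for event in events:
        for policy in policies:
            if policy["match"](event):
                yield policy["name"], event

# Hypothetical example: flag reads of paths tagged as sensitive, in the
# spirit of the HDFS audit monitoring described above.
sensitive = {"/data/pii/users.csv"}
policies = [
    make_policy("sensitive-read",
                lambda e: e["cmd"] == "open" and e["path"] in sensitive),
]
audit = [
    {"user": "alice", "cmd": "open", "path": "/data/pii/users.csv"},
    {"user": "bob", "cmd": "open", "path": "/tmp/scratch"},
]
alerts = list(evaluate(audit, policies))  # only alice's read is flagged
```

In the real system this evaluation would be distributed across stream partitions (e.g. via Storm) rather than run in a single loop, which is what the scalable policy engine framework above is for.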

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread larry mccay
+1 (non-binding)

On Fri, Oct 23, 2015 at 7:11 AM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
>
> The vote is open for 72 hours:
>
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>

Re: [VOTE] Release Apache Kylin 1.1-incubating (rc1)

2015-10-23 Thread ShaoFeng Shi
Thanks Justin and everyone! I will send out the vote result soon;

2015-10-23 13:38 GMT+08:00 Justin Mclean :

> Hi,
>
> > Would you mind to change your vote to release this candidate?
>
> You can release it without me changing my vote; there are no vetoes on
> releases.
>
> But as you asked and it's been dealt with my vote is +1 (binding).
>
> Thanks,
> Justin
>
>


-- 
Best regards,

Shaofeng Shi


Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Henry Saputra
+1 (binding)

On Fri, Oct 23, 2015 at 7:11 AM, Manoharan, Arun  wrote:
> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an incubation 
> project.
>
> The vote is open for 72 hours:
>
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to 
> sensitive data, recognize attacks, malicious activities and take actions in 
> real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
> Eagle also provides machine learning models for detecting anomalous user 
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>

[RESULT] [VOTE] Release Apache Kylin 1.1-incubating (rc1)

2015-10-23 Thread ShaoFeng Shi
This vote passes with 4 binding +1s and 6 non-binding +1s, and no 0 or -1 votes:

+1 Julian Hyde (binding)
+1 Henry Saputra (binding)
+1 Taylor Goetz (binding)
+1 Justin Mclean (binding)
+1 Luke Han
+1 Li Yang
+1 Hongbin Ma
+1 Qiaohao Zhou
+1 Samant, Medha
+1 Shaofeng Shi

Thanks everyone. We’ll now roll the release out to the mirrors.

Shaofeng Shi, on behalf of Apache Kylin PPMC
shaofeng...@apache.org

2015-10-23 23:01 GMT+08:00 ShaoFeng Shi :

> Thanks Justin and everyone! I will send out the vote result soon;
>
> 2015-10-23 13:38 GMT+08:00 Justin Mclean :
>
>> Hi,
>>
>> > Would you mind to change your vote to release this candidate?
>>
>> You can release it without me changing my vote; there are no vetoes on
>> releases.
>>
>> But as you asked and it's been dealt with my vote is +1 (binding).
>>
>> Thanks,
>> Justin
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
>


-- 
Best regards,

Shaofeng Shi


Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread P. Taylor Goetz
+1 (binding)

-Taylor

> On Oct 23, 2015, at 10:11 AM, Manoharan, Arun  wrote:
> 
> Hello Everyone,
> 
> Thanks for all the feedback on the Eagle Proposal.
> 
> I would like to call for a [VOTE] on Eagle joining the ASF as an incubation 
> project.
> 
> The vote is open for 72 hours:
> 
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
> 
> Eagle is a Monitoring solution for Hadoop to instantly identify access to 
> sensitive data, recognize attacks, malicious activities and take actions in 
> real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
> Eagle also provides machine learning models for detecting anomalous user 
> behavior in Hadoop.
> 
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
> 
> The text of the proposal is also available at the end of this email.
> 
> Thanks for your time and help.
> 
> Thanks,
> Arun
> 
> 
> 
> Eagle
> 
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify 
> access to sensitive data, recognize attacks, malicious activities in hadoop 
> and take actions.
> 
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time, 
> enforces policies defined on sensitive data access and alerts or blocks 
> user’s access to that sensitive data in real time. Eagle also creates user 
> profiles based on the typical access behaviour for HDFS and Hive and sends 
> alerts when anomalous behaviour is detected. Eagle can also import sensitive 
> data information classified by external classification engines to help define 
> its policies.
> 
> Overview of Eagle
> Eagle has 3 main parts.
> 1. Data collection and storage - Eagle collects data from various Hadoop logs
> in real time using the Kafka/YARN APIs and uses HDFS and HBase for storage.
> 2. Data processing and policy engine - Eagle allows users to create policies
> based on various metadata properties on HDFS, Hive and HBase data.
> 3. Eagle services - Eagle services include the policy manager, query service and
> the visualization component. Eagle provides an intuitive user interface to
> administer Eagle and an alert dashboard to respond to real-time alerts.
> 
> Data Collection and Storage:
> Eagle provides a programming API for integrating any data source into the
> Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring
> collects data from Kafka, which is populated from a NameNode log4j appender
> or from a Logstash agent. Eagle Hive monitoring collects Hive query logs from
> running jobs through the YARN API, and is designed to be scalable and
> fault-tolerant. Eagle uses HBase as storage for metadata and metrics data,
> and also supports relational databases through a configuration change.
> 
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides a stream processing API that is an
> abstraction over Apache Storm. It can also be extended to other streaming
> engines. This abstraction allows developers to assemble data transformation,
> filtering, external data joins, etc. without being physically bound to a
> specific streaming platform. The Eagle streaming API allows developers to
> easily integrate business logic with the Eagle policy engine; internally, the
> Eagle framework compiles the business-logic execution DAG into program
> primitives of the underlying stream infrastructure, e.g. Apache Storm. For
> example, Eagle HDFS monitoring transforms audit logs from the NameNode into
> objects and joins in sensitivity metadata and security-zone metadata, which
> are generated by external programs or configured by the user. Eagle Hive
> monitoring filters running jobs to get the Hive query string, parses the
> query string into an object, and then joins in sensitivity metadata.
> Alerting Framework: the Eagle Alert Framework includes a stream metadata API,
> a scalable policy engine framework, and an extensible policy engine
> framework. The stream metadata API allows developers to declare an event
> schema: which attributes constitute an event, the type of each attribute, and
> how to dynamically resolve attribute values at runtime when a user configures
> a policy. The scalable policy engine framework allows policies to be executed
> on different physical nodes in parallel, and lets you define your own policy
> partitioner class. The policy engine framework, together with the stream
> partitioning capability provided by all streaming platforms, ensures that
> policies and events can be evaluated in a fully distributed way. The
> extensible policy engine framework allows developers to plug in a new policy
> engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy
> engine that Eagle supports as a first-class citizen.
> Machine Learning module: Eagle provides capabilities to define user activity 
> patterns or user profiles for Hadoop users based on the user behaviour in the 
> platform. These user profiles are modeled using Machine Learning algorithms 
> and used for detec
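The policy-partitioning idea in the quoted proposal — shard events by a key so that policy evaluation can run on separate nodes in parallel while each evaluator still sees every event relevant to its shard — can be illustrated with a small sketch. This is purely illustrative; the function and attribute names are hypothetical and are not Eagle's actual API:

```python
# Toy illustration (not Eagle's actual API) of the policy-partitioning idea:
# events are sharded by a key (here the "user" attribute) so that each policy
# evaluator can run on a separate node and still see every event for its shard.
from collections import defaultdict

def partition(event, num_evaluators):
    """Pick an evaluator shard for an event by hashing its user attribute."""
    return hash(event["user"]) % num_evaluators

def route(events, num_evaluators=3):
    """Group events into per-evaluator shards."""
    shards = defaultdict(list)
    for event in events:
        shards[partition(event, num_evaluators)].append(event)
    return shards

events = [
    {"user": "alice", "path": "/secure/pii.csv", "op": "read"},
    {"user": "bob",   "path": "/tmp/scratch",    "op": "write"},
    {"user": "alice", "path": "/secure/pii.csv", "op": "delete"},
]
shards = route(events)
# All of alice's events land in the same shard, so a stateful policy such as
# "alert on repeated access to /secure/*" can be evaluated locally on one node.
```

A real deployment would delegate this to the streaming platform's field grouping (e.g. Storm's), which is what the proposal means by "streaming partitioning capability provided by all streaming platforms".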

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Luke Han
+1 (non-binding)


Best Regards!
-

Luke Han

On Fri, Oct 23, 2015 at 11:26 PM, P. Taylor Goetz  wrote:

> +1 (binding)
>
> -Taylor
>
> > On Oct 23, 2015, at 10:11 AM, Manoharan, Arun 
> wrote:
> >
> > Hello Everyone,
> >
> > Thanks for all the feedback on the Eagle Proposal.
> >
> > I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
> >
> > The vote is open for 72 hours:
> >
> > [ ] +1 accept Eagle in the Incubator
> > [ ] ±0
> > [ ] -1 (please give reason)
> >
> > Eagle is a Monitoring solution for Hadoop to instantly identify access
> to sensitive data, recognize attacks, malicious activities and take actions
> in real time. Eagle supports a wide variety of policies on HDFS data and
> Hive. Eagle also provides machine learning models for detecting anomalous
> user behavior in Hadoop.
> >
> > The proposal is available on the wiki here:
> > https://wiki.apache.org/incubator/EagleProposal
> >
> > The text of the proposal is also available at the end of this email.
> >
> > Thanks for your time and help.
> >
> > Thanks,
> > Arun
> >
> > 
> >
> > [... full Eagle proposal text trimmed; identical to the vote call quoted above ...]

RE: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread wp chun
+1
wp_c...@hotmail.com
> 
> On 10/23/15, 11:26 PM, "P. Taylor Goetz"  wrote:
> 
> >+1 (binding)
> >
> >-Taylor
> >
> >> On Oct 23, 2015, at 10:11 AM, Manoharan, Arun 
> >>wrote:
> >> 
> >> Hello Everyone,
> >> 
> >> Thanks for all the feedback on the Eagle Proposal.
> >> 
> >> I would like to call for a [VOTE] on Eagle joining the ASF as an
> >>incubation project.
> >> 
> >> The vote is open for 72 hours:
> >> 
> >> [ ] +1 accept Eagle in the Incubator
> >> [ ] ±0
> >> [ ] -1 (please give reason)
> >> 
> >> Eagle is a Monitoring solution for Hadoop to instantly identify access
> >>to sensitive data, recognize attacks, malicious activities and take
> >>actions in real time. Eagle supports a wide variety of policies on HDFS
> >>data and Hive. Eagle also provides machine learning models for detecting
> >>anomalous user behavior in Hadoop.
> >> 
> >> The proposal is available on the wiki here:
> >> https://wiki.apache.org/incubator/EagleProposal
> >> 
> >> The text of the proposal is also available at the end of this email.
> >> 
> >> Thanks for your time and help.
> >> 
> >> Thanks,
> >> Arun
> >> 
> >> 
> >> 
> >> [... full Eagle proposal text trimmed; identical to the vote call quoted above ...]

Re: [VOTE] Release Apache Kylin 1.1-incubating (rc1)

2015-10-23 Thread Luke Han
Thanks Justin, we will also double-check and update anything else in future
releases.

Thanks again.
Luke


Best Regards!
-

Luke Han

On Fri, Oct 23, 2015 at 1:38 PM, Justin Mclean  wrote:

> Hi,
>
> > Would you mind to change your vote to release this candidate?
>
> You can release it without me changing my vote; there are no vetoes on
> releases.
>
> But as you asked, and it has been dealt with, my vote is +1 (binding).
>
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Owen O'Malley
+1 (binding)

On Fri, Oct 23, 2015 at 8:42 AM, wp chun  wrote:

> +1
> wp_c...@hotmail.com
> >
> > On 10/23/15, 11:26 PM, "P. Taylor Goetz"  wrote:
> >
> > >+1 (binding)
> > >
> > >-Taylor
> > >
> > >> On Oct 23, 2015, at 10:11 AM, Manoharan, Arun 
> > >>wrote:
> > >>
> > >> Hello Everyone,
> > >>
> > >> Thanks for all the feedback on the Eagle Proposal.
> > >>
> > >> I would like to call for a [VOTE] on Eagle joining the ASF as an
> > >>incubation project.
> > >>
> > >> The vote is open for 72 hours:
> > >>
> > >> [ ] +1 accept Eagle in the Incubator
> > >> [ ] ±0
> > >> [ ] -1 (please give reason)
> > >>
> > >> Eagle is a Monitoring solution for Hadoop to instantly identify access
> > >>to sensitive data, recognize attacks, malicious activities and take
> > >>actions in real time. Eagle supports a wide variety of policies on HDFS
> > >>data and Hive. Eagle also provides machine learning models for
> detecting
> > >>anomalous user behavior in Hadoop.
> > >>
> > >> The proposal is available on the wiki here:
> > >> https://wiki.apache.org/incubator/EagleProposal
> > >>
> > >> The text of the proposal is also available at the end of this email.
> > >>
> > >> Thanks for your time and help.
> > >>
> > >> Thanks,
> > >> Arun
> > >>
> > >> 
> > >>
> > >> [... full Eagle proposal text trimmed; identical to the vote call quoted above ...]

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Libin Sun
+1 (non-binding)

2015-10-23 23:50 GMT+08:00 Owen O'Malley :

> +1 (binding)
>
> On Fri, Oct 23, 2015 at 8:42 AM, wp chun  wrote:
>
> > +1
> > wp_c...@hotmail.com
> > >
> > > On 10/23/15, 11:26 PM, "P. Taylor Goetz"  wrote:
> > >
> > > >+1 (binding)
> > > >
> > > >-Taylor
> > > >
> > > >> On Oct 23, 2015, at 10:11 AM, Manoharan, Arun  >
> > > >>wrote:
> > > >>
> > > >> Hello Everyone,
> > > >>
> > > >> Thanks for all the feedback on the Eagle Proposal.
> > > >>
> > > >> I would like to call for a [VOTE] on Eagle joining the ASF as an
> > > >>incubation project.
> > > >>
> > > >> The vote is open for 72 hours:
> > > >>
> > > >> [ ] +1 accept Eagle in the Incubator
> > > >> [ ] ±0
> > > >> [ ] -1 (please give reason)
> > > >>
> > > >> Eagle is a Monitoring solution for Hadoop to instantly identify
> access
> > > >>to sensitive data, recognize attacks, malicious activities and take
> > > >>actions in real time. Eagle supports a wide variety of policies on
> HDFS
> > > >>data and Hive. Eagle also provides machine learning models for
> > detecting
> > > >>anomalous user behavior in Hadoop.
> > > >>
> > > >> The proposal is available on the wiki here:
> > > >> https://wiki.apache.org/incubator/EagleProposal
> > > >>
> > > >> The text of the proposal is also available at the end of this email.
> > > >>
> > > >> Thanks for your time and help.
> > > >>
> > > >> Thanks,
> > > >> Arun
> > > >>
> > > >> 
> > > >>
> > > >> [... full Eagle proposal text trimmed; identical to the vote call quoted above ...]
Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Ted Dunning
+1 (binding)


On Fri, Oct 23, 2015 at 7:11 AM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
>
> The vote is open for 72 hours:
>
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> [... Eagle proposal text trimmed; identical to the vote call quoted above ...]
> Machine Learning module: Eagle provides capabilities to define user
> activity patterns or user profiles for Hadoop users based on user
> behaviour in the platform. These user profiles are modeled using Machine
> Learning algorithms and used for detection of anomalous user activities.
> Eagle uses Eigen Value Decompositio
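The Machine Learning module quoted above builds per-user behaviour profiles and flags deviations from them. As a rough, stdlib-only sketch of that "learn a profile, then score deviations" shape — note the proposal mentions Eigen Value Decomposition and more sophisticated models, which this toy z-score substitute does not implement:

```python
# Toy sketch of the user-profile idea from the Machine Learning module above.
# A simple per-feature z-score stands in for Eagle's actual modeling
# (the proposal mentions Eigen Value Decomposition); only the overall shape
# "learn a baseline, then score how far an observation deviates" is shown.
from statistics import mean, stdev

def build_profile(history):
    """history: list of per-day operation counts for one user."""
    return {"mean": mean(history), "stdev": stdev(history)}

def anomaly_score(profile, observed):
    """How many standard deviations the observation is from the baseline."""
    if profile["stdev"] == 0:
        return 0.0
    return abs(observed - profile["mean"]) / profile["stdev"]

history = [12, 15, 11, 14, 13, 12, 16]   # typical daily HDFS reads for a user
profile = build_profile(history)
score = anomaly_score(profile, 90)        # a sudden burst of reads
is_anomalous = score > 3.0                # flag if beyond 3 standard deviations
```

In practice such profiles would be retrained periodically from the audit streams Eagle already collects, and the threshold tuned per policy.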

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Hao Chen
+1 (non-binding)

On Fri, Oct 23, 2015 at 10:11 PM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
>
> The vote is open for 72 hours:
>
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
>
> Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> Eagle
>
> Abstract
> Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include policy manager, query service
> and the visualization component. Eagle provides intuitive user interface to
> administer Eagle and an alert dashboard to respond to real time alerts.
>
> Data Collection and Storage:
> Eagle provides a programming API for integrating any data source into the
> Eagle policy-evaluation framework. For example, Eagle HDFS audit
> monitoring collects data from Kafka, which is populated by a NameNode
> log4j appender or a Logstash agent. Eagle Hive monitoring collects Hive
> query logs from running jobs through the YARN API and is designed to be
> scalable and fault-tolerant. Eagle uses HBase to store metadata and
> metrics data, and can also use a relational database through a
> configuration change.
>
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides a stream-processing API that is an
> abstraction over Apache Storm and can be extended to other streaming
> engines. This abstraction allows developers to assemble data
> transformation, filtering, external data joins, etc. without being bound
> to a specific streaming platform. The Eagle streaming API lets developers
> easily integrate business logic with the Eagle policy engine; internally,
> the Eagle framework compiles the business-logic execution DAG into the
> program primitives of the underlying stream infrastructure, e.g. Apache
> Storm. For example, Eagle HDFS monitoring transforms audit logs from the
> NameNode into objects and joins them with sensitivity metadata and
> security-zone metadata, which are generated by external programs or
> configured by the user. Eagle Hive monitoring filters running jobs to get
> the Hive query string, parses it into an object, and then joins
> sensitivity metadata.
> Alerting Framework: the Eagle alert framework includes a stream metadata
> API, a scalable policy-engine framework, and an extensible policy-engine
> framework. The stream metadata API allows developers to declare an event
> schema: which attributes constitute an event, the type of each attribute,
> and how to resolve attribute values dynamically at runtime when a user
> configures a policy. The scalable policy-engine framework allows policies
> to be executed on different physical nodes in parallel and lets you define
> your own policy-partitioner class; together with the stream partitioning
> provided by all streaming platforms, it ensures that policies and events
> can be evaluated in a fully distributed way. The extensible policy-engine
> framework allows developers to plug in a new policy engine with a few
> lines of code; the WSO2 Siddhi CEP engine is the policy engine Eagle
> supports as a first-class citizen.
> Machine Learning module: Eagle provides capabilities to define user
> activity patterns or user profiles for Hadoop users based on their
> behaviour in the platform. These user profiles are modeled using machine
> learning algorithms and used for detecting anomalous user activities.
> Eagle uses Eigen Value Decomposition and Density Estimation algorithms
> for generating user profile models.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Hitesh Shah
+1 (binding)

— Hitesh

On Oct 23, 2015, at 7:11 AM, Manoharan, Arun  wrote:

> Hello Everyone,
> 
> Thanks for all the feedback on the Eagle Proposal.
> 
> I would like to call for a [VOTE] on Eagle joining the ASF as an incubation 
> project.
> 
> The vote is open for 72 hours:
> 
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
> 
> Eagle is a Monitoring solution for Hadoop to instantly identify access to 
> sensitive data, recognize attacks, malicious activities and take actions in 
> real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
> Eagle also provides machine learning models for detecting anomalous user 
> behavior in Hadoop.
> 
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
> 
> The text of the proposal is also available at the end of this email.
> 
> Thanks for your time and help.
> 
> Thanks,
> Arun
> 
> 
> 

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Adunuthula, Seshu
+1 (non binding)

On 10/23/15, 9:52 AM, "Hitesh Shah"  wrote:

>+1 (binding)
>
>— Hitesh
>
>On Oct 23, 2015, at 7:11 AM, Manoharan, Arun  wrote:
>
>> Hello Everyone,
>> 
>> Thanks for all the feedback on the Eagle Proposal.
>> 
>> I would like to call for a [VOTE] on Eagle joining the ASF as an
>> incubation project.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread John D. Ament
+1
On Oct 23, 2015 10:11, "Manoharan, Arun"  wrote:

> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Julian Hyde
+1 (binding)

> On Oct 23, 2015, at 10:13 AM, John D. Ament  wrote:
> 
> +1
> On Oct 23, 2015 10:11, "Manoharan, Arun"  wrote:
> 
>> Hello Everyone,
>> 
>> Thanks for all the feedback on the Eagle Proposal.
>> 
>> I would like to call for a [VOTE] on Eagle joining the ASF as an
>> incubation project.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Chris Nauroth
+1 (binding)

--Chris Nauroth




On 10/23/15, 7:11 AM, "Manoharan, Arun"  wrote:

>Hello Everyone,
>
>Thanks for all the feedback on the Eagle Proposal.
>
>I would like to call for a [VOTE] on Eagle joining the ASF as an
>incubation project.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Balaji Ganesan
+1

On Fri, Oct 23, 2015 at 12:26 PM, Chris Nauroth 
wrote:

> +1 (binding)
>
> --Chris Nauroth
>
>
>
>
> On 10/23/15, 7:11 AM, "Manoharan, Arun"  wrote:
>
> >Hello Everyone,
> >
> >Thanks for all the feedback on the Eagle Proposal.
> >
> >I would like to call for a [VOTE] on Eagle joining the ASF as an
> >incubation project.

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Samant, Medha
+1
-Medha

On 10/23/15, 1:14 PM, "Balaji Ganesan"  wrote:

>+1
>
>On Fri, Oct 23, 2015 at 12:26 PM, Chris Nauroth 
>wrote:
>
>> +1 (binding)
>>
>> --Chris Nauroth
>>
>>
>>
>>
>> On 10/23/15, 7:11 AM, "Manoharan, Arun"  wrote:
>>
>> >Hello Everyone,
>> >
>> >Thanks for all the feedback on the Eagle Proposal.
>> >
>> >I would like to call for a [VOTE] on Eagle joining the ASF as an
>> >incubation project.
>> >
>> >The vote is open for 72 hours:
>> >
>> >[ ] +1 accept Eagle in the Incubator
>> >[ ] ±0
>> >[ ] -1 (please give reason)
>> >
>> >Eagle is a Monitoring solution for Hadoop to instantly identify access
>>to
>> >sensitive data, recognize attacks, malicious activities and take
>>actions
>> >in real time. Eagle supports a wide variety of policies on HDFS data
>>and
>> >Hive. Eagle also provides machine learning models for detecting
>>anomalous
>> >user behavior in Hadoop.
>> >
>> >The proposal is available on the wiki here:
>> >https://wiki.apache.org/incubator/EagleProposal
>> >
>> >The text of the proposal is also available at the end of this email.
>> >
>> >Thanks for your time and help.
>> >
>> >Thanks,
>> >Arun
>> >
>> >
>> >
>> >Eagle
>> >
>> >Abstract
>> >Eagle is an Open Source Monitoring solution for Hadoop to instantly
>> >identify access to sensitive data, recognize attacks, malicious
>> >activities in hadoop and take actions.
>> >
>> >Proposal
>> >Eagle audits access to HDFS files, Hive and HBase tables in real time,
>> >enforces policies defined on sensitive data access and alerts or blocks
>> >user's access to that sensitive data in real time. Eagle also creates
>> >user profiles based on the typical access behaviour for HDFS and Hive
>>and
>> >sends alerts when anomalous behaviour is detected. Eagle can also
>>import
>> >sensitive data information classified by external classification
>>engines
>> >to help define its policies.
>> >
>> >Overview of Eagle
>> >Eagle has 3 main parts.
>> >1.Data collection and storage - Eagle collects data from various hadoop
>> >logs in real time using Kafka/Yarn API and uses HDFS and HBase for
>> >storage.
>> >2.Data processing and policy engine - Eagle allows users to create
>> >policies based on various metadata properties on HDFS, Hive and HBase
>> >data.
>> >3.Eagle services - Eagle services include policy manager, query service
>> >and the visualization component. Eagle provides intuitive user
>>interface
>> >to administer Eagle and an alert dashboard to respond to real time
>>alerts.
>> >
>> >Data Collection and Storage:
>> >Eagle provides programming API for extending Eagle to integrate any
>>data
>> >source into Eagle policy evaluation framework. For example, Eagle hdfs
>> >audit monitoring collects data from Kafka which is populated from
>> >namenode log4j appender or from logstash agent. Eagle hive monitoring
>> >collects hive query logs from running job through YARN API, which is
>> >designed to be scalable and fault-tolerant. Eagle uses HBase as storage
>> >for storing metadata and metrics data, and also supports relational
>> >database through configuration change.
>> >
>> >Data Processing and Policy Engine:
>> >Processing Engine: Eagle provides stream processing API which is an
>> >abstraction of Apache Storm. It can also be extended to other streaming
>> >engines. This abstraction allows developers to assemble data
>> >transformation, filtering, external data join etc. without physically
>> >bound to a specific streaming platform. Eagle streaming API allows
>> >developers to easily integrate business logic with Eagle policy engine
>> >and internally Eagle framework compiles business logic execution DAG
>>into
>> >program primitives of underlying stream infrastructure e.g. Apache
>>Storm.
>> >For example, Eagle HDFS monitoring transforms audit log from Namenode
>>to
>> >object and joins sensitivity metadata, security zone metadata which are
>> >generated from external programs or configured by user. Eagle hive
>> >monitoring filters running jobs to get hive query string and parses
>>query
>> >string into object and then joins sensitivity metadata.
>> >Alerting Framework: Eagle Alert Framework includes stream metadata API,
>> >scalable policy engine framework, extensible policy engine framework.
>> >Stream metadata API allows developers to declare event schema including
>> >what attributes constitute an event, what is the type for each
>>attribute,
>> >and how to dynamically resolve attribute value in runtime when user
>> >configures policy. Scalable policy engine framework allows policies to
>>be
>> >executed on different physical nodes in parallel. It is also used to
>> >define your own policy partitioner class. Policy engine framework
>> >together with streaming partitioning capability provided by all
>>streaming
>> >platforms will make sure policies and events can be evaluated in a
>>fully
>> >distributed way. Extensible policy engine framework allows developer to
>> >plugin a new policy engine with a few lines of codes. WSO2 Siddhi CEP
>> 

[DISCUSS] SystemML Incubator Proposal

2015-10-23 Thread Luciano Resende
We would like to start a discussion on accepting SystemML as an Apache
Incubator project.

The proposal is available at :
https://wiki.apache.org/incubator/SystemM

And it's contents is also copied below.

Thanks in advance for your time reviewing and providing feedback.

==

= SystemML =

== Abstract ==

SystemML provides declarative large-scale machine learning (ML) that aims
at flexible specification of ML algorithms and automatic generation of
hybrid runtime plans ranging from single-node, in-memory computations to
distributed computations on Apache Hadoop and Apache Spark. ML algorithms
are expressed in an R-like syntax that includes linear algebra primitives,
statistical functions, and ML-specific constructs. This high-level language
significantly increases the productivity of data scientists as it provides
(1) full flexibility in expressing custom analytics, and (2) data
independence from the underlying input formats and physical data
representations. Automatic optimization according to data characteristics
such as distribution on the disk file system, and sparsity as well as
processing characteristics in the distributed environment like number of
nodes, CPU, memory per node, ensures both efficiency and scalability.

== Proposal ==

The goal of SystemML is to create a commercially friendly, scalable and
extensible machine learning framework for data scientists to create or
extend machine learning algorithms using a declarative syntax. The machine
learning framework enables data scientists to develop algorithms locally
without the need of a distributed cluster, and scale up and scale out the
execution of these algorithms to distributed Hadoop or Spark clusters.

== Background ==

SystemML started as a research project in the IBM Almaden Research Center
around 2010 aiming to enable data scientists to develop machine learning
algorithms independent of data and cluster characteristics.

== Rationale ==

SystemML enables the specification of machine learning algorithms using a
declarative machine learning (DML) language. DML includes linear algebra
primitives, statistical functions, and additional constructs. This
high-level language significantly increases the productivity of data
scientists as it provides (1) full flexibility in expressing custom
analytics and (2) data independence from the underlying input formats and
physical data representations.
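To give a flavor of this algorithmic style, here is a plain-Python sketch (not actual DML; the function and variable names are invented for illustration) of the kind of linear-algebra-driven loop such a language expresses, fitting a line by gradient descent on squared loss:

```python
# Hypothetical illustration only: a plain-Python analog of the iterative,
# linear-algebra-style algorithms an R-like DML script would express.
# This is not SystemML code.
def fit_line(xs, ys, iters=5000, step=0.1):
    """Fit y = w0 + w1*x by gradient descent on mean squared error."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        # gradients of the mean squared loss w.r.t. w0 and w1
        g0 = sum(w0 + w1 * x - y for x, y in zip(xs, ys)) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in zip(xs, ys)) / n
        w0 -= step * g0
        w1 -= step * g1
    return w0, w1

# the sample points lie exactly on y = 1 + 2*x
w0, w1 = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

In DML the same logic would be written against matrix primitives, and the claim of the proposal is that such a script runs unchanged from a single node to a Hadoop or Spark cluster.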

SystemML computations can be executed in a variety of different modes. It
supports single node in-memory computations and large-scale distributed
cluster computations. This allows the user to quickly prototype new
algorithms in local environments but automatically scale to large data
sizes as well without changing the algorithm implementation.

Algorithms specified in DML are dynamically compiled and optimized based on
data and cluster characteristics using rule-based and cost-based
optimization techniques. The optimizer automatically generates hybrid
runtime execution plans ranging from in-memory single-node execution to
distributed computations on Spark or Hadoop. This ensures both efficiency
and scalability. Automatic optimization reduces or eliminates the need to
hand-tune distributed runtime execution plans and system configurations.
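As a caricature of the decision being automated (a toy sketch with invented names and thresholds, not SystemML's actual planner), a cost-based choice between a single-node plan and a distributed plan might reduce to a memory-budget estimate over the data characteristics:

```python
# Toy sketch only: estimate operand size from data characteristics and
# pick an execution backend. The 0.7 budget fraction and the plan names
# are invented for illustration.
def choose_plan(rows, cols, sparsity, heap_bytes):
    est_bytes = rows * cols * 8 * sparsity  # 8 bytes per double value
    if est_bytes < 0.7 * heap_bytes:
        return "single_node_in_memory"
    return "distributed"

plan_small = choose_plan(10_000, 100, 1.0, heap_bytes=4 * 2**30)  # ~8 MB fits
plan_big = choose_plan(10**9, 1_000, 0.1, heap_bytes=4 * 2**30)   # ~800 GB does not
```

The real optimizer also weighs cluster characteristics (number of nodes, CPU, memory per node) and rewrites the whole execution DAG rather than a single operator.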

== Initial Goals ==

The initial goal of moving SystemML to the Apache Incubator is to broaden
the community and foster contributions from data scientists to develop new
machine learning algorithms and enhance the existing ones. Ultimately, this
may lead to the creation of an industry standard in specifying machine
learning algorithms.

== Current Status ==

The initial code has been developed at the IBM Almaden Research Center in
California and has recently been made available in GitHub under the Apache
Software License 2.0. The project currently supports a single node (in
memory computation) as well as distributed computations utilizing Hadoop or
Spark clusters.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the
requirements in an open forum. Several companies have already expressed
interest in this project, and we intend to invite additional developers to
participate. We will encourage and monitor community participation so that
privileges can be extended to those who contribute, in keeping with the
standard of meritocracy that Apache emphasizes.

=== Community ===

The need for a generic, scalable, and declarative machine learning approach
in open source is tremendous, so there is potential for a very large
community. We believe that SystemML’s extensible architecture, declarative
syntax, cost-based optimizer, and its alignment with Spark will further
encourage community participation not only in enhancing the infrastructure
but also speed up the creation of algorithms for a wide range of use
cases.  We expect that over time SystemML will attract a large community.

=== Alignment ===

The initial committers strongly believe that a generic scalable and
declarative machi

Re: [DISCUSS] SystemML Incubator Proposal

2015-10-23 Thread Henry Saputra
Hi Luciano,

Good proposal, but looks like
https://wiki.apache.org/incubator/SystemM does not exist?

Also, Reynold Xin and Patrick Wendell are not members of the IPMC, so I
don't think they can be mentors of this project yet.

They can ask to become IPMC members since both are already members of
the ASF, but for now they need to be removed from the proposal.


- Henry

On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende  wrote:
> We would like to start a discussion on accepting SystemML as an Apache
> Incubator project.
>
> The proposal is available at :
> https://wiki.apache.org/incubator/SystemM
>
> And it's contents is also copied below.
>
> Thanks in advance for your time reviewing and providing feedback.
>
> ==
>
> = SystemML =
>
> == Abstract ==
>
> SystemML provides declarative large-scale machine learning (ML) that aims
> at flexible specification of ML algorithms and automatic generation of
> hybrid runtime plans ranging from single node, in-memory computations, to
> distributed computations on Apache Hadoop and  Apache Spark. ML algorithms
> are expressed in an R-like syntax, that includes linear algebra primitives,
> statistical functions, and ML-specific constructs. This high-level language
> significantly increases the productivity of data scientists as it provides
> (1) full flexibility in expressing custom analytics, and (2) data
> independence from the underlying input formats and physical data
> representations. Automatic optimization according to data characteristics
> such as distribution on the disk file system, and sparsity as well as
> processing characteristics in the distributed environment like number of
> nodes, CPU, memory per node, ensures both efficiency and scalability.
>
> == Proposal ==
>
> The goal of SystemML is to create a commercial friendly, scalable and
> extensible machine learning framework for data scientists to create or
> extend machine learning algorithms using a declarative syntax. The machine
> learning framework enables data scientists to develop algorithms locally
> without the need of a distributed cluster, and scale up and scale out the
> execution of these algorithms to distributed Hadoop or Spark clusters.
>
> == Background ==
>
> SystemML started as a research project in the IBM Almaden Research Center
> around 2010 aiming to enable data scientists to develop machine learning
> algorithms independent of data and cluster characteristics.
>
> == Rationale ==
>
> SystemML enables the specification of machine learning algorithms using a
> declarative machine learning (DML) language. DML includes linear algebra
> primitives, statistical functions, and additional constructs. This
> high-level language significantly increases the productivity of data
> scientists as it provides (1) full flexibility in expressing custom
> analytics and (2) data independence from the underlying input formats and
> physical data representations.
>
> SystemML computations can be executed in a variety of different modes. It
> supports single node in-memory computations and large-scale distributed
> cluster computations. This allows the user to quickly prototype new
> algorithms in local environments but automatically scale to large data
> sizes as well without changing the algorithm implementation.
>
> Algorithms specified in DML are dynamically compiled and optimized based on
> data and cluster characteristics using rule-based and cost-based
> optimization techniques. The optimizer automatically generates hybrid
> runtime execution plans ranging from in-memory single-node execution to
> distributed computations on Spark or Hadoop. This ensures both efficiency
> and scalability. Automatic optimization reduces or eliminates the need to
> hand-tune distributed runtime execution plans and system configurations.
>
> == Initial Goals ==
>
> The initial goal of moving SystemML to the Apache Incubator is to broaden
> the community and foster contributions from data scientists to develop new
> machine learning algorithms and enhance the existing ones. Ultimately, this
> may lead to the creation of an industry standard in specifying machine
> learning algorithms.
>
> == Current Status ==
>
> The initial code has been developed at the IBM Almaden Research Center in
> California and has recently been made available in GitHub under the Apache
> Software License 2.0. The project currently supports a single node (in
> memory computation) as well as distributed computations utilizing Hadoop or
> Spark clusters.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those who contribute, in keeping with the
> standard of meritocracy that Apache emphasizes.
>
> === Community ===
>
> The need for a generic scalable and decla

Re: [DISCUSS] SystemML Incubator Proposal

2015-10-23 Thread Luciano Resende
On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra 
wrote:

> Hi Luciano,
>
> Good proposal, but looks like
> https://wiki.apache.org/incubator/SystemM does not exist?
>

Good catch, it's a typo in the original link: it's missing the L at the
end. Here is the correct link:

https://wiki.apache.org/incubator/SystemML



>
> Also, Reynold Xin and Patrick Wendell are not members of the IPMC, so I
> don't think they can be mentors of this project yet.
>
> They can ask to become IPMC members since both are already members of
> the ASF, but for now they need to be removed from the proposal.
>
>
>
Yes, they are aware of the requirement, and this will be fixed before we
call a vote on the proposal.



> - Henry
>
> On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende 
> wrote:
> > We would like to start a discussion on accepting SystemML as an Apache
> > Incubator project.
> >
> > The proposal is available at :
> > https://wiki.apache.org/incubator/SystemM
> >
> > And it's contents is also copied below.
> >
> > Thanks in advance for your time reviewing and providing feedback.
> >
> > ==
> >
> > = SystemML =
> >
> > == Abstract ==
> >
> > SystemML provides declarative large-scale machine learning (ML) that aims
> > at flexible specification of ML algorithms and automatic generation of
> > hybrid runtime plans ranging from single node, in-memory computations, to
> > distributed computations on Apache Hadoop and  Apache Spark. ML
> algorithms
> > are expressed in an R-like syntax, that includes linear algebra
> primitives,
> > statistical functions, and ML-specific constructs. This high-level
> language
> > significantly increases the productivity of data scientists as it
> provides
> > (1) full flexibility in expressing custom analytics, and (2) data
> > independence from the underlying input formats and physical data
> > representations. Automatic optimization according to data characteristics
> > such as distribution on the disk file system, and sparsity as well as
> > processing characteristics in the distributed environment like number of
> > nodes, CPU, memory per node, ensures both efficiency and scalability.
> >
> > == Proposal ==
> >
> > The goal of SystemML is to create a commercial friendly, scalable and
> > extensible machine learning framework for data scientists to create or
> > extend machine learning algorithms using a declarative syntax. The
> machine
> > learning framework enables data scientists to develop algorithms locally
> > without the need of a distributed cluster, and scale up and scale out the
> > execution of these algorithms to distributed Hadoop or Spark clusters.
> >
> > == Background ==
> >
> > SystemML started as a research project in the IBM Almaden Research Center
> > around 2010 aiming to enable data scientists to develop machine learning
> > algorithms independent of data and cluster characteristics.
> >
> > == Rationale ==
> >
> > SystemML enables the specification of machine learning algorithms using a
> > declarative machine learning (DML) language. DML includes linear algebra
> > primitives, statistical functions, and additional constructs. This
> > high-level language significantly increases the productivity of data
> > scientists as it provides (1) full flexibility in expressing custom
> > analytics and (2) data independence from the underlying input formats and
> > physical data representations.
> >
> > SystemML computations can be executed in a variety of different modes. It
> > supports single node in-memory computations and large-scale distributed
> > cluster computations. This allows the user to quickly prototype new
> > algorithms in local environments but automatically scale to large data
> > sizes as well without changing the algorithm implementation.
> >
> > Algorithms specified in DML are dynamically compiled and optimized based
> on
> > data and cluster characteristics using rule-based and cost-based
> > optimization techniques. The optimizer automatically generates hybrid
> > runtime execution plans ranging from in-memory single-node execution to
> > distributed computations on Spark or Hadoop. This ensures both efficiency
> > and scalability. Automatic optimization reduces or eliminates the need to
> > hand-tune distributed runtime execution plans and system configurations.
> >
> > == Initial Goals ==
> >
> > The initial goal of moving SystemML to the Apache Incubator is to broaden
> > the community and foster contributions from data scientists to develop
> > new machine learning algorithms and enhance the existing ones.
> > Ultimately, this
> > may lead to the creation of an industry standard in specifying machine
> > learning algorithms.
> >
> > == Current Status ==
> >
> > The initial code has been developed at the IBM Almaden Research Center in
> > California and has recently been made available in GitHub under the
> Apache
> > Software License 2.0. The project currently supports a single node (in
> > memory computation) as well as distributed computations utilizing Hadoop
> or

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Shaofeng Shi
+1 (non-binding)

"Manoharan, Arun" wrote:

>Hello Everyone,
>
>Thanks for all the feedback on the Eagle Proposal.
>
>I would like to call for a [VOTE] on Eagle joining the ASF as an incubation 
>project.
>
>The vote is open for 72 hours:
>
>[ ] +1 accept Eagle in the Incubator
>[ ] ±0
>[ ] -1 (please give reason)
>
>Eagle is a Monitoring solution for Hadoop to instantly identify access to 
>sensitive data, recognize attacks, malicious activities and take actions in 
>real time. Eagle supports a wide variety of policies on HDFS data and Hive. 
>Eagle also provides machine learning models for detecting anomalous user 
>behavior in Hadoop.
>
>The proposal is available on the wiki here:
>https://wiki.apache.org/incubator/EagleProposal
>
>The text of the proposal is also available at the end of this email.
>
>Thanks for your time and help.
>
>Thanks,
>Arun
>
>
>
>Eagle
>
>Abstract
>Eagle is an Open Source Monitoring solution for Hadoop to instantly identify 
>access to sensitive data, recognize attacks, malicious activities in hadoop 
>and take actions.
>
>Proposal
>Eagle audits access to HDFS files, Hive and HBase tables in real time, 
>enforces policies defined on sensitive data access and alerts or blocks user’s 
>access to that sensitive data in real time. Eagle also creates user profiles 
>based on the typical access behaviour for HDFS and Hive and sends alerts when 
>anomalous behaviour is detected. Eagle can also import sensitive data 
>information classified by external classification engines to help define its 
>policies.
>
>Overview of Eagle
>Eagle has 3 main parts.
>1.Data collection and storage - Eagle collects data from various hadoop logs 
>in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
>2.Data processing and policy engine - Eagle allows users to create policies 
>based on various metadata properties on HDFS, Hive and HBase data.
>3.Eagle services - Eagle services include policy manager, query service and 
>the visualization component. Eagle provides intuitive user interface to 
>administer Eagle and an alert dashboard to respond to real time alerts.
>
>Data Collection and Storage:
>Eagle provides programming API for extending Eagle to integrate any data 
>source into Eagle policy evaluation framework. For example, Eagle hdfs audit 
>monitoring collects data from Kafka which is populated from namenode log4j 
>appender or from logstash agent. Eagle hive monitoring collects hive query 
>logs from running job through YARN API, which is designed to be scalable and 
>fault-tolerant. Eagle uses HBase as storage for storing metadata and metrics 
>data, and also supports relational database through configuration change.
>
>Data Processing and Policy Engine:
>Processing Engine: Eagle provides stream processing API which is an 
>abstraction of Apache Storm. It can also be extended to other streaming 
>engines. This abstraction allows developers to assemble data transformation, 
>filtering, external data join etc. without physically bound to a specific 
>streaming platform. Eagle streaming API allows developers to easily integrate 
>business logic with Eagle policy engine and internally Eagle framework 
>compiles business logic execution DAG into program primitives of underlying 
>stream infrastructure e.g. Apache Storm. For example, Eagle HDFS monitoring 
>transforms audit log from Namenode to object and joins sensitivity metadata, 
>security zone metadata which are generated from external programs or 
>configured by user. Eagle hive monitoring filters running jobs to get hive 
>query string and parses query string into object and then joins sensitivity 
>metadata.
>Alerting Framework: Eagle Alert Framework includes stream metadata API, 
>scalable policy engine framework, extensible policy engine framework. Stream 
>metadata API allows developers to declare event schema including what 
>attributes constitute an event, what is the type for each attribute, and how 
>to dynamically resolve attribute value in runtime when user configures policy. 
>Scalable policy engine framework allows policies to be executed on different 
>physical nodes in parallel. It is also used to define your own policy 
>partitioner class. Policy engine framework together with streaming 
>partitioning capability provided by all streaming platforms will make sure 
>policies and events can be evaluated in a fully distributed way. Extensible 
>policy engine framework allows developer to plugin a new policy engine with a 
>few lines of codes. WSO2 Siddhi CEP engine is the policy engine which Eagle 
>supports as first-class citizen.
>Machine Learning module: Eagle provides capabilities to define user activity 
>patterns or user profiles for Hadoop users based on the user behaviour in the 
>platform. These user profiles are modeled using Machine Learning algorithms 
>and used for detection of anomalous users activities. Eagle uses Eigen Value 
>Decomposition, and Density Estimation algorithms for generating user profile 

How to get more mentors on podlings

2015-10-23 Thread John D. Ament
Based on some of the threads floating around, I ask this of all.

How should a podling reach out when it needs more active mentors?  I am
not saying there needs to be a big process or a small one, but at least
some guidance to give to podlings when they need a boost.

It seems to me that sometimes we end up with a lot of mentors on
easy-to-complete podlings (Groovy is a great recent example).  Harder
podlings end up struggling.

John


Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread 周千昊
+1

Shaofeng Shi wrote on Saturday, October 24, 2015 at 08:40:

> +1 (non-binding)
>
> "Manoharan, Arun" wrote:
>
> >Hello Everyone,
> >
> >Thanks for all the feedback on the Eagle Proposal.
> >
> >I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
> >
> >The vote is open for 72 hours:
> >
> >[ ] +1 accept Eagle in the Incubator
> >[ ] ±0
> >[ ] -1 (please give reason)
> >
> >Eagle is a Monitoring solution for Hadoop to instantly identify access to
> sensitive data, recognize attacks, malicious activities and take actions in
> real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
> >
> >The proposal is available on the wiki here:
> >https://wiki.apache.org/incubator/EagleProposal
> >
> >The text of the proposal is also available at the end of this email.
> >
> >Thanks for your time and help.
> >
> >Thanks,
> >Arun
> >
> >
> >
> >Eagle
> >
> >Abstract
> >Eagle is an Open Source Monitoring solution for Hadoop to instantly
> identify access to sensitive data, recognize attacks, malicious activities
> in hadoop and take actions.
> >
> >Proposal
> >Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
> >
> >Overview of Eagle
> >Eagle has 3 main parts.
> >1.Data collection and storage - Eagle collects data from various hadoop
> logs in real time using Kafka/Yarn API and uses HDFS and HBase for storage.
> >2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties on HDFS, Hive and HBase data.
> >3.Eagle services - Eagle services include policy manager, query service
> and the visualization component. Eagle provides intuitive user interface to
> administer Eagle and an alert dashboard to respond to real time alerts.
> >
> >Data Collection and Storage:
> >Eagle provides programming API for extending Eagle to integrate any data
> source into Eagle policy evaluation framework. For example, Eagle hdfs
> audit monitoring collects data from Kafka which is populated from namenode
> log4j appender or from logstash agent. Eagle hive monitoring collects hive
> query logs from running job through YARN API, which is designed to be
> scalable and fault-tolerant. Eagle uses HBase as storage for storing
> metadata and metrics data, and also supports relational database through
> configuration change.
> >
> >Data Processing and Policy Engine:
> >Processing Engine: Eagle provides stream processing API which is an
> abstraction of Apache Storm. It can also be extended to other streaming
> engines. This abstraction allows developers to assemble data
> transformation, filtering, external data join etc. without physically bound
> to a specific streaming platform. Eagle streaming API allows developers to
> easily integrate business logic with Eagle policy engine and internally
> Eagle framework compiles business logic execution DAG into program
> primitives of underlying stream infrastructure e.g. Apache Storm. For
> example, Eagle HDFS monitoring transforms audit log from Namenode to object
> and joins sensitivity metadata, security zone metadata which are generated
> from external programs or configured by user. Eagle hive monitoring filters
> running jobs to get hive query string and parses query string into object
> and then joins sensitivity metadata.
> >Alerting Framework: Eagle Alert Framework includes stream metadata API,
> scalable policy engine framework, extensible policy engine framework.
> Stream metadata API allows developers to declare event schema including
> what attributes constitute an event, what is the type for each attribute,
> and how to dynamically resolve attribute value in runtime when user
> configures policy. Scalable policy engine framework allows policies to be
> executed on different physical nodes in parallel. It is also used to define
> your own policy partitioner class. Policy engine framework together with
> streaming partitioning capability provided by all streaming platforms will
> make sure policies and events can be evaluated in a fully distributed way.
> Extensible policy engine framework allows developer to plugin a new policy
> engine with a few lines of codes. WSO2 Siddhi CEP engine is the policy
> engine which Eagle supports as first-class citizen.
> >Machine Learning module: Eagle provides capabilities to define user
> activity patterns or user profiles for Hadoop users based on the user
> behaviour in the platform. These user profiles are modeled using Machine
> Learning algorithms and used for

Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Luciano Resende
+1 (binding)

On Fri, Oct 23, 2015 at 7:11 AM, Manoharan, Arun 
wrote:

> Hello Everyone,
>
> Thanks for all the feedback on the Eagle Proposal.
>
> I would like to call for a [VOTE] on Eagle joining the ASF as an
> incubation project.
>
> The vote is open for 72 hours:
>
> [ ] +1 accept Eagle in the Incubator
> [ ] ±0
> [ ] -1 (please give reason)
>
> Eagle is a monitoring solution for Hadoop that instantly identifies access to
> sensitive data, recognizes attacks and malicious activities, and takes action
> in real time. Eagle supports a wide variety of policies on HDFS data and Hive.
> Eagle also provides machine learning models for detecting anomalous user
> behavior in Hadoop.
>
> The proposal is available on the wiki here:
> https://wiki.apache.org/incubator/EagleProposal
>
> The text of the proposal is also available at the end of this email.
>
> Thanks for your time and help.
>
> Thanks,
> Arun
>
> 
>
> Eagle
>
> Abstract
> Eagle is an open-source monitoring solution for Hadoop that instantly
> identifies access to sensitive data, recognizes attacks and malicious
> activities in Hadoop, and takes action.
>
> Proposal
> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> enforces policies defined on sensitive data access and alerts or blocks
> user’s access to that sensitive data in real time. Eagle also creates user
> profiles based on the typical access behaviour for HDFS and Hive and sends
> alerts when anomalous behaviour is detected. Eagle can also import
> sensitive data information classified by external classification engines to
> help define its policies.
>
> Overview of Eagle
> Eagle has 3 main parts.
> 1.Data collection and storage - Eagle collects data from various Hadoop
> logs in real time using the Kafka and YARN APIs, and uses HDFS and HBase for
> storage.
> 2.Data processing and policy engine - Eagle allows users to create
> policies based on various metadata properties of HDFS, Hive and HBase data.
> 3.Eagle services - Eagle services include the policy manager, query service
> and the visualization component. Eagle provides an intuitive user interface
> to administer Eagle and an alert dashboard to respond to real-time alerts.
>
> Data Collection and Storage:
> Eagle provides a programming API for extending Eagle to integrate any data
> source into the Eagle policy evaluation framework. For example, Eagle HDFS
> audit monitoring collects data from Kafka, which is populated from a NameNode
> log4j appender or from a logstash agent. Eagle Hive monitoring collects Hive
> query logs from running jobs through the YARN API, which is designed to be
> scalable and fault-tolerant. Eagle uses HBase as storage for metadata and
> metric data, and also supports relational databases through a configuration
> change.
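The collection step described above can be sketched roughly. The following Python fragment (illustrative only, not Eagle code; the sample line mimics the tab-separated HDFS NameNode audit log format) parses one audit line, the kind of record the Kafka topic would carry, into a structured event:

```python
import re

# key=value pairs in the audit payload are tab-separated
AUDIT_FIELDS = re.compile(r"(\w+)=([^\t]*)")

def parse_audit_line(line):
    """Turn one raw HDFS audit log line into a dict of key=value fields."""
    # Drop the timestamp/level prefix before "audit:" if present.
    _, _, payload = line.partition("audit:")
    payload = payload if payload else line
    return {k: v for k, v in AUDIT_FIELDS.findall(payload)}

sample = ("2015-10-23 10:01:02,003 INFO FSNamesystem.audit: "
          "allowed=true\tugi=alice\tip=/10.0.0.5\tcmd=open\t"
          "src=/data/secure/salaries.csv\tdst=null\tperm=null")
event = parse_audit_line(sample)
print(event["cmd"], event["src"])  # open /data/secure/salaries.csv
```

In a real deployment this parsing would sit in the stream processing topology, consuming from Kafka rather than a hard-coded string.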
>
> Data Processing and Policy Engine:
> Processing Engine: Eagle provides a stream processing API that is an
> abstraction over Apache Storm and can also be extended to other streaming
> engines. This abstraction allows developers to assemble data transformation,
> filtering, external data joins, etc. without being physically bound to a
> specific streaming platform. The Eagle streaming API lets developers easily
> integrate business logic with the Eagle policy engine; internally, the Eagle
> framework compiles the business-logic execution DAG into the program
> primitives of the underlying stream infrastructure, e.g. Apache Storm. For
> example, Eagle HDFS monitoring transforms the audit log from the NameNode
> into objects and joins them with sensitivity metadata and security-zone
> metadata, which are generated by external programs or configured by the user.
> Eagle Hive monitoring filters running jobs to get the Hive query string,
> parses the query string into an object, and then joins it with sensitivity
> metadata.
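The sensitivity-metadata join described above amounts to enriching each audit event with a classification looked up from externally supplied metadata. A minimal sketch (paths and tags are invented for illustration; in Eagle the metadata would come from external classifiers or user configuration):

```python
# Hypothetical sensitivity metadata, keyed by HDFS path.
SENSITIVITY = {
    "/data/secure/salaries.csv": "PII",
    "/data/public/readme.txt": "PUBLIC",
}

def enrich(event, sensitivity=SENSITIVITY):
    """Join one audit event with sensitivity metadata on the 'src' path."""
    enriched = dict(event)
    enriched["sensitivity"] = sensitivity.get(event.get("src"), "UNCLASSIFIED")
    return enriched

evt = {"cmd": "open", "src": "/data/secure/salaries.csv", "ugi": "alice"}
print(enrich(evt)["sensitivity"])  # PII
```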
> Alerting Framework: the Eagle alert framework includes a stream metadata API,
> a scalable policy engine framework, and an extensible policy engine
> framework. The stream metadata API allows developers to declare an event
> schema: which attributes constitute an event, the type of each attribute, and
> how to dynamically resolve attribute values at runtime when a user configures
> a policy. The scalable policy engine framework allows policies to be executed
> on different physical nodes in parallel, and lets you define your own policy
> partitioner class. The policy engine framework, together with the stream
> partitioning capability provided by all streaming platforms, ensures that
> policies and events can be evaluated in a fully distributed way. The
> extensible policy engine framework allows developers to plug in a new policy
> engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy
> engine that Eagle supports as a first-class citizen.
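As a rough Python stand-in for this evaluation step (Eagle itself delegates it to the Siddhi CEP engine; names here are invented), a policy can be modeled as a predicate over event attributes, with a partitioner routing events by user so evaluation could run on different nodes in parallel:

```python
def make_policy(cmd, sensitivity):
    """A toy policy: alert when `cmd` touches data with this sensitivity tag."""
    return lambda e: e.get("cmd") == cmd and e.get("sensitivity") == sensitivity

def evaluate(events, policy):
    """Run one policy over a batch of events, returning the alerts."""
    return [e for e in events if policy(e)]

def partition_key(event, num_partitions):
    # Analogue of a policy partitioner class: route by user so each
    # node sees all events for the users it owns.
    return hash(event.get("ugi", "")) % num_partitions

events = [
    {"cmd": "open", "sensitivity": "PII", "ugi": "alice"},
    {"cmd": "open", "sensitivity": "PUBLIC", "ugi": "bob"},
]
alerts = evaluate(events, make_policy("open", "PII"))
print(len(alerts))  # 1
```

A real CEP engine adds windows, sequences, and a query language on top of this idea; the partition key is what makes the evaluation shard cleanly across a Storm topology.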
> Machine Learning module: Eagle provides capabilities to define user activity
> patterns, or user profiles, for Hadoop users based on their behaviour in the
> platform. These user profiles are modeled using machine learning algorithms
> and used for detection of anomalous user activities. Eagle uses eigenvalue
> decomposition

Re: [DISCUSS] SystemML Incubator Proposal

2015-10-23 Thread Hitesh Shah
Hi Luciano, 

If you need any additional mentors, let me know. I would be interested in 
helping out. 

thanks
— Hitesh 


On Oct 23, 2015, at 4:34 PM, Luciano Resende  wrote:

> We would like to start a discussion on accepting SystemML as an Apache
> Incubator project.
> 
> The proposal is available at :
> https://wiki.apache.org/incubator/SystemM
> 
> And its contents are also copied below.
> 
> Thanks in advance for your time reviewing and providing feedback.
> 
> ==
> 
> = SystemML =
> 
> == Abstract ==
> 
> SystemML provides declarative large-scale machine learning (ML) that aims
> at flexible specification of ML algorithms and automatic generation of
> hybrid runtime plans, ranging from single-node, in-memory computations to
> distributed computations on Apache Hadoop and Apache Spark. ML algorithms
> are expressed in an R-like syntax that includes linear algebra primitives,
> statistical functions, and ML-specific constructs. This high-level language
> significantly increases the productivity of data scientists as it provides
> (1) full flexibility in expressing custom analytics, and (2) data
> independence from the underlying input formats and physical data
> representations. Automatic optimization according to data characteristics,
> such as distribution on the disk file system and sparsity, as well as
> processing characteristics of the distributed environment, such as the
> number of nodes and the CPU and memory per node, ensures both efficiency
> and scalability.
> 
> == Proposal ==
> 
> The goal of SystemML is to create a commercially friendly, scalable, and
> extensible machine learning framework that lets data scientists create or
> extend machine learning algorithms using a declarative syntax. The framework
> enables data scientists to develop algorithms locally without the need for a
> distributed cluster, and to scale up and scale out the execution of these
> algorithms to distributed Hadoop or Spark clusters.
> 
> == Background ==
> 
> SystemML started as a research project in the IBM Almaden Research Center
> around 2010 aiming to enable data scientists to develop machine learning
> algorithms independent of data and cluster characteristics.
> 
> == Rationale ==
> 
> SystemML enables the specification of machine learning algorithms using a
> declarative machine learning (DML) language. DML includes linear algebra
> primitives, statistical functions, and additional constructs. This
> high-level language significantly increases the productivity of data
> scientists as it provides (1) full flexibility in expressing custom
> analytics and (2) data independence from the underlying input formats and
> physical data representations.
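DML itself is R-like, but the flavor of algorithm it is meant to express can be sketched in Python: a few lines of linear algebra (here, closed-form univariate least squares, an invented example, not taken from SystemML's algorithm library) that in SystemML would compile to local or distributed plans without any change to the script:

```python
def ols_fit(xs, ys):
    """Ordinary least squares for one feature via the normal equations:
    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Points lying exactly on y = 2x + 1.
slope, intercept = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

The point of the declarative approach is that the data scientist writes only this level of math; the system decides whether the sums become in-memory loops or distributed aggregations.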
> 
> SystemML computations can be executed in a variety of different modes. It
> supports single node in-memory computations and large-scale distributed
> cluster computations. This allows the user to quickly prototype new
> algorithms in local environments but automatically scale to large data
> sizes as well without changing the algorithm implementation.
> 
> Algorithms specified in DML are dynamically compiled and optimized based on
> data and cluster characteristics using rule-based and cost-based
> optimization techniques. The optimizer automatically generates hybrid
> runtime execution plans ranging from in-memory single-node execution to
> distributed computations on Spark or Hadoop. This ensures both efficiency
> and scalability. Automatic optimization reduces or eliminates the need to
> hand-tune distributed runtime execution plans and system configurations.
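A drastically simplified sketch of that plan choice (the threshold logic, the dense 8-bytes-per-cell size estimate, and the function names are invented for illustration; SystemML's actual optimizer considers far more):

```python
def estimate_bytes(rows, cols, sparsity=1.0, bytes_per_cell=8):
    """Crude operand size estimate: cells that are non-zero, 8 bytes each."""
    return rows * cols * sparsity * bytes_per_cell

def choose_plan(rows, cols, sparsity, driver_budget_bytes):
    """Pick a single-node in-memory plan when the operand fits the driver
    memory budget; otherwise fall back to a distributed (Spark/Hadoop) plan."""
    if estimate_bytes(rows, cols, sparsity) <= driver_budget_bytes:
        return "single-node"
    return "distributed"

budget = 2 * 1024 ** 3  # assume a 2 GiB driver budget
print(choose_plan(10_000, 100, 1.0, budget))   # single-node
print(choose_plan(10 ** 9, 1000, 0.5, budget)) # distributed
```

This is the mechanism that lets the same DML script prototype on a laptop and later run unchanged against cluster-scale data.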
> 
> == Initial Goals ==
> 
> The initial goal in moving SystemML to the Apache Incubator is to broaden
> the community and foster contributions from data scientists to develop new
> machine learning algorithms and enhance existing ones. Ultimately, this may
> lead to the creation of an industry standard for specifying machine learning
> algorithms.
> 
> == Current Status ==
> 
> The initial code has been developed at the IBM Almaden Research Center in
> California and has recently been made available in GitHub under the Apache
> Software License 2.0. The project currently supports a single node (in
> memory computation) as well as distributed computations utilizing Hadoop or
> Spark clusters.
> 
> === Meritocracy ===
> 
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those who contribute, in keeping with the
> standard of meritocracy that Apache emphasizes.
> 
> === Community ===
> 
> The need for a generic, scalable, and declarative machine learning approach
> in open source is tremendous, so there is potential for a very large
> community. We believe that SystemML’s extensible architecture, declarative
> syntax, cost base

Re: How to get more mentors on podlings

2015-10-23 Thread Upayavira


On Sat, Oct 24, 2015, at 01:43 AM, John D. Ament wrote:
> Based on some of the threads floating around, I ask this of everyone.
> 
> How should a podling reach out when they need more active mentors?  I am
> not saying there needs to be a big process or a small one, but at least
> some guidance to give to podlings when they need a boost.
> 
> It seems to me that sometimes we end up with a lot of mentors on
> easy-to-complete podlings (Groovy is a great recent example), while harder
> podlings end up struggling.

Same as with a new podling: appeal here, ask people who you know may have an
interest, and attempt to sell the project.

Use whatever means you can to interest a potential mentor in signing up.

Upayavira




RE: How to get more mentors on podlings

2015-10-23 Thread Ross Gardler
I see this as the Champion's role. You could ask for volunteers, and that will
get you folks, but you really want people who are invested. As a Champion, I
consider it my job to find such folks.

Sent from my Windows Phone

From: John D. Ament
Sent: 10/23/2015 5:43 PM
To: general@incubator.apache.org
Subject: How to get more mentors on podlings

Based on some of the threads floating around, I ask this of everyone.

How should a podling reach out when they need more active mentors?  I am
not saying there needs to be a big process or a small one, but at least
some guidance to give to podlings when they need a boost.

It seems to me that sometimes we end up with a lot of mentors on
easy-to-complete podlings (Groovy is a great recent example), while harder
podlings end up struggling.

John