Re: Incubator Proposal: Pig

2007-09-20 Thread Torsten Curdt

Done!

On 20.09.2007, at 19:46, Doug Cutting wrote:


> Torsten Curdt wrote:
> > +1
> > Actually I would also be interested in stepping up as a mentor.
>
> Thanks, that'd be great!
>
> Please add yourself to the proposal in the wiki.
>
> Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







Re: Incubator Proposal: Pig

2007-09-20 Thread Otis Gospodnetic
Big +1! :)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message -----
From: Olga Natkovich <[EMAIL PROTECTED]>
To: general@incubator.apache.org
Sent: Tuesday, September 18, 2007 3:52:23 PM
Subject: Incubator Proposal: Pig

Hi,
 
Yahoo! research and development teams have developed a proposal below. The
proposal is also available on wiki at

http://wiki.apache.org/incubator/PigProposal.
We would like to ask that the ASF consider forming a podling according to
the proposal.

Thanks,

Olga Natkovich
  [EMAIL PROTECTED]


-

= Pig Open Source Proposal =

== Abstract ==

Pig is a platform for analyzing large data sets. 

== Proposal ==

The Pig project consists of high-level languages for expressing data
analysis programs, coupled with infrastructure for evaluating these
programs. The salient property of Pig programs is that their structure is
amenable to substantial parallelization, which in turn enables them to
handle very large data sets.

At the present time, Pig's infrastructure layer consists of a compiler that
produces sequences of Map-Reduce programs, for which large-scale parallel
implementations already exist (e.g., the Hadoop subproject). Pig's language
layer currently consists of a textual language called Pig Latin, which has
the following key properties:

 1. ''Ease of programming''. It is trivial to achieve parallel execution of
simple, "embarrassingly parallel" data analysis tasks. Complex tasks
comprised of multiple interrelated data transformations are explicitly
encoded as data flow sequences, making them easy to write, understand, and
maintain.
 2. ''Optimization opportunities''. The way in which tasks are encoded
permits the system to optimize their execution automatically, allowing the
user to focus on semantics rather than efficiency.
 3. ''Extensibility''. Users can create their own functions to do
special-purpose processing. 
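
As a rough illustration of the "data flow sequence" idea described above, here is a minimal sketch in plain Python. It is not Pig Latin and does not use any Pig API; the sample records and the helper names (`filter_step`, `group_step`, `aggregate_step`, `is_long_visit`) are hypothetical and chosen only for this example. Each transformation is an explicit, named step, and a user-supplied function plugs into the filter and aggregation steps.

```python
from collections import defaultdict

# Hypothetical sample records: (user, url, seconds_on_page)
visits = [
    ("amy", "cnn.com", 12),
    ("amy", "bbc.co.uk", 40),
    ("bob", "cnn.com", 7),
    ("bob", "cnn.com", 33),
]


def is_long_visit(record, threshold=10):
    """User-defined predicate (hypothetical) standing in for a custom function."""
    return record[2] >= threshold


def filter_step(records, predicate):
    """Keep only the records for which the predicate holds."""
    return [r for r in records if predicate(r)]


def group_step(records, key_index):
    """Collect records into lists keyed by the field at key_index."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key_index]].append(r)
    return dict(groups)


def aggregate_step(groups, fn):
    """Apply a user-supplied aggregation function to each group."""
    return {key: fn(rows) for key, rows in groups.items()}


# The analysis is written as a sequence of named data-flow steps.
long_visits = filter_step(visits, is_long_visit)
by_url = group_step(long_visits, key_index=1)
total_time = aggregate_step(by_url, lambda rows: sum(r[2] for r in rows))
print(total_time)
```

Because every step only consumes the output of the previous one, the filter and the per-group aggregation could in principle be run in parallel over partitions of the data, which is the property the proposal emphasizes.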

== Background ==

Pig started as a research project at Yahoo! in May of 2006 to combine ideas
in parallel databases and distributed computing. The first internal release
took place in July 2006. The first release was a simple front-end to the
Hadoop Map/Reduce framework. The following releases added new features and
evolved the language based on user feedback. In July 2007, Pig was taken
over by a development team and the first production version is due to be
released on 9/28/07.

Since its inception, we have observed a steady growth of the user community
within Yahoo!.  In April 2007, Pig was released under a BSD-type license.
Several external parties are using this version and have expressed interest
in collaborating on its development.

== Rationale ==

In an information-centric world, innovation is driven by ad-hoc analysis of
large data sets. For example, search engine companies routinely deploy and
refine services based on analyzing the recorded behavior of users,
publishers, and advertisers. The rate of innovation depends on the
efficiency with which data can be analyzed.

To analyze large data sets efficiently, one needs parallelism. The cheapest
and most scalable form of parallelism is cluster computing. Unfortunately,
programming for a cluster computing environment is difficult and
time-consuming. Pig makes it easy to harness the power of cluster computing
for ad-hoc data analysis. 

While other languages exist that try to achieve the same goals, we believe
that Pig provides more flexibility and gives more control to the end user. 

SQL typically requires (1) importing data from a user's preferred format
into a database system's internal format, (2) well-structured, normalized
data with a declared schema, and (3) programs expressed in declarative
SELECT-FROM-WHERE blocks. In contrast, Pig Latin facilitates (1)
interoperability, i.e. data may be read/written in a format accepted by
other applications such as text editors or graph generators, (2) flexibility,
i.e. data may be loosely structured or have structure that is defined
operationally, and (3) adoption by programmers who find procedural
programming more natural than declarative programming.

Sawzall is a scripting language used at Google on top of Map-Reduce. A
Sawzall program has a fairly rigid structure consisting of a filtering phase
(the map step) followed by an aggregation phase (the reduce step).
Furthermore, only the filtering phase can be written by the user, and only a
pre-built set of aggregations is available (new ones are non-trivial to
add). While Pig Latin has similar higher-level primitives like filtering and
aggregation, an arbitrary number of them can be flexibly chained together in
a Pig Latin program, and all primitives can use user-defined functions with
equal ease. Further, Pig Latin has additional primitives such as cogrouping,
that

Re: Renaming the SPL Incubator Proposal

2007-09-20 Thread Martin Cooper
On 9/19/07, David L Kaminsky <[EMAIL PROTECTED]> wrote:
>
> I spoke to some of the developers here, and we're down to two names:
>
> 1) Smores - "Mores" are social customs, which relates to policy, but
> Smores taste better.


You mean s'mores (i.e. with the apostrophe), which most people outside the
USA have probably never heard of. ;-)

--
Martin Cooper


> 2) Imperio - the curse used to gain complete control of another.
>
> If anyone has an objection to either name, please let me know.  Otherwise,
> we'll choose one and update the proposal.
>
> We did also consider APE, YAPE, Axiom, Sage and Mage, but they are all
> used by other OSS projects.
>
> David
>


Re: Incubator Proposal: Pig

2007-09-20 Thread Doug Cutting

Torsten Curdt wrote:
> +1
>
> Actually I would also be interested in stepping up as a mentor.


Thanks, that'd be great!

Please add yourself to the proposal in the wiki.

Doug




Re: Incubator Proposal: Pig

2007-09-20 Thread Torsten Curdt


On 20.09.2007, at 19:06, Robert Burrell Donkin wrote:


On 9/20/07, Leo Simons <[EMAIL PROTECTED]> wrote:

On Sep 18, 2007, at 9:52 PM, Olga Natkovich wrote:

Yahoo! research and development teams have developed a proposal
below. The
proposal is also available on wiki at

http://wiki.apache.org/incubator/PigProposal.
We would like to ask that the ASF consider forming a podling
according to
the proposal.
...
Pig is a platform for analyzing large data sets.


+1, looks cool!

...seems like your biggest challenge here is attracting a diverse
developer community, and hopefully the apache incubation process will
help you there...


+1


+1

Actually I would also be interested in stepping up as a mentor.

cheers
--
Torsten





Re: Incubator Proposal: Pig

2007-09-20 Thread Robert Burrell Donkin
On 9/20/07, Leo Simons <[EMAIL PROTECTED]> wrote:
> On Sep 18, 2007, at 9:52 PM, Olga Natkovich wrote:
> > Yahoo! research and development teams have developed a proposal
> > below. The
> > proposal is also available on wiki at
> > 
> > http://wiki.apache.org/incubator/PigProposal.
> > We would like to ask that the ASF consider forming a podling
> > according to
> > the proposal.
> > ...
> > Pig is a platform for analyzing large data sets.
>
> +1, looks cool!
>
> ...seems like your biggest challenge here is attracting a diverse
> developer community, and hopefully the apache incubation process will
> help you there...

+1

it's very important to focus on encouraging new developers in the
neonate period of a project

the energy required to let people know about a new project is often
underestimated. the open source space is now much bigger and more
diffuse than years ago. so it's not as easy for interesting projects
and interested people to find each other any more. stuff like blogging
(www.planetapache.org aggregates many blogs written by apache
committers) and podcasting (www.feathercast.org is an apache podcast)
are useful but tend to reach only people who are already interested in
apache. articles, grassroots meetings and conference talks are also
important.

one of the black arts is trying to ensure that the right level of
exposure is achieved. too much too early before the development
infrastructure is ready leads to disappointment but too little too
late when the project is too finished means that there is less chance
for meaningful contributions to be made.

- robert




Re: [Fwd: Re: Incubator Proposal: SPL]

2007-09-20 Thread Alex Karasulu
David,

Thanks for taking the time to respond with this insightful explanation to my
questions.  It
helped me to understand clearly that there is little if any overlap with
SPL.

Alex

On 9/20/07, David L Kaminsky <[EMAIL PROTECTED]> wrote:
>
> Hey Alex,
>
> I looked at the access control for Triplesec, and it looks somewhat like
> XACML from Oasis:
> http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml
>
> Here are my general thoughts on the overlap between SPL and access control
> languages:
>
> 1. It is possible to express access control policies in SPL (using if-then
> syntax), but it's much more natural to express access control policies using
> domain-specific syntax. For example, XACML has a pretty direct encoding of
> "Nurses can access medical records" ("nurses" is the role, "can access" is
> the access, and "medical records" is the subject), so it's natural to
> express such policies in XACML. The SPL syntax wouldn't be nearly as clean.
>
> 2. It is also possible to express IT management policies (e.g., "backup
> the database when data have changed 10%") using some access control
> languages, but you really have to misuse some of their concepts (typically
> "obligations"), and the expressions would get really hairy. In practice,
> people wouldn't really want to do it.
>
> At the end of the day, the discussion can end up looking a lot like
> discussions of programming languages. IMO, despite the overlap in some
> areas, there's a good reason that people still write in all of C, C++, Java,
> Perl, FORTRAN, etc. -- people choose the language that best suits a
> particular need.
>
> I don't think we'll need quite the same breadth of policy languages, but I
> also don't think we'll get down to one.
>
> David
>
>
>
>
> >
> >
> > -------- Original Message --------
> > Subject: Re: Incubator Proposal: SPL
> > Date: Tue, 18 Sep 2007 16:36:03 -0400
> > From: Alex Karasulu <[EMAIL PROTECTED]>
> > Reply-To: general@incubator.apache.org
> > To: general@incubator.apache.org, [EMAIL PROTECTED]
> > References: <[EMAIL PROTECTED]>
> > <[EMAIL PROTECTED]>
> >
> >
> >
> > Hi all,
> >
> > Over at Directory we have an initial attempt at an identity solution in
> > place called Triplesec.
> > It does the usual AAA with some additional things like mobile keyfobs;
> > however, its authorization policy management features might benefit
> > from this project, or there may be some overlap.  Here's a link btw
> > for some additional information on tsec:
> >
> > http://directory.apache.org/triplesec
> >
> > Specifically the following information refers to the authorization
> > policy store and an API to access the information therein, which can
> > be stored in LDAP or in an LDIF file (exported from LDAP).
> >
> > http://directory.apache.org/triplesec/guardian-api-users-guide.html
> > http://directory.apache.org/triplesec/authorization-using-guardian-api.html
> > http://directory.apache.org/triplesec/administration-tool-users-guide.html
> >
> > So the big question: is there much overlap here?  An initial glance
> > tells me there might not be, but I may be wrong.  Thoughts?
> >
> > Alex
> >
> > On 9/17/07, Filip at Apache <[EMAIL PROTECTED]> wrote:
> > >
> > > Noel J. Bergman wrote:
> > > >> We proposed to develop a policy-based management infrastructure
> > > >> that automates administrative tasks by executing policies
> > > >>
> > > >
> > > > Sounds good.  I will be curious to see the reaction from the HTTP
> > > > Server folks, but this sort of thing is very much needed in
> > > > real-world deployments of app servers.
> > > >
> > > >
> > > >> The initial goals are to develop an SPL evaluation engine and
> > > >> bindings to the APIs for [...]
> > > >>
> > > >
> > > > What about Tomcat?
> > > >
> > > I'd be happy to help out with this piece, should the vote to
> > > incubation go through
> > >
> > > Filip
> > > >
> > > >> Nominated Mentors
> > > >> -Bill Stoddard([EMAIL PROTECTED])
> > > >>
> > > >
> > > > Glad to see that Bill will have cycles for this.  :-)
> > > >
> > > > Would you please take a look at lokahi
> > > > (http://incubator.apache.org/lokahi)
> > > > and comment on any synergies that you see?
> > > >
> > > >   --- Noel
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
>
>


[VOTE] Graduate Ivy as a subproject of Ant

2007-09-20 Thread Xavier Hanin
Hi,

As discussed recently on this mailing list [1], I would like to start a
community vote to decide if the Ivy community feels ready to graduate as a
subproject of Ant.

The graduation guide [2] can be used as a basis to collect information about
what is usually necessary to graduate.

Note that this is only a community vote, and is the first of three votes
required for graduation, as explained in the graduation guide [2].

[ ] Yes, I think the Ivy podling is ready to graduate as a subproject of Ant
[ ] No, I don't think the Ivy podling is ready to graduate as a subproject
of Ant because ...

Everybody is welcome to voice his opinion. Cast your votes!

Xavier

[1] http://www.nabble.com/What-about-Graduation--tf4447692.html
[2] http://incubator.apache.org/guides/graduation.html
-- 
Xavier Hanin - Independent Java Consultant
http://xhab.blogspot.com/
http://incubator.apache.org/ivy/
http://www.xoocode.org/


Re: [Fwd: Re: Incubator Proposal: SPL]

2007-09-20 Thread David L Kaminsky

Hey Alex,

I looked at the access control for Triplesec, and it looks somewhat like
XACML from Oasis:
  http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml

Here are my general thoughts on the overlap between SPL and access control
languages:

1. It is possible to express access control policies in SPL (using if-then
syntax), but it's much more natural to express access control policies
using domain-specific syntax.  For example, XACML has a pretty direct
encoding of "Nurses can access medical records" ("nurses" is the role, "can
access" is the access, and "medical records" is the subject), so it's
natural to express such policies in XACML.  The SPL syntax wouldn't be
nearly as clean.

2. It is also possible to express IT management policies (e.g., "backup the
database when data have changed 10%") using some access control languages,
but you really have to misuse some of their concepts (typically
"obligations"), and the expressions would get really hairy.  In practice,
people wouldn't really want to do it.
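
To make the contrast in the two points above concrete, here is a minimal sketch in Python. It is neither XACML nor SPL syntax; the class and function names (`AccessRule`, `ManagementPolicy`, `is_allowed`, `evaluate`) and the state fields are hypothetical and exist only for this illustration. An access control rule is naturally a role/access/subject triple, while a management policy is naturally a condition paired with an action.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AccessRule:
    """Domain-specific form: who (role) may do what (action) to which resource."""
    role: str
    action: str
    resource: str


def is_allowed(rules, role, action, resource):
    """An access request is allowed if a matching rule exists."""
    return AccessRule(role, action, resource) in rules


@dataclass
class ManagementPolicy:
    """If-then form: when the condition holds for the state, produce an action."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def evaluate(policy, state) -> Optional[str]:
    """Return the action to take, or None when the condition does not hold."""
    if policy.condition(state):
        return policy.action(state)
    return None


# "Nurses can access medical records" as a rule triple.
rules = {AccessRule("nurse", "access", "medical_records")}

# "Backup the database when data have changed 10%" as an if-then policy.
backup_policy = ManagementPolicy(
    condition=lambda s: s["changed_fraction"] >= 0.10,
    action=lambda s: "backup " + s["database"],
)

print(is_allowed(rules, "nurse", "access", "medical_records"))
print(evaluate(backup_policy, {"database": "patients", "changed_fraction": 0.12}))
```

Either policy could be forced into the other shape, but the encoding gets awkward, which is the point being made about choosing the language that suits the need.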

At the end of the day, the discussion can end up looking a lot like
discussions of programming languages.  IMO, despite the overlap in some
areas, there's a good reason that people still write in all of C, C++,
Java, Perl, FORTRAN, etc. -- people choose the language that best suits a
particular need.

I don't think we'll need quite the same breadth of policy languages, but I
also don't think we'll get down to one.

David



>
>
> -------- Original Message --------
> Subject: Re: Incubator Proposal: SPL
> Date: Tue, 18 Sep 2007 16:36:03 -0400
> From: Alex Karasulu <[EMAIL PROTECTED]>
> Reply-To: general@incubator.apache.org
> To: general@incubator.apache.org, [EMAIL PROTECTED]
> References: <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
>
>
>
> Hi all,
>
> Over at Directory we have an initial attempt at an identity solution in
> place called Triplesec.
> It does the usual AAA with some additional things like mobile keyfobs;
> however, its authorization policy management features might benefit
> from this project, or there may be some overlap.  Here's a link btw
> for some additional information on tsec:
>
> http://directory.apache.org/triplesec
>
> Specifically the following information refers to the authorization policy
> store and an API to access the information therein, which can be stored
> in LDAP or in an LDIF file (exported from LDAP).
>
> http://directory.apache.org/triplesec/guardian-api-users-guide.html
> http://directory.apache.org/triplesec/authorization-using-guardian-api.html
> http://directory.apache.org/triplesec/administration-tool-users-guide.html
>
> So the big question: is there much overlap here?  An initial glance
> tells me there might not be, but I may be wrong.  Thoughts?
>
> Alex
>
> On 9/17/07, Filip at Apache <[EMAIL PROTECTED]> wrote:
> >
> > Noel J. Bergman wrote:
> > >> We proposed to develop a policy-based management infrastructure that
> > >> automates administrative tasks by executing policies
> > >>
> > >
> > > Sounds good.  I will be curious to see the reaction from the HTTP
> > > Server folks, but this sort of thing is very much needed in
> > > real-world deployments of app servers.
> > >
> > >
> > >> The initial goals are to develop an SPL evaluation engine and
> > >> bindings to the APIs for [...]
> > >>
> > >
> > > What about Tomcat?
> > >
> > I'd be happy to help out with this piece, should the vote to incubation
> > go through
> >
> > Filip
> > >
> > >> Nominated Mentors
> > >> -Bill Stoddard([EMAIL PROTECTED])
> > >>
> > >
> > > Glad to see that Bill will have cycles for this.  :-)
> > >
> > > Would you please take a look at lokahi
> > > (http://incubator.apache.org/lokahi)
> > > and comment on any synergies that you see?
> > >
> > >   --- Noel
> > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> > >
> > >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>

Re: Incubator Proposal: Pig

2007-09-20 Thread Leo Simons

On Sep 18, 2007, at 9:52 PM, Olga Natkovich wrote:
> Yahoo! research and development teams have developed a proposal
> below. The proposal is also available on wiki at
>
> http://wiki.apache.org/incubator/PigProposal.
> We would like to ask that the ASF consider forming a podling
> according to the proposal.
> ...
> Pig is a platform for analyzing large data sets.


+1, looks cool!

...seems like your biggest challenge here is attracting a diverse  
developer community, and hopefully the apache incubation process will  
help you there...


cheers,

Leo Simons
--
http://www.leosimons.com/blog/


