[VOTE] MADlib v1.11-rc3

2017-05-04 Thread Rashmi Raghu
Hello MADlib community,

We have created a MADlib 1.11 RC-3, with the artifacts below (source and
convenience binaries) up for a vote.

Note that voting for the RC-2 release has been cancelled due to the need
for minor corrections based on community feedback. Sorry for the
inconvenience.

RC-3 replaces RC-2 with the following minor changes:
* Ensure product naming is consistently 'Apache MADlib (incubating)'
* Git revision tag changed to rc/1.11-rc3

This will be the 5th release for Apache MADlib (incubating).

The main goals of this release are:
* new module (PageRank for graph analytics with grouping support included)
* improvements to existing modules (add grouping support to Single Source
Shortest Path, reduce memory footprint of DT and RF, include NULL features
in training DT, add support for array and svec output for Pivot module,
utility to unnest 2-D arrays into rows of 1-D arrays)
* platform updates (GPDB 5)
* updates for Apache Top Level Project readiness and build process on
Apache infrastructure
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11

*** Please download, review and vote by Tue May 09, 2017 @ 6pm PDT ***

We're voting upon the source and convenience binaries below:

Source Repository (tag):  rc/1.11-rc3
https://github.com/apache/incubator-madlib/tree/rc/1.11-rc3

Source Files and convenience Binaries:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.11-incubating-rc3/

Commit:
https://github.com/apache/incubator-madlib/commit/8e2778a3921aa99f009962756881ce4bea5eee16

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, PMC members please be sure to indicate
"(binding)" with the vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)


Regards,
Rashmi Raghu

-- 
Rashmi Raghu, Ph.D.
Pivotal Data Science


Re: [DISCUSS] Graduation

2017-05-04 Thread Greg Chase
Is that the 5th release I see that just got posted?

Ya, time for MADlib to fly the incubator!





[GitHub] incubator-madlib pull request #130: MADLIB-1098. Corrections for MADlib nami...

2017-05-04 Thread rvs
Github user rvs closed the pull request at:

https://github.com/apache/incubator-madlib/pull/130


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-madlib issue #130: MADLIB-1098. Corrections for MADlib naming cons...

2017-05-04 Thread rvs
Github user rvs commented on the issue:

https://github.com/apache/incubator-madlib/pull/130
  
Good point. Let me update the PR.




[GitHub] incubator-madlib pull request #129: DT/RF: Allow expressions in feature list

2017-05-04 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/129#discussion_r114846053
  
--- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in ---
@@ -157,25 +150,24 @@ def _classify_features(feature_to_type, features):
     cat_types = int_types + text_types + boolean_types
     ordered_cat_types = int_types
 
-    cat_features = [c for c in features
-                    if _dict_get_quoted(feature_to_type, c) in cat_types]
-    ordered_cat_features = [c for c in features if _dict_get_quoted(
-                            feature_to_type, c) in ordered_cat_types]
+    cat_features = [c for c in features if feature_to_type[c] in cat_types]
+    ordered_cat_features = [c for c in features
+                            if feature_to_type[c] in ordered_cat_types]
 
     cat_features_set = set(cat_features)
     # continuous types - 'real' is cast to 'double precision' for uniformity
-    con_types = ['real', 'float8', 'double precision']
+    con_types = ['real', 'float8', 'double precision', 'numeric']
     con_features = [c for c in features
                     if (c not in cat_features_set and
--- End diff --

Do we need to check if `c not in cat_features_set`? At the moment, it seems
that `cat_types` and `con_types` are mutually exclusive.
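
One way to test the question above is a small standalone sketch. The contents of `int_types`, `text_types` and `boolean_types` are not shown in the diff, so the values below are assumptions, and `classify_features` with its `check_membership` flag is a hypothetical helper, not the function in `decision_tree.py_in`. If the two type lists are disjoint, the extra membership test should not change the result:

```python
# Hypothetical type lists: the real contents of int_types, text_types and
# boolean_types in decision_tree.py_in are not visible in the diff.
int_types = ['integer', 'smallint', 'bigint']
text_types = ['text', 'varchar']
boolean_types = ['boolean']
cat_types = int_types + text_types + boolean_types
con_types = ['real', 'float8', 'double precision', 'numeric']


def classify_features(feature_to_type, features, check_membership):
    """Split features into categorical and continuous lists.

    check_membership toggles the `c not in cat_features_set` test that the
    review comment asks about.
    """
    cat_features = [c for c in features if feature_to_type[c] in cat_types]
    cat_features_set = set(cat_features)
    if check_membership:
        con_features = [c for c in features
                        if (c not in cat_features_set and
                            feature_to_type[c] in con_types)]
    else:
        con_features = [c for c in features
                        if feature_to_type[c] in con_types]
    return cat_features, con_features


# True only while no type name appears in both lists.
types_disjoint = set(cat_types).isdisjoint(con_types)

# Made-up feature names and types, for illustration only.
feature_to_type = {'age': 'integer', 'income': 'numeric',
                   'city': 'text', 'score': 'double precision'}
features = ['age', 'income', 'city', 'score']

with_check = classify_features(feature_to_type, features, True)
without_check = classify_features(feature_to_type, features, False)
```

Under these assumptions both calls return the same lists. The membership test would only matter if a type name were ever added to both `cat_types` and `con_types`.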




[GitHub] incubator-madlib pull request #130: MADLIB-1098. Corrections for MADlib nami...

2017-05-04 Thread rvs
GitHub user rvs opened a pull request:

https://github.com/apache/incubator-madlib/pull/130

MADLIB-1098. Corrections for MADlib naming consistency



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvs/incubator-madlib master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/130.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #130


commit eeed91b570120fe4d47cc2f2f07ed1aa304acc14
Author: Roman Shaposhnik 
Date:   2017-05-04T18:16:42Z

MADLIB-1098. Corrections for MADlib naming consistency






Re: [VOTE] MADlib v1.11-rc2

2017-05-04 Thread Rashmi Raghu
Thanks Ed for all your observations on RC-2 including product naming and
git revision inconsistencies.

Let us then cancel the RC-2 release. We will address issues, release RC-3
and put it up for a vote soon.

Thanks again,
Rashmi

On Wed, May 3, 2017 at 11:15 AM, Rahul Iyer  wrote:

> Re: incorrect git revision in the files
>
> The revision string is obtained using 'git describe' and the value of
> `rel/v1.10.0-30-g0ff829a` indicates that the commit is 30 commits above the
> v1.10.0 commit, with the commit SHA starting with 0ff829a. The difficulty
> with ensuring it contains the `rel/v1.11` tag is that we don't yet have a
> v1.11 release. The release tag can only be finalized after it has been
> successfully voted upon. Since the release tags on apache are immutable, we
> can't push them out before voting.
>
> The DMGs are built on the release manager's local machine, so we can have
> local tags to get the right string.
> The RPMs, however, are built on Jenkins/other CI server which only contain
> the remote tags. The best we could do is have `rc/v1.11-rc2` instead of the
> current string.
>
> - Rahul
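
The reading of the revision string described above can be sketched in a few lines of Python. This is only an illustration: `parse_git_describe` is a hypothetical helper, not part of the MADlib build, and it assumes the usual `<tag>-<count>-g<abbreviated sha>` shape that `git describe` prints when the commit is not itself tagged.

```python
import re

# <tag>-<commits since tag>-g<abbreviated commit SHA>
DESCRIBE_PATTERN = re.compile(
    r'^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$')


def parse_git_describe(value):
    """Split a `git describe` string into tag, commit count and SHA prefix."""
    match = DESCRIBE_PATTERN.match(value)
    if match is None:
        # No suffix: assume the commit sits exactly on the tag.
        return {'tag': value, 'commits_since_tag': 0, 'sha': None}
    return {'tag': match.group('tag'),
            'commits_since_tag': int(match.group('count')),
            'sha': match.group('sha')}


# The two revision strings reported for the RC-2 convenience binaries.
mac_revision = parse_git_describe('rel/v1.10.0-30-g0ff829a')
linux_revision = parse_git_describe('rel/v1.10.0-31-gd54be2b')
```

Both strings resolve to the `rel/v1.10.0` tag, 30 and 31 commits back respectively, which is why the binaries report v1.10 even though they were built from the 1.11 release candidate.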
>
>
> On May 3, 2017 9:32 AM, "Frank McQuillan"  wrote:
>
> Ed,
>
>
> Thanks for your review,  all comments big and small certainly encouraged
> and welcome.
>
> Regarding the JIRAs that are not closed, the actual work has been done so
> there is nothing material pending.  But I did not close them because I
> wanted @rvs to do that, since he was the one overseeing them.  I will ask
> him to close them at his earliest convenience.
>
> Frank
>
> On Wed, May 3, 2017 at 8:58 AM, Ed Espino  wrote:
>
> > Sorry about the piecemeal observations. I'm currently in Beijing and don't
> > have a lot of extra large time chunks to review the release in one sitting.
> >
> > 1) There are still three outstanding Jira issues in an "Unresolved" state
> > with a fix version of v1.11.  Are they going to be resolved soon? They can
> > be seen with the following url:
> >
> > https://issues.apache.org/jira/browse/MADLIB/fixforversion/12339592/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> >
> > 2) As it relates to the convenience binary release, I noticed an
> > inconsistent MADLIB_GIT_REVISION value (rel/v1.10.0) spread throughout
> > several SQLCommon.m4 files. Shouldn't the reference be to v1.11 instead of
> > v1.10?
> >
> > 
> > MAC (notice rel/v1.10.0-30-g0ff829a value):
> > 
> >
> > ✔ /usr/local/madlib/Versions
> > 23:42 $ grep -n -i -r MADLIB_GIT_REVISION *
> > 1.11/ports/greenplum/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> > 1.11/ports/hawq/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> > 1.11/ports/postgres/9.4/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-30-g0ff829a')
> > 1.11/ports/postgres/9.5/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-30-g0ff829a')
> > 1.11/ports/postgres/9.6/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-30-g0ff829a')
> > 1.11/ports/postgres/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> >
> > 
> > Linux (notice rel/v1.10.0-31-gd54be2b value):
> > 
> >
> > [root@ip-172-31-9-242 Versions]# rpm -qa | grep madlib
> > madlib-1.11-1.x86_64
> > [root@ip-172-31-9-242 Versions]# pwd
> > /usr/local/madlib/Versions
> > [root@ip-172-31-9-242 Versions]# grep -n -i -r MADLIB_GIT_REVISION *
> > 1.11/ports/greenplum/4.2/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/greenplum/4.3/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/greenplum/4.3ORCA/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/greenplum/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> > 1.11/ports/hawq/2/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/hawq/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> > 1.11/ports/postgres/9.5/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/postgres/9.6/madpack/SQLCommon.m4:20:m4_define(`__MADLIB_GIT_REVISION__', `rel/v1.10.0-31-gd54be2b')
> > 1.11/ports/postgres/modules/utilities/utilities.sql_in:122:'git revision: __MADLIB_GIT_REVISION__, '
> > [root@ip-172-31-9-242 Versions]#
> >
> > On Wed, May 3, 2017 at 12:15 PM, Ed Espino  wrote:
> >
> > > I have taken a quick

Re: [DISCUSS] Graduation

2017-05-04 Thread Frank McQuillan
Thanks Roman.

I agree that this project is in the correct state to qualify as a TLP,
and would like to help move that forward.

In addition to the Graduation Resolution
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
that you mention, we also created a checklist
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation
which aims to describe where the project stands according to the Apache
project maturity model.

I would encourage members of the Apache MADlib community to take a look
at the checklist and comment on any of the items there.

The project management part of the wiki
https://cwiki.apache.org/confluence/display/MADLIB/Project+Management
also gives a pretty good snapshot of the project as it stands today.

Frank




Re: [DISCUSS] Graduation

2017-05-04 Thread Rahul Iyer
Hi Roman,

Many thanks for your excellent mentorship!

Your #2 and #3 proposals sound good to me and I look forward to the
discussion on private@.

- Rahul


On Fri, Apr 28, 2017 at 10:47 AM, Roman Shaposhnik  wrote:
> Hi!
>
> With the fifth (v1.11) release in the final stages of being cut,
> I think now would be a good time to officially start our graduation
> discussion. With my mentor hat on, I feel that the project is
> mature and self-reliant enough to qualify as a TLP.
>
> Process-wise, graduation consists of drafting a board resolution,
> getting it approved by the IPMC and finally submitting it for the ASF
> board's consideration. At the very minimum your resolution will contain:
> 1. A name of the project (I assume that'll be MADlib)
> 2. A list of proposed PMC members
> 3. A proposed PMC chair
> A good example of a resolution can be found here:
> https://cwiki.apache.org/confluence/display/FINERACT/Graduation+Resolution
>
> In fact, Frank and I took the liberty of using that as the basis for our own:
> https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
> Please read it carefully and let us know what you think.
>
> On #2 my suggestion would be to have an opt-in system. Basically
> we will kick off a thread on private@madlib asking current PPMC
> members if they are willing to continue on the PMC.
>
> On #3 I typically recommend that podlings I mentor set up a rotating chair
> policy. This is in no way an ASF requirement, so feel free to ignore it,
> but it has worked well before. The chair will be expected to come up for
> rotation every year. It will be more than OK for the same person to
> self-nominate once the year is up -- but at the same time it'll be up to
> that same person to actually kick off a thread asking if anybody else is
> interested in serving as chair for the next year. Of course, if there are
> multiple candidates there will have to be a vote.
>
> Speaking of self-nomination -- the same thread that we're going to kick
> off as part of solving for #2 will ask for folks to self-nominate as the
> initial chair to be listed on the resolution.
>
> Unless somebody objects strongly to my #2 and #3 proposals I'm going
> to kick off this thread on private@.
>
> With that in mind, let's make the rest of the discussion on dev@ about
> collecting the data points to present to the IPMC as part of asking them to
> vote YES on our graduation. Let's collect all these data points on the same
> wiki page:
> https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
> Or if you feel that a discussion may be needed -- just reply to this thread.
>
> Thanks,
> Roman.