Re: Apache MADlib v1.12 status

2017-08-17 Thread Frank McQuillan
Here is the PR
https://github.com/apache/incubator-madlib/pull/169
Should be merged today

On Wed, Aug 16, 2017 at 10:20 AM, Ed Espino  wrote:

> Frankie,
>
> Are there Jiras for the remaining work? This work (minor changes to neural
> nets) is currently not visible on the release dashboard (
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12331450
> )
>
> -=e
>
> On Wed, Aug 16, 2017 at 9:52 AM, Frank McQuillan 
> wrote:
>
> > Some doc changes coming for multiple modules, and minor changes to neural
> > nets in the next day or so.
> >
> > Frank
> >
> > On Wed, Aug 16, 2017 at 9:49 AM, Cooper Sloan  wrote:
> >
> > > We shouldn't hold up the release.  This is a no-op for now.
> > > If we get more information from the customer we can reopen it, but for
> > now
> > > we will do nothing.
> > >
> > > CS
> > >
> > > On Wed, Aug 16, 2017 at 9:41 AM Ed Espino  wrote:
> > >
> > > > We have one outstanding Apache MADlib v1.12 Jira holding up the
> release
> > > > (MADLIB-1091). It appears Cooper has been working on it and is
> seeking
> > > > additional information. If it is not resolved soon, we need to decide
> > if
> > > we
> > > > will push this to a future release.
> > > >
> > > > FYI: Today I will be performing preliminary convenience binary builds
> > > > following the information provided in the Release Process section
> > titled
> > > > "Prepare rpm and dmg binaries" (
> > > >
> > > > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process#
> > > ReleaseProcess-Preparerpmanddmgbinaries
> > > > ).
> > > > I will undoubtedly be contributing additional information in the
> > section
> > > > and looking for guidance and confirmation of my understanding of the
> > > > convenience binary build environments.
> > > >
> > > > We're almost there!
> > > >
> > > > Cheerios,
> > > > -=e
> > > >
> > > > --
> > > > *Ed Espino*
> > > >
> > >
> >
>
>
>
> --
> *Ed Espino*
>


Re: Apache MADlib v1.12 status

2017-08-16 Thread Frank McQuillan
Some doc changes coming for multiple modules, and minor changes to neural
nets in the next day or so.

Frank

On Wed, Aug 16, 2017 at 9:49 AM, Cooper Sloan  wrote:

> We shouldn't hold up the release.  This is a no-op for now.
> If we get more information from the customer we can reopen it, but for now
> we will do nothing.
>
> CS
>
> On Wed, Aug 16, 2017 at 9:41 AM Ed Espino  wrote:
>
> > We have one outstanding Apache MADlib v1.12 Jira holding up the release
> > (MADLIB-1091). It appears Cooper has been working on it and is seeking
> > additional information. If it is not resolved soon, we need to decide if
> we
> > will push this to a future release.
> >
> > FYI: Today I will be performing preliminary convenience binary builds
> > following the information provided in the Release Process section titled
> > "Prepare rpm and dmg binaries" (
> >
> > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process#
> ReleaseProcess-Preparerpmanddmgbinaries
> > ).
> > I will undoubtedly be contributing additional information in the section
> > and looking for guidance and confirmation of my understanding of the
> > convenience binary build environments.
> >
> > We're almost there!
> >
> > Cheerios,
> > -=e
> >
> > --
> > *Ed Espino*
> >
>


Re: Apache MADlib v1.12 status

2017-08-14 Thread Frank McQuillan
Hi Ed,

We have not been able to reproduce
https://issues.apache.org/jira/browse/MADLIB-1091
so it may move out.

I still have some docs updates to do so that will be a coming PR probably
Tues or Wed.

Frank

On Mon, Aug 14, 2017 at 3:30 PM, Ed Espino  wrote:

> MADlib dev,
>
> We are winding down the number of outstanding issues for the Apache MADlib
> v1.12 release. The one outstanding issue is
> https://issues.apache.org/jira/browse/MADLIB-1091. Once this is resolved,
> I'm hoping to start the release process.
>
> Regards,
> -=e
>
> --
> *Ed Espino*
>


Re: JIRA for migrating repos following MADlib's TLP graduation

2017-08-14 Thread Frank McQuillan
LGTM

On Mon, Aug 14, 2017 at 4:32 PM, Nandish Jayaram 
wrote:

> Hi All,
>
> I have opened an Apache Infrastructure ticket to migrate MADlib's
> git repos, distribution server, and other common tasks associated
> with the move from incubator to TLP. The ticket is:
> https://issues.apache.org/jira/browse/INFRA-14872
>
> Please do have a look at it and let me know if I have missed something,
> or if something is to be changed. I followed the instructions at
> http://www.apache.org/dev/infra-contact#requesting-graduation to open
> the ticket, and used the template used by Apache Flex's TLP ticket
> https://issues.apache.org/jira/browse/INFRA-5688.
>
> I will keep you posted on the status of the ticket. We might still need to
> change some settings in MADlib's Jenkins build, once the git repo move
> is finished. I thought that was something we could control and might not
> need Infra's help for that (please correct me if I am wrong).
>
> NJ
>


Re: Jira post v1.12 version?

2017-08-14 Thread Frank McQuillan
Ed,

I would suggest v2.0 for the next version, so you can add those 2 JIRAs to
v2.0

Once we get v1.12 out the door I was going to solicit comments from the
community on v2.0 features so we can get that backlog going.

Frank

On Mon, Aug 14, 2017 at 11:30 AM, Ed Espino  wrote:

> Dev,
>
> What are we setting the Jira Fix Version/s for issues to be addressed in
> the next release (post v1.12)? I noticed a v2.0 version (06/Oct/17)
> available in Jira.
>
> The two issues I'd like to set to the next release are the following:
>
> https://issues.apache.org/jira/browse/MADLIB-1025 - MADlib does not
> compile
> with gcc 6.2
> https://issues.apache.org/jira/browse/MADLIB-1145 - Ubuntu 16.04 - Using
> GCC 5 (default gcc) causes Postgres 9.6 crash
>
> Any guidance is greatly appreciated.
>
> Regards
> -=e
>
> --
> *Ed Espino*
>


Re: [VOTE]: MADlib repo(s) migration

2017-08-14 Thread Frank McQuillan
1

On Fri, Aug 11, 2017 at 10:16 AM, Nandish Jayaram 
wrote:

> Hi All,
>
> A gentle reminder to vote if you'd like. I was thinking of opening the
> Apache Infra
> ticket for the move sometime today if there are no more votes to come.
>
> NJ
>
> On Thu, Aug 10, 2017 at 3:39 AM, ChenLiang Wang 
> wrote:
>
> > 1
> >
> > On 08/10/2017 05:47 AM, Orhan Kislal wrote:
> > > 1
> > >
> > > Orhan Kislal
> > >
> > > On Wed, Aug 9, 2017 at 2:32 PM, Nandish Jayaram 
> > wrote:
> > >
> > >> Hi All,
> > >>
> > >> With MADlib's graduation to TLP, it's time to migrate its github
> > >> repos from `*incubator-madlib*` to `*madlib*`. We will have to open
> > >> an Apache Infrastructure ticket to request this move for the following
> > >> repos (along with other stuff like wiki, jenkins etc):
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
> > >>  (Read/Write)
> > >> https://github.com/apache/incubator-madlib (Github mirror- read only)
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
> > >> https://github.com/apache/incubator-madlib-site (GitHub mirror)
> > >>
> > >> There are two ways to go about this, and the Infra ticket has to be
> > >> raised accordingly.
> > >> 1) Just maintain the current set-up, but have the repos renamed from
> > >> incubator-madlib to madlib.
> > >> 2) Use Gitbox to enable github repo as a R/W repo and not just
> > read-only.
> > >> Check this email (
> > >> https://mail-archives.apache.org/mod_mbox/incubator-madlib-
> > >> dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaXvbT0KOmeFtQzp4eAa3p0fKPP7c
> > >> 8...@mail.gmail.com%3e)
> > >> for further information.
> > >>
> > >> Please vote your preference and we can decide to move accordingly.
> > >>
> > >> NJ
> > >>
> > >
> >
>


Re: MADlib Top Level Project Graduation

2017-07-28 Thread Frank McQuillan
I updated the front page of the web site with the TLP announcement
http://madlib.incubator.apache.org/

When I get back from vacation in 2 weeks, I will work with our mentor Roman
Shaposhnik to TLP-ify the rest of the infra and help with the press release

Frank

On Fri, Jul 28, 2017 at 2:53 PM, Frank McQuillan 
wrote:

> There will be a press release put out by the ASF, it is being written now
> but there have been some delays with people out on summer vacation.
>
> I will update the web site this afternoon with the news.  I was planning
> to wait for the press release, but think I will update it now and add a
> link to the press release later when  it comes out.
>
> Frank
>
> On Fri, Jul 28, 2017 at 2:24 PM, Ivan Novick  wrote:
>
>> I just tweeted it and referenced @joe_hellerstein
>> <https://twitter.com/joe_hellerstein>
>>
>> Let's make some noise.
>>
>> I am sure there will be more, but we can start on our own.
>>
>> On Fri, Jul 28, 2017 at 2:18 PM, Joseph Hellerstein <
>> hellerst...@berkeley.edu> wrote:
>>
>>> Is this public? Is anybody planning on putting a news item on a web page
>>> or
>>> something?
>>>
>>> Would be good to brag on social media, once that's in place.
>>>
>>> J
>>>
>>> On Wed, Jul 26, 2017 at 12:39 AM, Kazmi,Auon H  wrote:
>>>
>>> > Congrats developers!
>>> >
>>> > 
>>> > From: Woo Jung 
>>> > Sent: Monday, July 24, 2017 5:56:35 PM
>>> > To: u...@madlib.incubator.apache.org
>>> > Cc: dev@madlib.incubator.apache.org
>>> > Subject: Re: MADlib Top Level Project Graduation
>>> >
>>> > Congratulations MADlib -- Oh, the places you'll go! :)
>>> >
>>> > On Mon, Jul 24, 2017 at 2:29 PM, Greg Chase 
>>> wrote:
>>> >
>>> > > Congrats MADlib team! Very proud of you!
>>> > >
>>> > > On Mon, Jul 24, 2017 at 2:09 PM, Jarrod Vawdrey
>>> > > wrote:
>>> > >
>>> > >> Awesome!!! Congrats team.
>>> > >>
>>> > >> Jarrod Vawdrey
>>> > >> (678) 651-0795
>>> > >>
>>> > >> > On Jul 24, 2017, at 3:57 PM, Joseph Hellerstein <
>>> > >> hellerst...@berkeley.edu> wrote:
>>> > >> >
>>> > >> > Very cool!
>>> > >> >
>>> > >> > On Mon, Jul 24, 2017 at 2:11 PM, Anirudh Kondaveeti <
>>> > >> akondave...@pivotal.io>
>>> > >> > wrote:
>>> > >> >
>>> > >> >> Congrats team!
>>> > >> >>
>>> > >> >>> On Mon, Jul 24, 2017 at 11:06 AM, Ivan Novick <
>>> inov...@pivotal.io>
>>> > >> wrote:
>>> > >> >>>
>>> > >> >>> nice work all!
>>> > >> >>>
>>> > >> >>> On Mon, Jul 24, 2017 at 11:04 AM, Marshall Presser <
>>> > >> mpres...@pivotal.io>
>>> > >> >>> wrote:
>>> > >> >>>
>>> > >> >>>> Woof, woof, woof!  Congrats to the team.
>>> > >> >>>> MEP
>>> > >> >>>>
>>> > >> >>>> On Mon, Jul 24, 2017 at 1:46 PM, FENG, Xixuan (Aaron) <
>>> > >> >>>> xixuan.f...@gmail.com> wrote:
>>> > >> >>>>
>>> > >> >>>>> Dear MADlib community,
>>> > >> >>>>>
>>> > >> >>>>> I am pleased to report that on July 19, the ASF board
>>> established
>>> > >> >>>>> Apache MADlib as a Top Level Project, which was approved by
>>> > >> unanimous
>>> > >> >>>>> vote of the directors present.
>>> > >> >>>>>
>>> > >> >>>>> MADlib entered incubation in the fall of 2015 and made 5
>>> releases
>>> > >> as an
>>> > >> >>>>> incubating project.  Along the way, we have worked hard to
>>> ensure
>>> > >> that
>>> > >> >> the
>>> > >> >>>>> project is being developed according to the principles of the
>>> > Apache
>>> > >> >> Way.
>>> > >> >>>>> We will continue to do so in the future as a TLP,  to the
>>> best of
>>> > >> our
>>> > >> >>>>> ability.
>>> > >> >>>>>
>>> > >> >>>>> Thank you to all of you for your contributions to the project,
>>> > and I
>>> > >> >>>>> look forward to working with you as part of this community!
>>> > >> >>>>>
>>> > >> >>>>> Aaron Feng
>>> > >> >>>>> Vice President, Apache MADlib
>>> > >> >>>>>
>>> > >> >>>>
>>> > >> >>>>
>>> > >> >>>>
>>> > >> >>>> --
>>> > >> >>>> Marshall Presser
>>> > >> >>>> Pivotal Data Engineering
>>> > >> >>>> mpresser@pivotal .io
>>> > >> >>>> 240.401.1750 <(240)%20401-1750>
>>> > >> >>>>
>>> > >> >>>>
>>> > >> >>>
>>> > >> >>>
>>> > >> >>> --
>>> > >> >>> Ivan Novick, Product Manager Pivotal Greenplum
>>> > >> >>> inov...@pivotal.io --  (Mobile) 408-230-6491 <(408)%20230-6491>
>>> > >> >>> https://www.youtube.com/GreenplumDatabase
>>> > >> >>>
>>> > >> >>>
>>> > >> >>
>>> > >> >>
>>> > >> >> --
>>> > >> >> Anirudh Kondaveeti, Ph.D. | Principal Data Scientist | Pivotal
>>> Data
>>> > >> Science
>>> > >> >> Team akondave...@pivotal.io | c - 650 483 3985
>>> > >> >>
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> Ivan Novick, Product Manager Pivotal Greenplum
>> inov...@pivotal.io --  (Mobile) 408-230-6491 <(408)%20230-6491> --
>> (Skype) 512-782-9555 <(512)%20782-9555>
>> https://www.youtube.com/GreenplumDatabase
>>
>>
>


Re: MADlib Top Level Project Graduation

2017-07-28 Thread Frank McQuillan
There will be a press release put out by the ASF, it is being written now
but there have been some delays with people out on summer vacation.

I will update the web site this afternoon with the news.  I was planning to
wait for the press release, but think I will update it now and add a link
to the press release later when  it comes out.

Frank

On Fri, Jul 28, 2017 at 2:24 PM, Ivan Novick  wrote:

> I just tweeted it and referenced @joe_hellerstein
> 
>
> Let's make some noise.
>
> I am sure there will be more, but we can start on our own.
>
> On Fri, Jul 28, 2017 at 2:18 PM, Joseph Hellerstein <
> hellerst...@berkeley.edu> wrote:
>
>> Is this public? Is anybody planning on putting a news item on a web page
>> or
>> something?
>>
>> Would be good to brag on social media, once that's in place.
>>
>> J
>>
>> On Wed, Jul 26, 2017 at 12:39 AM, Kazmi,Auon H  wrote:
>>
>> > Congrats developers!
>> >
>> > 
>> > From: Woo Jung 
>> > Sent: Monday, July 24, 2017 5:56:35 PM
>> > To: u...@madlib.incubator.apache.org
>> > Cc: dev@madlib.incubator.apache.org
>> > Subject: Re: MADlib Top Level Project Graduation
>> >
>> > Congratulations MADlib -- Oh, the places you'll go! :)
>> >
>> > On Mon, Jul 24, 2017 at 2:29 PM, Greg Chase  wrote:
>> >
>> > > Congrats MADlib team! Very proud of you!
>> > >
>> > > On Mon, Jul 24, 2017 at 2:09 PM, Jarrod Vawdrey 
>> > > wrote:
>> > >
>> > >> Awesome!!! Congrats team.
>> > >>
>> > >> Jarrod Vawdrey
>> > >> (678) 651-0795
>> > >>
>> > >> > On Jul 24, 2017, at 3:57 PM, Joseph Hellerstein <
>> > >> hellerst...@berkeley.edu> wrote:
>> > >> >
>> > >> > Very cool!
>> > >> >
>> > >> > On Mon, Jul 24, 2017 at 2:11 PM, Anirudh Kondaveeti <
>> > >> akondave...@pivotal.io>
>> > >> > wrote:
>> > >> >
>> > >> >> Congrats team!
>> > >> >>
>> > >> >>> On Mon, Jul 24, 2017 at 11:06 AM, Ivan Novick <
>> inov...@pivotal.io>
>> > >> wrote:
>> > >> >>>
>> > >> >>> nice work all!
>> > >> >>>
>> > >> >>> On Mon, Jul 24, 2017 at 11:04 AM, Marshall Presser <
>> > >> mpres...@pivotal.io>
>> > >> >>> wrote:
>> > >> >>>
>> > >> >>>> Woof, woof, woof!  Congrats to the team.
>> > >> >>>> MEP
>> > >> >>>>
>> > >> >>>> On Mon, Jul 24, 2017 at 1:46 PM, FENG, Xixuan (Aaron) <
>> > >> >>>> xixuan.f...@gmail.com> wrote:
>> > >> >>>>
>> > >> >>>>> Dear MADlib community,
>> > >> >>>>>
>> > >> >>>>> I am pleased to report that on July 19, the ASF board
>> established
>> > >> >>>>> Apache MADlib as a Top Level Project, which was approved by
>> > >> unanimous
>> > >> >>>>> vote of the directors present.
>> > >> >>>>>
>> > >> >>>>> MADlib entered incubation in the fall of 2015 and made 5
>> releases
>> > >> as an
>> > >> >>>>> incubating project.  Along the way, we have worked hard to
>> ensure
>> > >> that
>> > >> >> the
>> > >> >>>>> project is being developed according to the principles of the
>> > Apache
>> > >> >> Way.
>> > >> >>>>> We will continue to do so in the future as a TLP,  to the best
>> of
>> > >> our
>> > >> >>>>> ability.
>> > >> >>>>>
>> > >> >>>>> Thank you to all of you for your contributions to the project,
>> > and I
>> > >> >>>>> look forward to working with you as part of this community!
>> > >> >>>>>
>> > >> >>>>> Aaron Feng
>> > >> >>>>> Vice President, Apache MADlib
>> > >> >>>>>
>> > >> >>>>
>> > >> >>>>
>> > >> >>>>
>> > >> >>>> --
>> > >> >>>> Marshall Presser
>> > >> >>>> Pivotal Data Engineering
>> > >> >>>> mpresser@pivotal .io
>> > >> >>>> 240.401.1750 <(240)%20401-1750>
>> > >> >>>>
>> > >> >>>>
>> > >> >>>
>> > >> >>>
>> > >> >>> --
>> > >> >>> Ivan Novick, Product Manager Pivotal Greenplum
>> > >> >>> inov...@pivotal.io --  (Mobile) 408-230-6491 <(408)%20230-6491>
>> > >> >>> https://www.youtube.com/GreenplumDatabase
>> > >> >>>
>> > >> >>>
>> > >> >>
>> > >> >>
>> > >> >> --
>> > >> >> Anirudh Kondaveeti, Ph.D. | Principal Data Scientist | Pivotal
>> Data
>> > >> Science
>> > >> >> Team akondave...@pivotal.io | c - 650 483 3985
>> > >> >>
>> > >>
>> > >
>> > >
>> >
>>
>
>
>
> --
> Ivan Novick, Product Manager Pivotal Greenplum
> inov...@pivotal.io --  (Mobile) 408-230-6491 <(408)%20230-6491> --
> (Skype) 512-782-9555 <(512)%20782-9555>
> https://www.youtube.com/GreenplumDatabase
>
>


Re: External references to MADlib incubator project content

2017-07-13 Thread Frank McQuillan
Yes it will re-direct to the TLP location.

On Thu, Jul 13, 2017 at 3:42 PM, Ed Espino  wrote:

> When MADlib graduates, will the previous incubator links redirect to the
> TLP location?  I noticed the following MADlib incubator references in the
> Pivotal Greenplum DB docs:
>
> source page:
> Greenplum MADlib Extension for Analytics
> https://gpdb.docs.pivotal.io/4390/ref_guide/extensions/madlib.html#topic9
>
> link references:
>   MADlib web site is at http://madlib.incubator.apache.org/
>   MADlib documentation is at
> http://madlib.incubator.apache.org/documentation.html
>
> -=e
> --
> *Ed Espino*
>


Re: Apache Jira: MADLIB v1.12-incubating and Metrics dashboard

2017-07-11 Thread Frank McQuillan
ll.sh
> 198:echo "Release notes and additional documentation can be found at
> http://madlib.incubator.apache.org/"
>
> deploy/PackageMaker/Welcome.html
> 6:Welcome to Apache MADlib (incubating)
> 8:Welcome to Apache MADlib (incubating)
> 14:Apache MADlib is an effort undergoing incubation at the Apache
> Software
> 15:Foundation (ASF), sponsored by the Apache Incubator PMC.
> 17:Incubation is required of all newly accepted projects until a further
> 22:While incubation status is not necessarily a reflection of the
>
> deploy/PGXN/META.json.in
> 6:"maintainer": "MADlib contributors  >",
> 16:"homepage": "http://madlib.incubator.apache.org/",
> 21:"url":  "https://github.com/apache/incubator-madlib.git",
> 22:"web":  "https://github.com/apache/incubator-madlib",
>
> deploy/PGXN/ReadMe.txt
> 1:Apache MADlib (incubating) Read Me
> 8:See the project web site located at http://madlib.incubator.apache.org/
> for
> 14:The latest documentation of MADlib modules can be found at
> http://madlib.incubator.apache.org/docs
> 27:
> https://github.com/apache/incubator-madlib/blob/master/
> licenses/third_party/_M_widen_init.txt
> 65:Apache MADlib is an effort undergoing incubation at the Apache Software
> 66:Foundation (ASF), sponsored by the Apache Incubator PMC.
> 68:Incubation is required of all newly accepted projects until a further
> 73:While incubation status is not necessarily a reflection of the
>
> doc/etc/developer.doxyfile.in
> 843:USE_MDFILE_AS_MAINPAGE = "
> https://github.com/apache/incubator-madlib/blob/master/README.md"
>
> doc/etc/header.html
> 30:  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
> 44:  http://madlib.incubator.apache.org
> "> alt="Logo" src="$relpath^$projectlogo" height="50"
> style="padding-left:0.5em;" border="0"/ >
>
> doc/mainpage.dox.in
> 3:Apache MADlib (incubating) is an open-source library for scalable
> 14:http://madlib.incubator.apache.org";>MADlib web
> site
> 17:https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/";>User
> mailing list
> 18:https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/";>Dev
> mailing list
> 35:https://github.com/apache/incubator-madlib/blob/
> master/README.md
> ">ReadMe
> 38:https://github.com/apache/incubator-madlib/blob/master/LICENSE
> ">
>
> src/madpack/madpack.py
> 698:<
> http://madlib.incubator.apache.org/docs/latest/group__
> grp__linreg.html#warning
> >
>
> tool/docker/base/Dockerfile_gpdb_4_3_10
> 34:#ADD ./ /incubator-madlib
> 35:##RUN cd incubator-madlib && \
> 50:## 1) docker run -d -it --name gpdb -v
> (path-to-incubator-madlib)/src:/incubator-madlib/src gpdb bash
> 53:## 2) docker exec -it gpdb /incubator-madlib/build/src/bin/madpack -p
> greenplum -c gpadmin@127.0.0.1:5432/gpadmin install
> 59:## - cd /incubator-madlib/build
> 60:## - make (This can be run after changing code in the incubator-madlib
> source code)
>
> tool/docker/base/Dockerfile_postgres_9_6
> 56:## To build an image from this docker file, from incubator-madlib
> folder, run:
>
> tool/docker/base/Dockerfile_postgres_9_6_Jenkins
> 41:## To build an image from this docker file, from incubator-madlib
> folder, run:
>
> tool/jenkins/jenkins_build.sh
> 48:docker run -d --name madlib -v
> "${workdir}/incubator-madlib":/incubator-madlib
> madlib/postgres_9.6:jenkins
> | tee logs/docker_setup.log
> 50:docker run -d --name madlib -v
> "${workdir}/incubator-madlib":/incubator-madlib
> madlib/postgres_9.6:jenkins
> | tee logs/docker_setup.log
> 60:docker exec madlib bash -c 'rm -rf /build; mkdir /build; cd /build;
> cmake ../incubator-madlib; make clean; make; make install; make package' |
> tee $workdir/logs/madlib_compile.log
> 62:docker exec madlib bash -c 'rm -rf /build; mkdir /build; cd /build;
> cmake ../incubator-madlib; make clean; make; make install; make package' |
> tee $workdir/logs/madlib_compile.log
> 95:python incubator-madlib/tool/jenkins/junit_export.py
> $workdir/logs/madlib_install_check.log
> $workdir/logs/madlib_install_check.xml
> 97:python incubator-madlib/tool/jenkins/junit_export.py $workdir
> $workdir/logs/madlib_install_check.log
> $workdir/logs/madlib_install_check.xml
>
> tool/jenkins/rat_check.sh
> 27:grep "Copyright 2016-$(date +"%Y") The Apache Software Foundation"
> "${workdir}/incubator-madlib/NOTICE"
> 32:grep &quo
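
A search like the one listed above can be repeated after the repos are
renamed, to confirm which incubator references remain. The following is a
minimal Python sketch, not the command used in the thread (the listing
looks like grep output). The function name `find_incubator_references` and
the regular expression are assumptions, chosen only to match the strings
visible in the listing.

```python
import os
import re

# Assumed pattern: the three spellings that appear in the listing above.
INCUBATOR_PATTERN = re.compile(
    r"incubator-madlib|madlib\.incubator\.apache\.org|\(incubating\)"
)


def find_incubator_references(root):
    """Yield (relative_path, line_number, line) for each matching line under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata and keep the traversal order stable.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on non-UTF-8 files.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if INCUBATOR_PATTERN.search(line):
                            yield (
                                os.path.relpath(path, root),
                                number,
                                line.rstrip("\n"),
                            )
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
```

Iterating over `find_incubator_references(".")` from the top of a checkout
yields one tuple per hit, which can be printed in the same
`path` / `line:text` shape as the listing.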

Re: Apache Jira: MADLIB v1.12-incubating and Metrics dashboard

2017-07-10 Thread Frank McQuillan
Thanks Ed, those dashboards are useful and give a good view of things.

Regarding the 1.12 release timing, I suggest we move the release date until
after the next ASF board meeting, which is scheduled for July 19, 2017. The
reason is that MADlib graduation is on the agenda for the ASF meeting and
hopefully it will pass fine.  So I suggest the new release date for 1.12 is
Aug 4, a couple weeks or so later.  I updated the release date in JIRA.

And yes, there is quite a lot of history on this project as it has been
around since 2011 or so, well before the move to the ASF in the fall of 2015.

Frank




On Mon, Jul 10, 2017 at 1:58 PM, Ed Espino  wrote:

> The automated Jira report for MADLIB Version v1.12 (UNRELEASED) is also
> useful for getting a very quick view of the release status. It also
> respects the tentative release date (14/Jul/17).
>
> https://issues.apache.org/jira/projects/MADLIB/versions/12340360
>
> -=e
>
> On Mon, Jul 10, 2017 at 1:04 PM, Ed Espino  wrote:
>
> > MADlibers,
> >
> > FYI: In order to get my head wrapped around the current Apache Jira state
> > for the MADlib v1.12 release, I have thrown together a quick dashboard.
> I
> > have made the dashboard and corresponding filters publicly available.
> This
> > will help me/us monitor the release convergence.
> >
> > Apache MADlib v1.12-incubating Release Dashboard:
> > https://issues.apache.org/jira/secure/Dashboard.jspa?
> selectPageId=12331450
> >
> > Additionally, to get a status of the overall Jira state, I also threw
> > together a quick MADlib metrics dashboard. It appears there is a bit of
> > Jira legacy history with the project. :
> > https://issues.apache.org/jira/secure/Dashboard.jspa?
> selectPageId=12331451
> >
> > Please take a quick look and let me know what you think. I can easily
> > adjust the dashboards if needed.
> >
> > Regards,
> > -=e
> >
> >
> > --
> > *Ed Espino*
> >
>
>
>
> --
> *Ed Espino*
>


Re: MADlib Q2 report to ASF

2017-07-10 Thread Frank McQuillan
Thanks for the suggestion Roman.  I updated the report with this additional
information.

Frank

On Mon, Jul 10, 2017 at 2:27 PM, Roman Shaposhnik 
wrote:

> Looks good, but I'd also add (if not too late) that the resolution for
> graduation was tabled by the board last month and is now being
> re-submitted
>
> On Thu, Jul 6, 2017 at 1:39 AM, Frank McQuillan 
> wrote:
> > Here is the draft ASF report for July 2017, covering Q2 2017 activity.
> >
> > It is posted at http://wiki.apache.org/incubator/July2017
> >
> > Please let me know if you have any comments or suggestions and I will
> > update the report.
> >
> > ---
> >
> > MADlib
> >
> > Big Data Machine Learning in SQL for Data Scientists.
> >
> > MADlib has been incubating since 2015-09-15.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. Finalize trademark transfer from Pivotal to ASF.
> >   2. Continue to produce regular Apache (incubating) releases.
> >   3. Continue to execute and manage the project according to governance
> > model of the "Apache Way".
> >
> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware
> > of?
> >
> >   1. The Apache MADlib Project is ready for graduation out of the
> > incubator.
> > Discussion by Project:
> > https://lists.apache.org/thread.html/070c6764fcd0448b2db8975936b52f
> 7a28bd0e231c0e690288a6968e@%3Cdev.madlib.apache.org%3E
> > Vote by IPMC and community:
> > https://lists.apache.org/thread.html/733920464e8f8170d9cc831b701f27
> 5d757ee9448a7bfd05a1bf8dfd@%3Cgeneral.incubator.apache.org%3E
> > Trademark transfer from Pivotal to ASF is being tracked in:
> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125
> >
> > How has the community developed since the last report?
> >
> >   1. Some related events in Q2 2017:
> >  * May 25, 2017 - MADlib community call.  Topic:  New Features in
> > Apache MADlib 1.11 (Frank McQuillan)
> >  * Jun 21, 2017 - Greenplum meetup in San Francisco.  Topic:  Apache
> > Solr & MADlib (incubating): Enabling Massive Text Analytics In-Database
> > (Bharath Sitaraman)
> >  * Jul 5-7, 2017 - PG Day Russia.  Topic: Various on “Greenplum Day”
> > Jul 5 including in-database analytics (Roman Shaposhnik and others)
> >  * Jul 25, 2017 (upcoming) - SF Bay ACM Chapter meetup.  Topic:
> >  Advanced Analytics for Security: Lateral Movement Detection (Anirudh
> > Kondaveeti)
> >
> >   2. See material technical conversations on user/dev mailing lists and
> in
> > the appropriate JIRAs and pull requests.
> >
> > How has the project developed since the last report?
> >
> >   1. TLP readiness - maturity evaluation matrix
> > https://cwiki.apache.org/confluence/display/MADLIB/ASF+
> Maturity+Evaluation
> >   2. TLP readiness - graduation resolution
> > https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
> >   3. TLP readiness - documented release process
> > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
> >   4. Active work in progress for 6th ASF release MADlib v1.12 scheduled
> for
> > Jul/Aug 2017.  Features include: more graph analytics (weakly connected
> > components, breadth first search, all pairs shortest path, multiple graph
> > measures), neural nets, stratified sampling, train-test split,
> improvements
> > to decision tree & random forest, improvements to summary function
> >   5. Mailing list activity in Q2:  295 postings to dev, 77 postings to
> user.
> >
> > How would you assess the podling's maturity?
> > Please feel free to add your own commentary.
> >
> >   [ ] Initial setup
> >   [ ] Working towards first release
> >   [ ] Community building
> >   [X] Nearing graduation
> >   [ ] Other:
> >
> > Date of last release:
> >
> >   MADlib v1.11 on 5/16/17.
> >
> > When were the last committers or PMC members elected:
> >
> >   Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.
>


Re: Volunteer: Apache MADlib 1.12 (incubating) release manager

2017-07-07 Thread Frank McQuillan
Hi Ed,

Thank you kindly for your offer to be release manager for 1.12!
We heartily accept your offer!

And it is great that you have experience
on HAWQ - I think the MADlib release process will be very similar to
what you are used to.

We have put together a wiki page on the MADlib release process
https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
so you can have a look there and see the steps. Hopefully no surprises.

We are looking at releasing 1.12 within the next month, depending on
community wishes, and we will be happy to work thru the steps with you.

Thanks again for the offer, and we'll talk soon!

Frank



On Fri, Jul 7, 2017 at 12:49 PM, Trevor Grant 
wrote:

> ... that's a very nice gesture-
>
> I'm only a lurker on this mailing list but I'm a PMC on a couple of other
> projects- would be happy to take you up if these folks don't :D
>
>
>
> On Fri, Jul 7, 2017 at 2:42 PM, Ed Espino  wrote:
>
> > MADlib dev,
> >
> > I'm not sure if one has been identified and even though I am not a
> > committer on the project, I would like to volunteer my services to be the
> > release manager for the upcoming Apache MADlib 1.12 (incubating). I have
> > served in this capacity for the Apache HAWQ 2.1.0.0-incubating release
> > (references below). I have had the chance to review several of the
> > previous MADlib releases. I am looking forward to honing my ASF skill set
> and
> > this looks like a very good opportunity.
> >
> > Regards,
> > -=e
> >
> > My release manager participation references:
> > Apache HAWQ 2.1.0.0-incubating dev voting thread:
> > https://lists.apache.org/thread.html/9d3025c12dc032437d1317d662f0e4
> > 434754c00258ca1abdd5c0ab9f@%3Cdev.hawq.apache.org%3E
> >
> > Apache HAWQ 2.1.0.0-incubating IPMC voting thread:
> > https://lists.apache.org/thread.html/1636e892b95475fe0af130d83fa457
> > c3e8bfa0d26f695f6faac0@%3Cgeneral.incubator.apache.org%3E
> >
> > --
> > *Ed Espino*
> >
>


MADlib Q2 report to ASF

2017-07-05 Thread Frank McQuillan
Here is the draft ASF report for July 2017, covering Q2 2017 activity.

It is posted at http://wiki.apache.org/incubator/July2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Finalize trademark transfer from Pivotal to ASF.
  2. Continue to produce regular Apache (incubating) releases.
  3. Continue to execute and manage the project according to governance
model of the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

  1. The Apache MADlib Project is ready for graduation out of the
incubator.
Discussion by Project:
https://lists.apache.org/thread.html/070c6764fcd0448b2db8975936b52f7a28bd0e231c0e690288a6968e@%3Cdev.madlib.apache.org%3E
Vote by IPMC and community:
https://lists.apache.org/thread.html/733920464e8f8170d9cc831b701f275d757ee9448a7bfd05a1bf8dfd@%3Cgeneral.incubator.apache.org%3E
Trademark transfer from Pivotal to ASF is being tracked in:
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125

How has the community developed since the last report?

  1. Some related events in Q2 2017:
 * May 25, 2017 - MADlib community call.  Topic:  New Features in
Apache MADlib 1.11 (Frank McQuillan)
 * Jun 21, 2017 - Greenplum meetup in San Francisco.  Topic:  Apache
Solr & MADlib (incubating): Enabling Massive Text Analytics In-Database
(Bharath Sitaraman)
 * Jul 5-7, 2017 - PG Day Russia.  Topic: Various on “Greenplum Day”
Jul 5 including in-database analytics (Roman Shaposhnik and others)
 * Jul 25, 2017 (upcoming) - SF Bay ACM Chapter meetup.  Topic:
 Advanced Analytics for Security: Lateral Movement Detection (Anirudh
Kondaveeti)

  2. See material technical conversations on user/dev mailing lists and in
the appropriate JIRAs and pull requests.

How has the project developed since the last report?

  1. TLP readiness - maturity evaluation matrix
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation
  2. TLP readiness - graduation resolution
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
  3. TLP readiness - documented release process
https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
  4. Active work in progress for 6th ASF release MADlib v1.12 scheduled for
Jul/Aug 2017.  Features include: more graph analytics (weakly connected
components, breadth first search, all pairs shortest path, multiple graph
measures), neural nets, stratified sampling, train-test split, improvements
to decision tree & random forest, improvements to summary function
  5. Mailing list activity in Q2:  295 postings to dev, 77 postings to user.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  MADlib v1.11 on 5/16/17.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Update on graduation to TLP

2017-06-26 Thread Frank McQuillan
Hello MADlib community,

As you may know, the MADlib project is proceeding towards TLP status
and was on the agenda at the 6/21/17 ASF board meeting. At that meeting
the board tabled (postponed) voting on the MADlib graduation resolution.

Mark and John (copied) from the ASF sent the MADlib PMC
some information regarding the postponement, and I would like to
briefly summarize the current status with you:

If I may quote Mark directly:

“The MADlib graduation resolution triggered some discussion on board@
that started when it was pointed out that the registered MADlib marks
had not yet been transferred to the ASF. During that discussion, IPMC
members expressed differing views on whether MADlib should graduate or
not because of the missing trademark assignment.”

I know that the current owner of the MADlib trademark (Pivotal)
is currently working with the ASF to transfer the trademarks,
and we are hoping that this can be achieved with minimum delay.
Here is the related JIRA
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125

Mark also took the time to review the MADlib archives,
to see if there were any indications that MADlib would benefit from
remaining longer in the incubator. This includes the licensing
described here
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance

He reported:

“I’ve looked through the archives and nothing jumped out at me as
problematic.”
&
“I don't see anything in the licensing situation that should prevent
graduation.”

Hopefully I have represented the current status correctly, but I
have copied Mark and John on this thread in the case they would like
to add/correct anything.

We will keep you posted as we learn more.  In the interim, we will
continue to work hard on the upcoming 1.12 release and look forward to
more great tech coming out of this project.

Frank


Candidate 1.12 JIRAs

2017-06-05 Thread Frank McQuillan
Hello,

There is a pretty healthy list of candidate 1.12 JIRAs:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.12%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

If anyone in the community would like to contribute to any of these JIRAs,
or has any suggestions, comments or additions/subtractions, it would be
great to hear your thoughts.

Frank


1.11 release presentation

2017-05-25 Thread Frank McQuillan
https://drive.google.com/open?id=0B62dTQMossK9STNETUR0aGZkRFE

From the recent MADlib community call.  Video will be posted shortly.

Frank


Re: Announcing MADlib v1.11 GA

2017-05-17 Thread Frank McQuillan
Thank you Rashmi for being release manager for 1.11

Just a reminder to the MADlib community that examples of most of the new
features in 1.11 are included in the Jupyter notebooks posted at
https://github.com/apache/incubator-madlib-site/tree/asf-site/community-artifacts
Look for the most recent notebooks.

Looking forward to hearing what the community is interested in for 1.12
release.  Please share your ideas here or in JIRA.

Frank






On Tue, May 16, 2017 at 3:49 PM, Rashmi Raghu  wrote:

> MADlib v1.11 is now generally available.
>
> The vote was PASSED by Incubator PMC members:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201705.mbox/%
> 3CCAMtNjok4BJaSzG=yfkqcdfnqrrvedeomuf2jvxz6giatg85...@mail.gmail.com%3E
>
> The source and binaries are posted at:
> https://dist.apache.org/repos/dist/release/incubator/madlib/
> 1.11-incubating/
>
> Release notes:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
>
> User documentation:
> http://madlib.incubator.apache.org/docs/latest/
>
> We look forward to community participation for the next release v1.12 and
> moving towards TLP graduation!
>
> Regards,
> Rashmi Raghu
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


Re: Progress towards graduation

2017-05-16 Thread Frank McQuillan
Roman,

I had another read thru the graduation resolution
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
and maturity evaluation
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation

Both look pretty complete to me at this point.  Other parts of the wiki
have a lot of info about the project and getting started and such, so looks
comprehensive enough for a project at this level of maturity.

Frank



On Mon, May 15, 2017 at 11:26 AM, Roman Shaposhnik  wrote:

> Hi!
>
> based on the discussion that happened on private@madlib
> I've updated the PMC roster and a PMC Chair position:
> https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
>
> I'd like to ask everyone to read through this doc once more
> and let me know if something is missing.
>
> Also, I need all those who are currently committers on
> MADlib but are NOT on the PMC to reply to this thread
> if they wish to keep being committers on the project.
>
> Thanks,
> Roman.
>


Re: Installation issue - OSError: [Errno 2] No such file or directory: '/usr/local/madlib/Versions/1.10.0/ports/postgres'

2017-05-09 Thread Frank McQuillan
Oh, thanks Orhan.  It was a good idea to update the rpm.

On Tue, May 9, 2017 at 2:08 PM, Orhan Kislal  wrote:

> Hi MADlib community,
>
> I sincerely apologize for the error. The RPM for MADlib v1.10.0 and its
> signatures have been updated. Note that the new features/improvements of
> v1.11 will not be available with this file (feel free to check the v1.11
> RPM Rahul and Frank mentioned for them). Please let us know if you have any
> questions.
>
> Thanks,
>
> Orhan Kislal
>
>
> On Tue, May 9, 2017 at 9:47 AM, Frank McQuillan 
> wrote:
>
> > Apologies on the RPM error, Atsushi.
> >
> > The 1.11 release candidate is posted now
> > https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> > 11-incubating-rc3/
> > so you could use that RPM.
> >
> > As Rahul mentioned, we are in the process of voting on the 1.11 release
> > currently.  However, I don't expect it to change.  Official release will
> > hopefully happen later this week.
> >
> > Frank
> >
> >
> >
> > On Tue, May 9, 2017 at 9:13 AM, Rahul Iyer  wrote:
> >
> >> +dev for the problem with RPM
> >>
> >> Hi Atsushi,
> >>
> >> Thanks for bringing this to our notice!
> >>
> >> We might have to remove the 1.10 binary from the Apache dist to avoid
> >> others from having this problem. We're in the process of releasing 1.11
> >> and
> >> would redirect to that binary once that goes through the voting process.
> >>
> >> @Louis, could you please clear the `/usr/local/madlib` folder and try
> >> again
> >> with pgxn (or compiling from source as suggested by Markus)?
> >>
> >>
> >>
> >>
> >> On Mon, May 8, 2017 at 11:53 PM, Neki, Atsushi <
> >> neki.atsu...@jp.fujitsu.com>
> >> wrote:
> >>
> >> > Hi Louis, Rahul,
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > It seems that the installation using RPM binary doesn’t work for
> 1.10.0.
> >> >
> >> > The RPM doesn’t have anything but hawq under ports directory.
> >> >
> >> >
> >> >
> >> > $ rpm -qlpi ./apache-madlib-1.10.0-incubating-bin-Linux.rpm | grep
> >> ports
> >> >
> >> >
> >> >
> >> > /usr/local/madlib/Versions/1.10.0/ports/hawq
> >> >
> >> >   (snip)
> >> >
> >> >
> >> >
> >> > For 1.9.1 RPM binary, it doesn’t look so.
> >> >
> >> >
> >> >
> >> > /usr/local/madlib/Versions/1.9.1/ports/greenplum
> >> >
> >> > /usr/local/madlib/Versions/1.9.1/ports/hawq
> >> >
> >> > /usr/local/madlib/Versions/1.9.1/ports/postgres
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Unfortunately, I have no idea about the problem with pgxn.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Atsushi Neki
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > *From:* Markus Paaso [mailto:markus.pa...@gmail.com]
> >> > *Sent:* Saturday, May 6, 2017 2:02 PM
> >> > *To:* u...@madlib.incubator.apache.org
> >> > *Subject:* Re: Installation issue - OSError: [Errno 2] No such file or
> >>
> >> > directory: '/usr/local/madlib/Versions/1.10.0/ports/postgres'
> >> >
> >> >
> >> >
> >> > Hi Louis,
> >> >
> >> >
> >> >
> >> > I have installed madlib on Ubuntu 16.04 using following commands:
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > PSQL_HOST="127.0.0.1"
> >> >
> >> > PSQL_DB="testing"
> >> >
> >> > PSQL_USER="testuser"
> >> >
> >> > PSQL_PASS=""
> >> >
> >> >
> >> >
> >> > psql -h $PSQL_HOST template1 -c "CREATE ROLE $PSQL_USER PASSWORD
> >> > '$PSQL_PASS'"
> >> >
> >> > createdb -h $PSQL_HOST $PSQL_DB -O $PSQL_USER
> >> >
> >> >
> >> >
> >> > sudo apt install -y cmake m4
> >> >
> >> > wget https://github.com/apache/incubator-madlib/archive/rel/v1.
> >> 10.

Re: Installation issue - OSError: [Errno 2] No such file or directory: '/usr/local/madlib/Versions/1.10.0/ports/postgres'

2017-05-09 Thread Frank McQuillan
Apologies on the RPM error, Atsushi.

The 1.11 release candidate is posted now
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.11-incubating-rc3/
so you could use that RPM.

As Rahul mentioned, we are in the process of voting on the 1.11 release
currently.  However, I don't expect it to change.  Official release will
hopefully happen later this week.
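
For anyone who wants to check an RPM before installing it, the listing test
Atsushi ran further down this thread (rpm -qlpi ... | grep ports) can be
scripted. The following is a minimal sketch, not part of the MADlib tooling:
the function name missing_ports and the list of expected ports are
illustrative assumptions, and the listing is read from stdin so the logic can
be tried without an RPM at hand.

```shell
#!/usr/bin/env bash
# Sketch: report which expected port directories are absent from an RPM
# file listing supplied on stdin.
# ASSUMPTIONS: the function name and the port names passed in are
# illustrative; only the .../Versions/<version>/ports/<port> layout comes
# from the listings quoted in this thread.

missing_ports() {
    local version="$1"; shift
    # Escape dots so "1.10.0" is matched literally by grep -E.
    local version_re="${version//./\\.}"
    local listing
    listing="$(cat)"
    local port missing=""
    for port in "$@"; do
        # Match the port directory itself or any path beneath it.
        if ! grep -Eq "/Versions/${version_re}/ports/${port}(/|\$)" <<<"$listing"; then
            missing="${missing:+$missing }$port"
        fi
    done
    # Prints a space-separated list of missing ports (empty if none).
    printf '%s\n' "$missing"
}
```

With a real package this could be used as
rpm -qlp ./apache-madlib-1.10.0-incubating-bin-Linux.rpm | missing_ports 1.10.0 greenplum hawq postgres
where an empty line of output would mean all three ports are present.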

Frank



On Tue, May 9, 2017 at 9:13 AM, Rahul Iyer  wrote:

> +dev for the problem with RPM
>
> Hi Atsushi,
>
> Thanks for bringing this to our notice!
>
> We might have to remove the 1.10 binary from the Apache dist to avoid
> others from having this problem. We're in the process of releasing 1.11 and
> would redirect to that binary once that goes through the voting process.
>
> @Louis, could you please clear the `/usr/local/madlib` folder and try again
> with pgxn (or compiling from source as suggested by Markus)?
>
>
>
>
> On Mon, May 8, 2017 at 11:53 PM, Neki, Atsushi <
> neki.atsu...@jp.fujitsu.com>
> wrote:
>
> > Hi Louis, Rahul,
> >
> >
> >
> >
> >
> > It seems that the installation using RPM binary doesn’t work for 1.10.0.
> >
> > The RPM doesn’t have anything but hawq under ports directory.
> >
> >
> >
> > $ rpm -qlpi ./apache-madlib-1.10.0-incubating-bin-Linux.rpm | grep ports
> >
> >
> >
> > /usr/local/madlib/Versions/1.10.0/ports/hawq
> >
> >   (snip)
> >
> >
> >
> > For 1.9.1 RPM binary, it doesn’t look so.
> >
> >
> >
> > /usr/local/madlib/Versions/1.9.1/ports/greenplum
> >
> > /usr/local/madlib/Versions/1.9.1/ports/hawq
> >
> > /usr/local/madlib/Versions/1.9.1/ports/postgres
> >
> >
> >
> >
> >
> > Unfortunately, I have no idea about the problem with pgxn.
> >
> >
> >
> >
> >
> > Regards,
> >
> > Atsushi Neki
> >
> >
> >
> >
> >
> > *From:* Markus Paaso [mailto:markus.pa...@gmail.com]
> > *Sent:* Saturday, May 6, 2017 2:02 PM
> > *To:* u...@madlib.incubator.apache.org
> > *Subject:* Re: Installation issue - OSError: [Errno 2] No such file or
> > directory: '/usr/local/madlib/Versions/1.10.0/ports/postgres'
> >
> >
> >
> > Hi Louis,
> >
> >
> >
> > I have installed madlib on Ubuntu 16.04 using following commands:
> >
> >
> >
> >
> >
> > PSQL_HOST="127.0.0.1"
> >
> > PSQL_DB="testing"
> >
> > PSQL_USER="testuser"
> >
> > PSQL_PASS=""
> >
> >
> >
> > psql -h $PSQL_HOST template1 -c "CREATE ROLE $PSQL_USER PASSWORD
> > '$PSQL_PASS'"
> >
> > createdb -h $PSQL_HOST $PSQL_DB -O $PSQL_USER
> >
> >
> >
> > sudo apt install -y cmake m4
> >
> > wget https://github.com/apache/incubator-madlib/archive/rel/
> v1.10.0.tar.gz
> >
> > tar -xzf v1.10.0.tar.gz
> >
> > cd incubator-madlib-rel-v1.10.0
> >
> > ./configure
> >
> > cd build
> >
> > make
> >
> > sudo make install
> >
> >
> >
> > MADLIB_USER="mad"
> >
> > MADLIB_PASS="$(openssl rand -base64 32)"
> >
> > psql -h $PSQL_HOST $PSQL_DB -c "CREATE USER $MADLIB_USER SUPERUSER
> > PASSWORD '$MADLIB_PASS'"
> >
> > PGPASSWORD="$MADLIB_PASS" /usr/local/madlib/bin/madpack -p postgres -c
> > $MADLIB_USER@$PSQL_HOST/$PSQL_DB install
> >
> >
> >
> > psql -h $PSQL_HOST $PSQL_DB -c "GRANT ALL PRIVILEGES ON SCHEMA madlib TO
> > $PSQL_USER"
> >
> >
> >
> >
> >
> >
> >
> > Best Regards,
> >
> > Markus Paaso
> >
> >
> >
> >
> >
> > 2017-05-05 20:11 GMT+03:00 Louis Leblanc :
> >
> > Thanks Rahul,
> >
> >
> >
> > I tried both solutions (with pgxn and with the RPM package).
> >
> >
> >
> > I didn't make any change to `/usr/local/madlib/Versions` after the
> > installation.
> >
> >
> >
> > Thanks.
> >
> >
> >
> > 2017-05-05 10:54 GMT-06:00 Rahul Iyer :
> >
> > Hi Louis,
> >
> > Just to clarify: did you use the pgxn to install or the RPM package
> > downloaded from Apache dist?
> >
> > And was there any change made to `/usr/local/madlib/Versions` after the
> > installation?
> >
> >
> >
> > I'm going to try to reproduce the issue on an Ubuntu VM, so would
> > appreciate your exact steps.
> >
> >
> >
> > - Rahul
> >
> >
> >
> > On Thu, May 4, 2017 at 8:41 AM, Louis Leblanc 
> > wrote:
> >
> > Thanks Rahul,
> >
> >
> >
> > - I used the process described here ==> https://cwiki.apache.org/
> > confluence/display/MADLIB/Installation+Guide#InstallationGuide-
> > PGXNInstallingfromPGXN(PostgreSQL)
> >
> > - I used the version apache-madlib-1.10.0-incubating-bin-Linux.rpm
> >  madlib/1.10.0-incubating/apache-madlib-1.10.0-incubating-bin-Linux.rpm>
> >
> > - Content of the folder `/usr/local/madlib/Versions` ==> 1.10.0
> >
> >
> >
> > Louis
> >
> >
> >
> >
> >
> > 2017-05-03 17:31 GMT-06:00 Rahul Iyer :
> >
> > Hi Louis,
> >
> > Please help us understand the problem further.
> >
> > - What was the process you used to install MADlib?
> > - Which version of MADlib did you install?
> > - Please print the contents of `/usr/local/madlib/Versions`
> >
> > - Rahul
> >
> >
> > On Wed, May 3, 2017 at 4:17 PM, Louis Leblanc 
> > wrote:
> > > Hello,
> > >
> > > I'm experiencing issues installing madlib on Ubuntu 16.04 with
> Postgresql
> > > 9.5.6.
> > >
> > > I followed the install

Re: [VOTE] MADlib v1.11-rc3

2017-05-05 Thread Frank McQuillan
I just want to comment on a couple items raised in the RC1 and RC2 votes
that pertain to RC3:

(1)
“I happened to open the file "CMakeLists.txt" in the root directory
and noticed it does not have the standard ASF header. I know there
were IP issues resolved globally for the project recently. I
noticed many of them are excluded in the pom.xml file. Regardless
of the IP issues, shouldn't these files contain the ASF header?”

Since this file existed before MADlib’s move to ASF, it does not need an
ASF header as per the guidance from ASF on this topic
https://issues.apache.org/jira/browse/LEGAL-293

(2)
“The DMG(apache-madlib-1.11-incubating-bin-Darwin.dmg) contains a
pkg file named "madlib-1.11-Darwin.pkg". Shouldn't it be called
"apache-madlib-1.11-incubating-Darwin.pkg"?

Similarly, the DMG base folder name is madlib-1.11.Darwin.”

As per guidance from Roman our mentor, it is not necessary to rename all
packages and files.  Also, this may affect some functional tests that look
for certain file names.

(3)
“There are still three outstanding Jira issues in an "Unresolved" state
with a fix version of v1.11.  Are they going to be resolved soon? They can
be seen with the following url:

https://issues.apache.org/jira/browse/MADLIB/fixforversion/12339592/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
”

Regarding the JIRAs that are not closed, the actual work has been done so
there is nothing material pending.  But I did not close them because I
wanted Roman to do that, since he was the one overseeing them.

(4)
Convenience binaries are being voted on, as Rashmi’s email calls out.

(5)
I tried out the RC3 dmg and found that install, reinstall, upgrade work
fine with the soft link on my OS X box on PG 9.6
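
A soft link check of this kind can be scripted for repeat testing of release
candidates. The following is only a sketch under stated assumptions: it
assumes the install root holds a soft link named Current that should resolve
to Versions/<version>, which is not spelled out in this thread, and the
function name check_current_link is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a "current version" soft link under the install root
# resolves to the expected Versions/<version> directory.
# ASSUMPTION: the link is named "Current"; adjust if your install differs.

check_current_link() {
    local root="$1" version="$2"
    local link="$root/Current"
    if [ ! -L "$link" ]; then
        echo "no soft link at $link"
        return 1
    fi
    # Resolve both paths physically so that relative and absolute link
    # targets compare equal.
    local actual expected
    actual="$(cd "$link" 2>/dev/null && pwd -P)" \
        || { echo "dangling link: $link"; return 1; }
    expected="$(cd "$root/Versions/$version" 2>/dev/null && pwd -P)" \
        || { echo "missing directory: $root/Versions/$version"; return 1; }
    if [ "$actual" = "$expected" ]; then
        echo "ok: $link -> Versions/$version"
    else
        echo "mismatch: $link resolves to $actual"
        return 1
    fi
}
```

For example, check_current_link /usr/local/madlib 1.11 after an upgrade
returns non-zero if the link is missing, dangling, or still pointing at an
older version directory.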

So...


+1




On Thu, May 4, 2017 at 6:10 PM, Rashmi Raghu  wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.11 RC-3, with the artifacts below (source and
> convenience binaries) up for a vote.
>
> Note that voting for the RC-2 release has been cancelled due to the need
> for minor corrections based on community feedback. Sorry for the
> inconvenience.
>
> RC-3 replaces RC-2 with the following minor changes:
> * Ensure product naming is consistently 'Apache MADlib (incubating)'
> * Git revision tag changed to rc/1.11-rc3
>
> This will be the 5th release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new module (PageRank for graph analytics with grouping support included)
> * improvements to existing modules (add grouping support to Single Source
> Shortest Path, reduce memory footprint of DT and RF, include NULL features
> in training DT, add support for array and svec output for Pivot module,
> utility to unnest 2-D arrays into rows of 1-D arrays)
> * platform updates (GPDB 5)
> * updates for Apache Top Level Project readiness and build process on
> Apache infrastructure
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
>
> *** Please download, review and vote by Tue May 09, 2017 @ 6pm PDT ***
>
> We're voting upon the source and convenience binaries below:
>
> Source Repository (tag):  rc/1.11-rc3
> https://github.com/apache/incubator-madlib/tree/rc/1.11-rc3
>
> Source Files and convenience Binaries:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> 11-incubating-rc3/
>
> Commit:
> https://github.com/apache/incubator-madlib/commit/
> 8e2778a3921aa99f009962756881ce4bea5eee16
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, PMC members please be sure to indicate
> "(binding)" with the vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
>
> Regards,
> Rashmi Raghu
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


Re: [DISCUSS] Graduation

2017-05-04 Thread Frank McQuillan
Thanks Roman.

I agree that this project is in the correct state to qualify as a TLP,
and would like to help move that forward.

In addition to the
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
that you mention, we also created a check list
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation
which aims to describe where the project stands according to the Apache
project maturity model.

I would encourage members of the Apache MADlib community to take a look
at the check list and comment on any of the items there.

The project mgmt part of the wiki
https://cwiki.apache.org/confluence/display/MADLIB/Project+Management
also gives a pretty good snapshot of the project as it stands today.

Frank


On Thu, May 4, 2017 at 10:44 AM, Rahul Iyer  wrote:

> Hi Roman,
>
> Many thanks for your excellent mentorship!
>
> Your #2 and #3 proposals sound good to me and I look forward to the
> discussion on private@.
>
> - Rahul
>
>
> On Fri, Apr 28, 2017 at 10:47 AM, Roman Shaposhnik  wrote:
> > Hi!
> >
> > with the fifth (v1.11) release in the final stages of being cut,
> > I think now would be a good time to officially start our graduation
> > discussion. With my mentor hat on, I feel that the project is
> > mature and self-reliant enough to qualify as a TLP.
> >
> > Process-wise graduation consists of drafting a board resolution,
> > getting it approved by the IPMC and finally submitting it to the ASF
> > board's consideration. At the very minimum your resolution will contain:
> > 1. A name of the project (I assume that'll be MADlib)
> > 2. A list of proposed PMC members
> > 3. A proposed PMC chair
> > A good example of a resolution can be found here:
> > https://cwiki.apache.org/confluence/display/FINERACT/
> Graduation+Resolution
> >
> > In fact, Frank and I took the liberty to use that as the basis for our
> own:
> >  https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
> > Please read it carefully and let us know what do you think.
> >
> > On #2 my suggestion would be to have an opt-in system. Basically
> > we will kick off the thread on private@madlib asking current PPMC
> > members if they are willing to continue on the PMC.
> >
> > On #3 I typically recommend podlings I mentor to set up a rotating chair
> > policy. This is, in no way, an ASF requirement so feel free to ignore it,
> > but it worked well before. The chair will be expected up for rotation
> > every year. It will be more than ok for the same person to self-nominate
> > once the year is up -- but at the same time it'll be up to the same person
> > to actually kick off a thread asking if anybody else is interested in
> > serving as a chair for the next year. Of course, if there are multiple
> > candidates there will have to be a vote.
> >
> > Speaking of self-nomination -- the same thread that we're going to kick
> > off as part of solving for #2 will ask for folks to self-nominate as an
> initial
> > chair to be listed on the resolution.
> >
> > Unless somebody objects strongly to my #2 and #3 proposals I'm going
> > to kick off this thread on private@.
> >
> > With that in mind, let's make the rest of the discussion on dev@ to be
> > about collecting the datapoints to present to IPMC as part of us asking
> > them to vote YES on our graduation. Let's collect all these data points
> > in the same wiki page:
> > https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
> > Or if you feel that a discussion may be needed -- just reply to this
> thread.
> >
> > Thanks,
> > Roman.
>


Re: [VOTE] MADlib v1.11-rc2

2017-05-03 Thread Frank McQuillan
> >   - I happened to open the file "CMakeLists.txt" in the root directory
> > and noticed it does not have the standard ASF header. I know there
> > were IP issues resolved globally for the project recently. I
> > noticed many of them are excluded in the pom.xml file. Regardless
> > of the IP issues, shouldn't these files contain the ASF header?
> >
> > ==
> > Source miscellaneous: HAWQ_Install.txt
> >
> >   Observation:
> >
> >   - The file references the product name as "MADlib" and not "Apache
> > MADlib (Incubating)". Is this file still valid?
> >
> > ==
> > CONVENIENCE BINARIES
> > --
> >
> > --
> > Mac Installer DMG file: apache-madlib-1.11-incubating-bin-Darwin.dmg
> > --
> >
> >   Observation:
> >
> >   - The DMG(apache-madlib-1.11-incubating-bin-Darwin.dmg) contains a
> > pkg file named "madlib-1.11-Darwin.pkg". Shouldn't it be called
> > "apache-madlib-1.11-incubating-Darwin.pkg"?
> >
> > Similarly, the DMG base folder name is madlib-1.11.Darwin.
> >
> > Mac Installer Package
> >
> > o Introduction screen
> >
> >   Observation:
> >
> >   - The introduction screen identifies the product name as
> > "MADlib". Shouldn't there be a mention of the project name being
> > "Apache MADlib (Incubating)"?
> >
> > o Read Me screen
> >
> >   Observation:
> >
> >   - Similar to initial screen, there is no mention of the Apache
> > project except for the link to the project's wiki.
> >
> > o Remaining screens look reasonable (with exception of no Apache
> >   references).
> >
> > o The default application window name is "Install MADlib"
> >
> > Observation:
> >
> >   - Similar to Introduction screen, should the name be "Install Apache
> > MADlib (Incubating)"?
> >
> >   - Look for other opportunities to reference the product name as
> > "Apache MADlib (Incubating)".
> >
> > --
> > Linux RPM: apache-madlib-1.11-incubating-bin-Linux.rpm
> > --
> >
> >   Observation:
> >
> >   - It appears the SPEC file used (possibly generated) references the
> > product name as "madlib".  Again, shouldn't there be references to
> > the product name as "Apache MADlib" scattered about?
> > Unfortunately, I am not sure if this should change or not. It
> > might help for someone on the team to review other Apache projects
> > convenience binary RPMs to see if something should be
> > addressed. The podling's mentor might be able to provide
> > additional direction as well.
> >
> > This can be seen in the following "rpm -qi madlib" output:
> >
> > [root@e0f4d3349d2d MADlib]# rpm -qi madlib
> > Name: madlib
> > Version : 1.11
> > Release : 1
> > Architecture: x86_64
> > Install Date: Wed May  3 04:00:10 2017
> > Group   : Development/Libraries
> > Size: 83575356
> > License : ASL 2.0
> > Signature   : (none)
> > Source RPM  : madlib-1.11-1.src.rpm
> > Build Date  : Tue May  2 19:03:21 2017
> > Build Host  : gpdb1.eng.pivotal.io
> > Relocations : /usr/local
> > Vendor  : MADlib
> > Summary : Open-Source Library for Scalable in-Database
> > Analytics
> > Description :
> > MADlib is an open-source library for scalable in-database
> > analytics. It
> > provides data-parallel implementations of mathematical,
> > statistical and
> > machine learning methods for structured and unstructured data.
> >
> > The MADlib mission: to foster widespread development of scalable
> > analytic skills, by harnessing efforts from commercial practice,
> > academic research, and open-source development.
> >
> >   

Re: [VOTE] MADlib v1.11-rc2

2017-05-02 Thread Frank McQuillan
Thanks for updating to RC-2, Rashmi.

I just tried the dmg on OSX on PG9.6 on my local machine and the soft link
seems to be set correctly now, since it upgraded 1.11 over 1.10 OK.  When I
uninstalled MADlib and did a fresh install, that worked fine too for 1.11.
So...

+1

Frank

On Tue, May 2, 2017 at 5:01 PM, Rashmi Raghu  wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.11 RC-2, with the artifacts below (source and
> convenience binaries) up for a vote.
>
> Note that voting for the RC-1 release has been cancelled due to the need
> for minor corrections based on community feedback. Sorry for the
> inconvenience.
>
> RC-2 replaces RC-1 with the following minor changes:
> * Ensure source tarball unpacks into a folder
> * Ensure soft links are correct for OS X installations
>
> This will be the 5th release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new module (PageRank for graph analytics with grouping support included)
> * improvements to existing modules (add grouping support to Single Source
> Shortest Path, reduce memory footprint of DT and RF, include NULL features
> in training DT, add support for array and svec output for Pivot module,
> utility to unnest 2-D arrays into rows of 1-D arrays)
> * platform updates (GPDB 5)
> * updates for Apache Top Level Project readiness and build process on
> Apache infrastructure
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
>
> *** Please download, review and vote by Fri May 05, 2017 @ 6pm PDT ***
>
> We're voting upon the source and convenience binaries below:
>
> Source Repository (tag):  rc/1.11-rc2
> https://github.com/apache/incubator-madlib/tree/rc/1.11-rc2
>
> Source Files and convenience Binaries:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> 11-incubating-rc2/
>
> Commit:
> https://github.com/apache/incubator-madlib/commit/
> d54be2b8574c5bf0ace96b94ba81f3e5cbf70a35
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, PMC members please be sure to indicate
> "(binding)" with the vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
>
> Regards,
> Rashmi Raghu
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


Re: [VOTE] MADlib v1.11-rc1

2017-05-02 Thread Frank McQuillan
Hi Ed,

The binaries are up for a vote for 1.11.  Although we have hosted
convenience binaries on the Apache dist site in the past, we have not
explicitly included them in the RC's that we have posted for community vote
on a release.  Guidance from one of our mentors suggested that we start to
do this, so we have included binaries this time around.

Frank



On Tue, May 2, 2017 at 9:57 AM, Xiaocheng Tang 
wrote:

> +1
>
> Xiaocheng
> --
> *From:* Daisy Zhe Wang 
> *Sent:* Tuesday, May 2, 2017 6:48:43 AM
> *To:* dev@madlib.incubator.apache.org
> *Cc:* u...@madlib.incubator.apache.org
> *Subject:* Re: [VOTE] MADlib v1.11-rc1
>
>
> +1
>
> On Mon, 1 May 2017 18:18:32 -0700
> Joe Hellerstein  wrote:
>
> > +1
> >
> > Sent from a telephone.
> >
> > > On May 1, 2017, at 6:12 PM, Ivan Novick  wrote:
> > >
> > > +1
> > >
> > >> On Mon, May 1, 2017 at 5:56 PM, ChenLiang Wang
> > >>  wrote:
> > >>
> > >> +1
> > >>
> > >>> On 05/02/2017 08:23 AM, Woo Jae Jung wrote:
> > >>> +1
> > >>>
> > >>> On Mon, May 1, 2017 at 4:45 PM, Frank McQuillan
> > >>>  wrote:
> > >>>
> > >>>> +1
> > >>>>
> > >>>> On Mon, May 1, 2017 at 4:31 PM, Jarrod Vawdrey
> > >>>>  wrote:
> > >>>>
> > >>>>> +1
> > >>>>>
> > >>>>>
> > >>>>> Jarrod Vawdrey
> > >>>>> Sr. Data Scientist
> > >>>>> Data Science & Engineering | Pivotal Atlanta
> > >>>>> (678) 651-0795
> > >>>>> https://pivotal.io/
> > >>>>>
> > >>>>> On Mon, May 1, 2017 at 7:30 PM, Orhan Kislal
> > >>>>> 
> > >> wrote:
> > >>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> On Mon, May 1, 2017 at 4:25 PM, Rahul Iyer
> > >>>>>> 
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> +1
> > >>>>>>>
> > >>>>>>>> On May 1, 2017 3:55 PM, "Rashmi Raghu" 
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello MADlib community,
> > >>>>>>>>
> > >>>>>>>> We have created a MADlib 1.11 RC-1, with the artifacts below
> > >>>>>>>> up for
> > >> a
> > >>>>>>> vote.
> > >>>>>>>>
> > >>>>>>>> This will be the 5th release for Apache MADlib (incubating).
> > >>>>>>>>
> > >>>>>>>> The main goals of this release are:
> > >>>>>>>> * new module (PageRank for graph analytics with grouping
> > >>>>>>>> support
> > >>>>>>> included)
> > >>>>>>>> * improvements to existing modules (add grouping support to
> > >>>>>>>> Single
> > >>>>>>> Source
> > >>>>>>>> Shortest Path, reduce memory footprint of DT and RF, include
> > >>>>>>>> NULL
> > >>>>>>> features
> > >>>>>>>> in training DT, add support for array and svec output for
> > >>>>>>>> Pivot
> > >>>>> module,
> > >>>>>>>> utility to unnest 2-D arrays into rows of 1-D arrays)
> > >>>>>>>> * platform updates (GPDB 5)
> > >>>>>>>> * updates for Apache Top Level Project readiness and build
> > >>>>>>>> process
> > >> on
> > >>>>>>>> Apache infrastructure
> > >>>>>>>> * bug fixes
> > >>>>>>>> * doc improvements
> > >>>>>>>>
> > >>>>>>>> For more information including release notes, please see:
> > >>>>>>>> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
> > >>>>>>>>
> > >>>>>>>> *** Please download, review and vote by Thu May 04, 2017 @
> > >>>>>>>> 6pm PDT
> > >>>>> ***
> > >>>>>>>>
> > >>>>>>>> We're voting upon the source (tag):  rc/1.11-rc1
> > >>>>>>>> https://github.com/apache/incubator-madlib/tree/rc/1.11-rc1
> > >>>>>>>>
> > >>>>>>>> Source Files:
> > >>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> > >>>>>>>> 11-incubating-rc1/
> > >>>>>>>>
> > >>>>>>>> Commit to be voted upon:
> > >>>>>>>> https://github.com/apache/incubator-madlib/commit/
> > >>>>>>>> 0ff829a7060d08f284e8468ebf35c31b6e231d58
> > >>>>>>>>
> > >>>>>>>> KEYS file containing PGP Keys we use to sign the release:
> > >>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> > >>>>>>>>
> > >>>>>>>> To help in tallying the vote, PMC members please be sure to
> > >>>>>>>> indicate "(binding)" with the vote.
> > >>>>>>>>
> > >>>>>>>> [ ] +1  approve
> > >>>>>>>> [ ] +0  no opinion
> > >>>>>>>> [ ] -1  disapprove (and reason why)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Rashmi Raghu
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Rashmi Raghu, Ph.D.
> > >>>>>>>> Pivotal Data Science
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Ivan Novick
> > > Product Manager Pivotal Greenplum
> > > https://www.youtube.com/GreenplumDatabase
>
>


Re: [VOTE] MADlib v1.11-rc1

2017-05-01 Thread Frank McQuillan
+1

On Mon, May 1, 2017 at 4:31 PM, Jarrod Vawdrey  wrote:

> +1
>
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal Atlanta
> (678) 651-0795
> https://pivotal.io/
>
> On Mon, May 1, 2017 at 7:30 PM, Orhan Kislal  wrote:
>
> > +1
> >
> > On Mon, May 1, 2017 at 4:25 PM, Rahul Iyer  wrote:
> >
> >> +1
> >>
> >> On May 1, 2017 3:55 PM, "Rashmi Raghu"  wrote:
> >>
> >> > Hello MADlib community,
> >> >
> >> > We have created a MADlib 1.11 RC-1, with the artifacts below up for a
> >> > vote.
> >> >
> >> > This will be the 5th release for Apache MADlib (incubating).
> >> >
> >> > The main goals of this release are:
> >> > * new module (PageRank for graph analytics with grouping support
> >> > included)
> >> > * improvements to existing modules (add grouping support to Single
> >> > Source Shortest Path, reduce memory footprint of DT and RF, include NULL
> >> > features in training DT, add support for array and svec output for Pivot
> >> > module, utility to unnest 2-D arrays into rows of 1-D arrays)
> >> > * platform updates (GPDB 5)
> >> > * updates for Apache Top Level Project readiness and build process on
> >> > Apache infrastructure
> >> > * bug fixes
> >> > * doc improvements
> >> >
> >> > For more information including release notes, please see:
> >> > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
> >> >
> >> > *** Please download, review and vote by Thu May 04, 2017 @ 6pm PDT ***
> >> >
> >> > We're voting upon the source (tag):  rc/1.11-rc1
> >> > https://github.com/apache/incubator-madlib/tree/rc/1.11-rc1
> >> >
> >> > Source Files:
> >> > https://dist.apache.org/repos/dist/dev/incubator/madlib/1.11-incubating-rc1/
> >> >
> >> > Commit to be voted upon:
> >> > https://github.com/apache/incubator-madlib/commit/0ff829a7060d08f284e8468ebf35c31b6e231d58
> >> >
> >> > KEYS file containing PGP Keys we use to sign the release:
> >> > https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> >> >
> >> > To help in tallying the vote, PMC members please be sure to indicate
> >> > "(binding)" with the vote.
> >> >
> >> > [ ] +1  approve
> >> > [ ] +0  no opinion
> >> > [ ] -1  disapprove (and reason why)
> >> >
> >> >
> >> > Regards,
> >> > Rashmi Raghu
> >> >
> >> > --
> >> > Rashmi Raghu, Ph.D.
> >> > Pivotal Data Science
> >> >
> >>
> >
> >
>


Fwd: Github's disappearing mirrors

2017-04-28 Thread Frank McQuillan
fyi

-- Forwarded message --
From: Chris Lambertus 
Date: Fri, Apr 28, 2017 at 12:22 PM
Subject: Github's disappearing mirrors
To: committers 


Hello committers,

We have received quite a few reports of github mirrors gone missing. We’ve
tracked this down to an errant process at Github which appears to be deleting
not only ours but also other orgs’ mirrors. We contacted Github but have yet to
receive a reply. Another organization also contacted github and received the
following reply:

"Hi there, Sorry for the trouble! We've now had a couple of reports of this
problem, and we've opened an issue internally to investigate.  I don't have an
ETA on a fix, but we'll be in touch if we need more information from you or if
we have any information to share.  Regards, Laura GitHub Support"


We have no further information at this time. We have been restoring the mirrors
wherever possible, but until the root cause is resolved on Github’s side, we
expect mirrors to continue to be erroneously removed.

Access to the repos via the usual https://git-wip-us.apache.org/ channel
remains functional.

-Chris
ASF Infra



Re: 1.11 release planning

2017-04-17 Thread Frank McQuillan
Thank you Rashmi.   Other folks in the community who have done it before
can help you out with the details.

Frank

On Mon, Apr 17, 2017 at 2:28 PM, Rashmi Raghu  wrote:

> I volunteer to be release manager.
>
> Thanks,
> Rashmi
>
> On Mon, Apr 17, 2017 at 2:26 PM, Frank McQuillan 
> wrote:
>
> > We are getting closer to having an RC for the 1.11 release - perhaps
> > within a week or so.
> >
> > After this release, we will be applying for graduation to TLP status in
> > the ASF, so hopefully this will be the last incubating release.
> >
> > The JIRAs for 1.11 are:
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> >
> > Any volunteers out there to be release manager?
> >
> > Thanks,
> > Frank
> >
>
>
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


1.11 release planning

2017-04-17 Thread Frank McQuillan
We are getting closer to having an RC for the 1.11 release - perhaps within
a week or so.

After this release, we will be applying for graduation to TLP status in the
ASF, so hopefully this will be the last incubating release.

The JIRAs for 1.11 are:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
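For anyone reading the link above by hand: the part after jql= is a
percent-encoded JIRA query. Below is a minimal Python sketch (standard
library only) that decodes it and re-encodes it again. The variable names
are illustrative and not part of any MADlib tooling.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The 1.11 JIRA link from this message, split across lines for readability.
url = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20"
    "fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%20"
    "priority%20DESC%2C%20created%20ASC"
)

# parse_qs splits the query string and undoes the percent-encoding.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)

# Re-encoding the decoded query gives back the original query string,
# which is handy when editing the filter (e.g. a different fixVersion).
rebuilt_query = "jql=" + quote(jql, safe="")
print(rebuilt_query == urlsplit(url).query)
```

Changing the fixVersion value in the decoded string and re-encoding it
should give the equivalent link for another release, assuming the same
version naming is used in JIRA.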

Any volunteers out there to be release manager?

Thanks,
Frank


Re: Graph SSSP Scale Tests

2017-04-05 Thread Frank McQuillan
Not yet for HAWQ.  For PostgreSQL, the larger data sets would be too big...

On Wed, Apr 5, 2017 at 2:11 PM, Greg Chase  wrote:

> Very nice!
>
> Do you have similar benchmarks for HAWQ and PostgreSQL?
>
> On Wed, Apr 5, 2017 at 1:26 PM, Ivan Novick  wrote:
>
> > looks good!
> >
> > On Thu, Apr 6, 2017 at 2:49 AM, Orhan Kislal  wrote:
> >
> > > Hello MADlib community,
> > >
> > >
> > >
> > > > We have been doing some additional scale testing on SSSP introduced in
> > > > the 1.10 release
> > >
> > > http://madlib.incubator.apache.org/docs/latest/group__grp__sssp.html
> > >
> > >
> > >
> > > > A sample of results, going up to 100M vertices and 5B edges can be
> > > > found in the following links:
> > > >
> > > >
> > > > https://drive.google.com/file/d/0B62dTQMossK9eml5LV9EZ09LcmM/view?usp=sharing
> > > >
> > > > https://drive.google.com/file/d/0B62dTQMossK9dU1rSEs1TTBZN1U/view?usp=sharing
> > >
> > >
> > > So scaling looks pretty good.
> > >
> > >
> > >
> > > Please let me know if you have any comments.
> > >
> > >
> > > Orhan Kislal
> > >
> > > ­­
> > >
> >
> >
> >
> > --
> > Ivan Novick
> > Product Manager Pivotal Greenplum
> > https://www.youtube.com/GreenplumDatabase
> >
>


Re: DRAFT ASF report for MADlib for Q1 2017

2017-03-31 Thread Frank McQuillan
RM:
Ed, so far we don't have a release manager.  Any volunteers out there?

PR #75
https://github.com/apache/incubator-madlib/pull/75
introduces mini-batching for SVM but also has potential application to
other algorithms.  It is not yet merged because it is part of an epic
https://issues.apache.org/jira/browse/MADLIB-1047
and those stories are still in flight or awaiting contributors.

New usage or contributions:
Certainly some good features planned for this release.  Lots of time still
for community members to contribute.  For example,
rashmi.ra...@gmail.com expressed an interest in working on
https://issues.apache.org/jira/browse/MADLIB-1086 and possibly stratified
sampling.
Frank








On Fri, Mar 31, 2017 at 11:11 AM, Ed Espino  wrote:

> My naivete about the proper content of the Apache quarterly reports is
> obvious to me and most likely to you ... but here are a couple of
> observations/questions on the draft.
>
>- As the next release is scheduled for April 2017, has someone
>volunteered to be the release manager?
>- I noticed there is one PR #75
><https://github.com/apache/incubator-madlib/pull/75> outstanding since
>Nov 2016. I think it might be worthy of noting that the dev community is
>reviewing PRs actively. This helps promote future contributions.
>- Have there been any significant new usage stories or contributions
>from the community to highlight?
>
> -=e
>
> On Thu, Mar 30, 2017 at 11:32 AM, Frank McQuillan 
> wrote:
>
> > Here is the draft ASF report for Apr 2017, covering Q1 2017 activity.
> >
> > It is posted at http://wiki.apache.org/incubator/April2017
> >
> > Please let me know if you have any comments or suggestions and I will
> > update the report.
> >
> > ---
> >
> > MADlib
> >
> > Big Data Machine Learning in SQL for Data Scientists.
> >
> > MADlib has been incubating since 2015-09-15.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. Continue to produce regular Apache (incubating) releases.
> >   2. Continue to execute and manage the project according to governance
> > model of the "Apache Way".
> >   3. Continue to build community.
> >
> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > aware of?
> >
> >  1. The next release v1.11 will be the 5th as an incubating project.  We
> > believe this release will meet all requirements for a clean ASF release,
> > based on listening to guidance from the IPMC over the previous releases.
> > After that, the community would ideally like to move towards top level
> > status.
> >  2. The licensing issues have been resolved.  Should anyone want to
> > review, we have summarized the issue and resolution with relevant links
> > on the MADlib wiki at
> > https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance
> >
> > How has the community developed since the last report?
> >
> >   1. Some related events in Q1 2017:
> >     * Feb 4, 2017 - Presentation at FOSDEM’17 Graph devroom.  Topic:
> >       Graph Analytics on Massively Parallel Processing Databases (Frank
> >       McQuillan)
> >     * Feb 2, 2017 - Greenplum meetup in SF.  Topic:  Machine Learning and
> >       Cyber Security with Greenplum and Apache MADlib (Anirudh Kondaveeti,
> >       Frank McQuillan)
> >     * Mar 23, 2017 - MADlib community call.  Topic:  New Features in Apache
> >       MADlib 1.10 (Frank McQuillan)
> >   2. See material technical conversations on user/dev mailing lists and
> > in the appropriate JIRAs and pull requests.
> >
> > How has the project developed since the last report?
> >
> >   1. Build infra set up on Apache infra
> > https://builds.apache.org/job/madlib-master-build/
> >   2. Docker image with necessary dependencies required to compile and
> > test MADlib on PostgreSQL 9.6
> > https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers#QuickStartGuideforDevelopers-Dock
> >   3. Active work in progress for 5th ASF release MADlib v1.11 scheduled
> > for Apr 2017.  Features include: PageRank, connected components, stratified
> > sampling, improvements to decision tree & random forest, array & sparse
> > vector output for pivot
> >   4. Mailing list activity in Q1 to date:  274 postings to dev, 111
> > postings to user.
> >
> > How would you assess the podling's maturity?
> > Please feel free to add your own commentary.
> >
> >   [ ] Initial setup
> >   [ ] Working towards first release
> >   [ ] Community building
> >   [X] Nearing graduation
> >   [ ] Other:
> >
> > Date of last release:
> >
> >   MADlib v1.10 on 3/10/17.
> >
> > When were the last committers or PMC members elected:
> >
> >   Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.
> >
>
>
>
> --
> *Ed Espino*
>


DRAFT ASF report for MADlib for Q1 2017

2017-03-30 Thread Frank McQuillan
Here is the draft ASF report for Apr 2017, covering Q1 2017 activity.

It is posted at http://wiki.apache.org/incubator/April2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Continue to produce regular Apache (incubating) releases.
  2. Continue to execute and manage the project according to governance
model of the "Apache Way".
  3. Continue to build community.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 1. The next release v1.11 will be the 5th as an incubating project.  We
believe this release will meet all requirements for a clean ASF release,
based on listening to guidance from the IPMC over the previous releases.
After that, the community would ideally like to move towards top level
status.
 2.  The licensing issues have been resolved.  Should anyone want to
review, we have summarized the issue and resolution with relevant links on
the MADlib wiki at
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance

How has the community developed since the last report?

  1. Some related events in Q1 2017:
    * Feb 4, 2017 - Presentation at FOSDEM’17 Graph devroom.  Topic:
      Graph Analytics on Massively Parallel Processing Databases (Frank
      McQuillan)
    * Feb 2, 2017 - Greenplum meetup in SF.  Topic:  Machine Learning and Cyber
      Security with Greenplum and Apache MADlib (Anirudh Kondaveeti, Frank
      McQuillan)
    * Mar 23, 2017 - MADlib community call.  Topic:  New Features in Apache
      MADlib 1.10 (Frank McQuillan)
  2. See material technical conversations on user/dev mailing lists and in
the appropriate JIRAs and pull requests.

How has the project developed since the last report?

  1. Build infra set up on Apache infra
https://builds.apache.org/job/madlib-master-build/
  2. Docker image with necessary dependencies required to compile and test
MADlib on PostgreSQL 9.6
https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers#QuickStartGuideforDevelopers-Dock
  3. Active work in progress for 5th ASF release MADlib v1.11 scheduled for
Apr 2017.  Features include: PageRank, connected components, stratified
sampling, improvements to decision tree & random forest, array & sparse
vector output for pivot
  4. Mailing list activity in Q1 to date:  274 postings to dev, 111
postings to user.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  MADlib v1.10 on 3/10/17.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Question about data distribution for graph algos

2017-03-28 Thread Frank McQuillan
I have a question about distribution of data to the segments for the
various graph processing algos we are building.

Do we have guidance for users on how to distribute data?

Does the strategy vary by algorithm?

What impact will data distribution have on performance?

Looking at Section 4.1 of the Pregel paper
https://kowshik.github.io/JPregel/pregel_paper.pdf
it has a default partitioning scheme of hash(ID) mod N, where N is the
number of partitions.  But then it says

“Some applications work well with the default assignment, but some
benefit from defining custom assignment functions to better
exploit locality inherent in the graph. For example, a typical
heuristic employed for the Web graph is to colocate vertices
representing pages of the same site.”
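
To make the two options concrete, here is a rough Python sketch of the
default hash(ID) mod N assignment next to a site-based assignment in the
spirit of the Web graph heuristic quoted above. The function names, the use
of crc32 as the hash, and the example URLs are assumptions for illustration
only; this is not how Pregel or MADlib implement partitioning.

```python
import zlib
from urllib.parse import urlsplit


def default_partition(vertex_id: str, num_partitions: int) -> int:
    """Default scheme: hash(ID) mod N.

    crc32 stands in for the hash function so results are stable across runs.
    """
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def site_partition(page_url: str, num_partitions: int) -> int:
    """Custom assignment: hash only the host, so pages of one site colocate."""
    host = urlsplit(page_url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_partitions


# Hypothetical vertex IDs: pages identified by their URL.
pages = [
    "http://a.example/index.html",
    "http://a.example/about.html",
    "http://b.example/index.html",
]

for url in pages:
    print(url, default_partition(url, 4), site_partition(url, 4))
```

With the default scheme, two pages of the same site may or may not share a
partition; with the site-based scheme they always do, which is the locality
the paper is describing. In a database setting the analogous choice would
presumably be which column the edge and vertex tables are distributed by.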

Frank


Re: REMINDER - MADlib Community call - Thursday, March 23 @ 11:30 am

2017-03-23 Thread Frank McQuillan
Thanks for attending the call today.

Quick review of the topics covered:

1.10 release notes
https://github.com/apache/incubator-madlib/blob/master/RELEASE_NOTES

Demo of new features, Jupyter notebooks posted at
https://github.com/apache/incubator-madlib-site/tree/asf-site/community-artifacts

Looking ahead to 1.11:

Quicker release!
* Target end April/beg May 2017
* Final prep for application to graduate to Top Level Status

Graph
* Page rank
* Connected components
* All pairs shortest path (?)

Decision tree/random forest improvements


On Thu, Mar 23, 2017 at 10:00 AM, Bob Glithero  wrote:

> Dear MADlib, HAWQ, and Greenplum Communities,
>
>
>
> Reminder that we are organizing a MADlib Virtual Community Meeting
> Thursday, March 23 at 11:30AM Pacific (18:30 GMT).   We hope to see you
> there!
>
>
>
> Join the meeting here: https://pivotal.zoom.us/j/820746991
>
>
>
> Add to your calendar:
> https://www.google.com/calendar/event?eid=b2xldjc1ODQ2dnJvZmhvbnMyMHNwOXM3MzAgcmdsaXRoZXJvQHBpdm90YWwuaW8&ctz=America/Los_Angeles
>
>
>
> For this meeting, we'll be describing and demonstrating new capabilities
> of 1.10 including:
>
>
>
> * New modules: single source shortest path, the first algorithm in graph
> processing in MADlib.  Also new: encoding categorical variables and
> K-nearest neighbors.
>
>
>
> * Improvements to existing modules:  add grouping support to elastic net
> and PCA, add cross validation to elastic net, array input for K-means,
> verbose output option for decision trees and random forest, limit itemset
> size in association rules, various madpack installer improvements
>
>
>
> Finally, we’ll spend a little time discussing the upcoming 1.11 release.
>
>
>
> Release notes can be found here:
>
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10
>
>
>
> Software:
>
> https://dist.apache.org/repos/dist/release/incubator/madlib/1.10.0-incubating/
>
>
>
> User documentation:
>
> http://madlib.incubator.apache.org/docs/latest/
>
>
>
> Bob Glithero | Product Marketing
> Pivotal, Inc.
> rglith...@pivotal.io | m: 415.341.5592
>
>


Re: Apache Jenkins MADlib projects

2017-03-14 Thread Frank McQuillan
Here are the relevant JIRAs:

Docker image
https://issues.apache.org/jira/browse/MADLIB-920

PR integration
https://issues.apache.org/jira/browse/MADLIB-1080
* this is the new JIRA - Ed please provide the HAWQ links in this thread or
add them to this JIRA

Thanks,
Frank


On Tue, Mar 14, 2017 at 9:38 AM, Rahul Iyer  wrote:

> Thanks, Ed.
>
> The master and PR integration would be quite useful for MADlib and are on
> the cards. We're in the process of wrapping our docker work; once that goes
> in, we can finalize these other projects.
> It would be easier for us to start with the HAWQ projects as references -
> could you please post their links?
>
> Best,
> iR
>
> On Tue, Mar 14, 2017 at 8:15 AM, Ed Espino  wrote:
>
> > I see Apache Jenkins build service testing in madlib-test-build
> > is being worked on. This is pretty cool for the dev community. Is there
> > a set of projects and GitHub *master* branch and *Pull Request* (PR)
> > integration points being worked on?
> >
> > For what it is worth, here are some integration points we have for the
> > HAWQ project that may be of use to MADlib:
> >
> >- For each Pull Request (PR), perform the following checks (these go
> >along with the default conflict check performed automatically by
> > github):
> >   - Perform build (compilation) and Apache Release Audit Tool (RAT)
> >   check
> >- For each master branch submission:
> >   - Perform build (compilation)
> >   - Perform Apache Release Audit Tool (RAT) check
> >   - Add "Embeddable Build Status Icon" to the project's README.md:
> >   https://builds.apache.org/job/madlib-test-build/badge/
> >
> > Cheers,
> > -=e
> >
> > --
> > *Ed Espino*
> >
>


Announcing MADlib v1.10 GA

2017-03-10 Thread Frank McQuillan
MADlib v1.10 is now generally available.

The vote was PASSED by Incubator PMC members:
http://mail-archives.apache.org/mod_mbox/incubator-general/201703.mbox/%3CCAKBQfzTSxD1e53iTnNbci89HYXoyah9kg-8zLts83_8kMRtWGw%40mail.gmail.com%3E

Special thanks to mentor Roman Shaposhnik for his help in resolving some
thorny legal issues leading up to this release.

The source and binaries are posted at:
https://dist.apache.org/repos/dist/release/incubator/madlib/1.10.0-incubating/

Release notes:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10

User documentation:
http://madlib.incubator.apache.org/docs/latest/

We look forward to community participation for the next release v1.11 and
moving towards TLP graduation!

Regards,
Frank McQuillan


Re: [VOTE] MADlib v1.10-rc2

2017-03-09 Thread Frank McQuillan
I see.  In that case I will remove those links from the download page when
I update the web site tomorrow announcing the 1.10 release (assuming that
it goes thru IPMC voting OK).

Frank

On Thu, Mar 9, 2017 at 2:30 PM, Roman Shaposhnik 
wrote:

> CCing dev@madlib
>
> On Thu, Mar 9, 2017 at 9:26 AM, Frank McQuillan 
> wrote:
> > @john
> > Pivotal Network
> > https://network.pivotal.io/
> > is a commercial download site maintained by Pivotal.  MADlib binaries are
> > also hosted there after Apache releases are completed
> > e.g.,
> > https://network.pivotal.io/products/pivotal-gpdb#/releases/4540/file_groups/491
>
> I reviewed the link that John mentioned and I must say I agree with him.
> That link just doesn't belong to a Download page of an ASF project.
>
> It would be fine on a "powered by" kind of a page or on the wiki, but not
> on a main download page.
>
> Thanks,
> Roman.
>


Re: [VOTE] MADlib v1.10-rc2

2017-03-07 Thread Frank McQuillan
[RESULT][VOTE] MADlib v1.10-rc2

Hello,

Thank you to all community members who voted.

On behalf of release manager Satoshi, below is the tally of the votes:

+1 (binding):

none


+1 (non binding):

Joseph Hellerstein
Daisy She Wang
Xixuan (Aaron) Feng
Rahul Iyer
Xiaocheng Tang
Orhan Kislal
Nandish Jayaram
Marshall Presser
Milenko Petrovic


0, -1 or other votes:

none
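
For reference, a tally like the one above can be scripted. The sketch below
is only an illustration in Python: it assumes each reply starts with +1, +0
or -1, and that PMC members add "(binding)" as requested in the vote email.
The voter labels and sample replies are made up, and this is not an official
ASF tool.

```python
import re
from collections import Counter

# A vote line starts with +1, +0 or -1, optionally followed by more text.
VOTE_RE = re.compile(r"^\s*(\+1|\+0|-1)\b(.*)$")


def tally(replies):
    """Count votes from (voter, first line of reply) pairs.

    Returns a Counter keyed by (vote, "binding" | "non-binding").
    Replies that do not start with a recognizable vote are skipped.
    """
    counts = Counter()
    for _voter, line in replies:
        match = VOTE_RE.match(line)
        if match is None:
            continue
        vote, rest = match.groups()
        kind = "binding" if "(binding)" in rest.lower() else "non-binding"
        counts[(vote, kind)] += 1
    return counts


# Hypothetical replies, not the actual votes on this thread.
sample = [
    ("voter-a", "+1 (binding)"),
    ("voter-b", "+1"),
    ("voter-c", "-1 build fails on my machine"),
    ("voter-d", "looks good to me"),
]

for key, count in sorted(tally(sample).items()):
    print(key, count)
```

Anything that is not a clear +1/+0/-1 is skipped on purpose, so ambiguous
replies still need a human look before the result is posted.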


I will post an email vote request to gene...@incubator.apache.org and
indicate to the ASF incubator principals that the MADlib community has
endorsed the release of the v1.10-rc2 artifacts.

Regards,
Frank


On Mon, Mar 6, 2017 at 5:46 PM, Frank McQuillan 
wrote:

> JIRAs created to address Ed's observations.  Setting for 1.11 since not
> blockers for 1.10.
>
> https://issues.apache.org/jira/browse/MADLIB-1076
> https://issues.apache.org/jira/browse/MADLIB-1077
> https://issues.apache.org/jira/browse/MADLIB-1078
>
> Frank
>
> On Fri, Mar 3, 2017 at 8:26 PM, Ed Espino  wrote:
>
>> I had some time and had been wanting to perform a MADlib build.  Here are
>> my notes from my quick review of MADlib v1.10-rc2. Sorry if the information
>> is a bit scattered.
>>
>> Regards,
>> -=ed espino
>>
>> ==
>> Checksums are good
>> ==
>> PGP signature is good
>> ==
>> Extracted tarball base directory (apache-madlib-src-1.10-incubating)
>> good
>> ==
>> LICENSE
>>
>>   Shouldn't the components with files in licenses/third_party be
>>   referenced in LICENSE file?
>>
>> Boost_Software_License_v1.txt
>> Eigen_v3.1.2.txt
>> PyXB_v1.2.3.txt
>> PyYAML_v3.10.txt
>> Python_License_v2.7.1.txt
>> UseLATEX_v1.9.4.txt
>> _M_widen_init.txt
>> argparse_v1.2.1.txt
>>
>> From README.md, I only saw an incomplete reference to the third party
>> components.
>>
>>   Third Party Components
>>   MADlib incorporates material from the following third-party components
>>
>>   argparse 1.2.1 "provides an easy, declarative interface for creating
>> command line tools"
>>   Boost 1.47.0 (or newer) "provides peer-reviewed portable C++ source
>> libraries"
>>   Eigen 3.2.2 "is a C++ template library for linear algebra"
>>   PyYAML 3.10 "is a YAML parser and emitter for Python"
>>   PyXB 1.2.4 "is a Python library for XML Schema Bindings"
>>
>> ==
>> DISCLAIMER good
>> ==
>> NOTICE good
>> ==
>> BUILD, INSTALL and INSTALL-CHECK
>>
>> I was able to build the package and successfully ran MADlib
>> install-check against PostgreSQL 9.6.2.
>>
>> Issue: There is no obvious reference to the PostgreSQL libxml
>>dependency in dev documentation. The madpack install-check
>>has failures (see below) if "--with-libxml" configure
>>option is not specified for PostgreSQL.
>>
>>install-check errors encountered due to PostgreSQL
>>configuration without "--with-libxml" option:
>>
>>  psql:/tmp/madlib.0UIPlZ/pmml/test/table_to_pmml.sql_in.tmp:73:
>> ERROR:  unsupported XML feature
>>  DETAIL:  This functionality requires the server to be built with
>> libxml support.
>>  HINT:  You need to rebuild PostgreSQL using --with-libxml.
>>  CONTEXT:  while creating return value
>>  PL/Python function "pmml"
>>
>> Issue: AUTO DOWNLOADED PACKAGES
>>
>>   I was performing the build from a simple perspective. Download
>>   source, configure, make and glance at docs (in this order).
>>
>>   As we have dealt with auto-downloaded files in the HAWQ project, I
>>   was surprised that the following packages were automatically
>>   downloaded for me. On the HAWQ project we were instructed to require
>>   these as prerequisites and/or make them optional, included via
>>   command line options (configure).  I'm guessing other packages would
>>   have been automatically downloaded if they were not found on the system
>>   (e.g., boost).
>>
>>   Automatically downloaded packages:
>>
>>   https://github.com/madlib/eigen

[RESULT][VOTE] MADlib v1.10-rc2

2017-03-07 Thread Frank McQuillan
Hello,

Thank you to all community members who voted.

On behalf of release manager Satoshi, below is the tally of the votes:

+1 (binding):

none


+1 (non binding):

Joseph Hellerstein
Daisy She Wang
Xixuan (Aaron) Feng
Rahul Iyer
Xiaocheng Tang
Orhan Kislal
Nandish Jayaram
Marshall Presser
Milenko Petrovic


0, -1 or other votes:

none


I will post an email vote request to gene...@incubator.apache.org and
indicate to the ASF incubator principals that the MADlib community has
endorsed the release of the v1.10-rc2 artifacts.

Regards,
Frank


Re: [VOTE] MADlib v1.10-rc2

2017-03-06 Thread Frank McQuillan
sion
> madpack.py : INFO : MADlib tools version= 1.10.0
> (/usr/local/madlib/Versions/1.10.0/bin/../madpack/madpack.py)
>
> ==
>
> ------
> Attached to this email: For reference: here is the entire build log
> (including PostgreSQL 9.6.2) and test run attempts. Several of the
> issues above can be seen in the log.
> --
>
>
> On Fri, Mar 3, 2017 at 4:20 PM, Orhan Kislal  wrote:
>
>> +1
>>
>> On Fri, Mar 3, 2017 at 4:14 PM, Rahul Iyer  wrote:
>>
>> > +1
>> >
>> > On Fri, Mar 3, 2017 at 11:17 AM, Frank McQuillan
>> > wrote:
>> >
>> > > Hello MADlib community,
>> > >
>> > > I am sending this email on behalf of the release manager Satoshi
>> > > Nagayasu <sn...@uptime.jp>.
>> > >
>> > > We have created a MADlib 1.10 RC-2, with the artifacts below up for a
>> > > vote.
>> > > From project mentor Roman Shaposhnik we heard the ultimate resolution
>> > > on the IP issue:
>> > >    * we don't do anything with existing (BSD) files even if we edit
>> > >      them
>> > >    * every new file we create gets an ASF license header
>> > >    * more details:
>> > >
>> > > https://issues.apache.org/jira/browse/LEGAL-293?focusedCommentId=15881595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>> > >
>> > > RC-2 replaces RC-1 with the following changes:
>> > >
>> > > * Multiple: Update license headers per Apache guidance
>> > > https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d
>> > >
>> > > * Build: Fix module sort order for PGXN installation
>> > > https://github.com/apache/incubator-madlib/commit/fa80240f72a6551c2ee567d471afa499fd1d1efe
>> > >
>> > > * Update the copyright year.
>> > > https://github.com/apache/incubator-madlib/commit/0b8415e7eec5c9ebb83fbf22923c69a99b0056ef
>> > >
>> > > * Build: Add error for missing server includedir
>> > > https://github.com/apache/incubator-madlib/commit/b3495c50bf491139ac245a21d97963e81892c610
>> > >
>> > > * Encode categorical: Add distributed_by in Postgresql w/ no-op
>> > > https://github.com/apache/incubator-madlib/commit/7055dceb3fbde35bae602ac80d4b70486f015748
>> > >
>> > > * Renamed the top level source directory as suggested:
>> > > apache-madlib-src-1.10-incubating
>> > >
>> > > This will be the 4th release for Apache MADlib (incubating).
>> > >
>> > > The main goals of this release are:
>> > > * new modules (single source shortest path for graph analytics, encode
>> > > categorical variables, K-nearest neighbors)
>> > > * improvements to existing modules (add grouping support to elastic
>> > > net and PCA, add cross validation to elastic net, array input for
>> > > K-means, verbose output option for DT and RF, limit itemset size in
>> > > association rules, various madpack installer improvements)
>> > > * platform updates (PostgreSQL 9.6)
>> > > * bug fixes
>> > > * doc improvements
>> > >
>> > > For more information including release notes, please see:
>> > > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10
>> > >
>> > > *** Please download, review and vote by Mon Mar 6, 2017 @ 6pm Pacific
>> > > Time USA ***
>> > >
>> > > We're voting upon the source (tag):  rc/1.10.0-rc2
>> > > https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc2
>> > >
>> > > Source Files:
>> > > https://dist.apache.org/repos/dist/dev/incubator/madlib/1.10.0-incubating-rc2/
>> > >
>> > > Commit to be voted upon:
>> > > https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d
>> > >
>> > > KEYS file containing PGP Keys we use to sign the release:
>> > > https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>> > >
>> > > To help in tallying the vote, can PMC members please be sure to
>> > > indicate "(binding)" with their vote.
>> > >
>> > > [ ] +1  approve
>> > > [ ] +0  no opinion
>> > > [ ] -1  disapprove (and reason why)
>> > >
>> > > Regards,
>> > > Frank McQuillan
>> > >
>> >
>>
>
>


Re: [VOTE] MADlib v1.10-rc1

2017-03-03 Thread Frank McQuillan
To finish this thread, I captured all of these licensing issues on the
MADlib wiki at
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance
should anyone need to refer to it.


On Tue, Feb 28, 2017 at 11:43 AM, Frank McQuillan 
wrote:

> Thanks Rahul.  I see your commit has addressed the remaining issues:
> https://git1-us-west.apache.org/repos/asf?p=incubator-
> madlib.git;a=commit;h=a3863b6c
>
> We are declaring create_indicators.* as new files so they will have
> Apache header.
>
> For the record, I attached an Excel spreadsheet with some more notes so
> that we remember how we went from the two lists Rahul posted above to the
> above commit.
>
> Frank
>
> On Mon, Feb 27, 2017 at 5:44 PM, Rahul Iyer  wrote:
>
>> I have attached two files:
>>
>> new_files_after_apache.txt: New files added since September 15, 2015
>> (grant date) till date
>> files_w_apache_header.txt: Files that contain the Apache header right
>> now.
>>
>> Comparing the two lists, there are open questions regarding below files.
>>
>> Extra headers:
>> - sort-module.py has Apache header but was created before grant (recently
>> edited and header added). *I'll fix this*.
>> - create_indicators.* have headers but were renamed from
>> data_preparation.*. *What is the legal guidance with this*?
>>
>> No header:
>> - class_diagram.mp looks like a text file with no header, even though it
>> was added just after the grant. I'm not aware of the purpose of this file.
>>
>>
>>
>> On Mon, Feb 27, 2017 at 4:42 PM, Frank McQuillan 
>> wrote:
>>
>>> OK, so we need to go back and do the comparison from the original code
>>> grant in the fall of 2015 to the  current 1.10 release candidate.
>>>
>>> On Mon, Feb 27, 2017 at 4:19 PM, Roman Shaposhnik 
>>> wrote:
>>>
>>> > Frank, I'm not sure I understand the question. The criteria needs to
>>> hold
>>> > for anything that came in via the initial code ingest compared to how
>>> the
>>> > master of your project looks now.
>>> >
>>> > Thanks,
>>> > Roman.
>>> >
>>> > On Mon, Feb 27, 2017 at 4:10 PM, Frank McQuillan <
>>> fmcquil...@pivotal.io>
>>> > wrote:
>>> > > Roman,
>>> > >
>>> > > Does this apply retro-actively back to initial grant of the code to
>>> > ASF?  Or
>>> > > just from the last release 1.9.1?
>>> > >
>>> > > Frank
>>> > >
>>> > > On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik <
>>> ro...@shaposhnik.org
>>> > >
>>> > > wrote:
>>> > >>
>>> > >> Here's the ultimate resolution on the IP issue:
>>> > >>* we don't do anything with existing (BSD) files even if we edit
>>> them
>>> > >>* every new file we create gets an ASF license header
>>> > >>
>>> > >> More details:
>>> > >>
>>> > >> https://issues.apache.org/jira/browse/LEGAL-293?
>>> > focusedCommentId=15881595&page=com.atlassian.jira.
>>> > plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>>> > >>
>>> > >> Thanks,
>>> > >> Roman.
>>> > >>
>>> > >> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan <
>>> fmcquil...@pivotal.io
>>> > >
>>> > >> wrote:
>>> > >> > Thanks Roman for working on this.
>>> > >> >
>>> > >> > If you feel a final answer will be ready next week, then yes by
>>> all
>>> > >> > means l
>>> > >> > would suggest to the community that we wait and re-spin an RC2
>>> with
>>> > the
>>> > >> > license headers issue resolved.  Seems less overhead and effort
>>> than a
>>> > >> > quick follow on release right after 1.10.  Also, there some
>>> momentum
>>> > >> > going
>>> > >> > with the legal discussion, so let's take advantage of that.
>>> > >> >
>>> > >> > Satoshi (release manager), are you OK pausing the RC2 until we
>>> hear
>>> > back
>>> > >> > from Roman next week?
>>> > >> >
>>> > >> > Thank you,
>>> > >> > Fran

[VOTE] MADlib v1.10-rc2

2017-03-03 Thread Frank McQuillan
Hello MADlib community,

I am sending this email on behalf of the release manager Satoshi Nagayasu <
sn...@uptime.jp> .

We have created a MADlib 1.10 RC-2, with the artifacts below up for a vote.

From project mentor Roman Shaposhnik we heard the ultimate resolution on
the IP issue:
   * we don't do anything with existing (BSD) files even if we edit them
   * every new file we create gets an ASF license header
   * more details:

https://issues.apache.org/jira/browse/LEGAL-293?focusedCommentId=15881595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
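
As a rough illustration of the second rule, here is a minimal Python sketch
that checks whether a file carries the ASF license header. The marker string
is the opening phrase of the standard ASF source header; the helper name
has_asf_header and the 30-line window are assumptions made for this example
and are not part of any MADlib tooling.

```python
# Opening words of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(path, max_lines=30):
    """Return True if the ASF marker appears in the first max_lines lines.

    Hypothetical helper: the 30-line window is an assumption, chosen so that
    a shebang or encoding line above the header does not hide it.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            if ASF_MARKER in line:
                return True
    return False
```

Under the guidance above, a file that predates the code grant would be
expected to return False here even after it has been edited.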

RC-2 replaces RC-1 with the following changes:

* Multiple: Update license headers per Apache guidance
https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d

* Build: Fix module sort order for PGXN installation
https://github.com/apache/incubator-madlib/commit/fa80240f72a6551c2ee567d471afa499fd1d1efe

* Update the copyright year.
https://github.com/apache/incubator-madlib/commit/0b8415e7eec5c9ebb83fbf22923c69a99b0056ef

* Build: Add error for missing server includedir
https://github.com/apache/incubator-madlib/commit/b3495c50bf491139ac245a21d97963e81892c610

* Encode categorical: Add distributed_by in Postgresql w/ no-op
https://github.com/apache/incubator-madlib/commit/7055dceb3fbde35bae602ac80d4b70486f015748

* Renamed the top level source directory as suggested:
apache-madlib-src-1.10-incubating
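
For reviewers who want to confirm the renamed top-level directory, a small
Python sketch along these lines could be used. It relies only on the
standard tarfile module; the function names are hypothetical, and the
archive file name is left to the reviewer since it depends on what is
downloaded from the dist area.

```python
import tarfile


def top_level_entries(archive_path):
    """Return the set of top-level names inside a tar archive."""
    roots = set()
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            # Ignore empty components and a leading "./" if present.
            parts = [part for part in name.split("/") if part not in ("", ".")]
            if parts:
                roots.add(parts[0])
    return roots


def has_expected_root(archive_path, expected_root):
    """True when everything in the archive lives under expected_root."""
    return top_level_entries(archive_path) == {expected_root}
```

For this RC the expected root would be apache-madlib-src-1.10-incubating.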

This will be the 4th release for Apache MADlib (incubating).

The main goals of this release are:
* new modules (single source shortest path for graph analytics, encode
categorical variables, K-nearest neighbors)
* improvements to existing modules (add grouping support to elastic
net and PCA, add cross validation to elastic net, array input for
K-means, verbose output option for DT and RF, limit itemset size in
association rules, various madpack installer improvements)
* platform updates (PostgreSQL 9.6)
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10

*** Please download, review and vote by Mon Mar 6, 2017 @ 6pm Pacific Time
USA ***

We're voting upon the source (tag):  rc/1.10.0-rc2
https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc2

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.10.0-incubating-rc2/

Commit to be voted upon:
https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, can PMC members please be sure to
indicate "(binding)" with their vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)
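
To make the tallying concrete, here is a minimal Python sketch of how the
replies could be counted. It assumes the usual ASF release-vote rule as I
understand it (at least three binding +1 votes, and more binding +1 than
binding -1); the function names and the reply format are assumptions for
illustration only.

```python
import re

# A reply counts as a vote only if it starts with +1, +0, -0 or -1.
VOTE_PATTERN = re.compile(r"^\s*([+-][01])(?!\d)")


def parse_vote(reply):
    """Return (value, binding) for a reply such as '+1 (binding)', else None."""
    match = VOTE_PATTERN.match(reply)
    if match is None:
        return None
    return int(match.group(1)), "(binding)" in reply.lower()


def tally(replies):
    """Count binding and non-binding votes and apply the assumed pass rule."""
    counts = {
        "binding": {1: 0, 0: 0, -1: 0},
        "non_binding": {1: 0, 0: 0, -1: 0},
    }
    for reply in replies:
        parsed = parse_vote(reply)
        if parsed is None:
            continue  # not a vote, e.g. a question or a comment
        value, binding = parsed
        counts["binding" if binding else "non_binding"][value] += 1
    plus, minus = counts["binding"][1], counts["binding"][-1]
    counts["passed"] = plus >= 3 and plus > minus
    return counts
```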

Regards,
Frank McQuillan


Re: [VOTE] MADlib v1.10-rc1

2017-02-28 Thread Frank McQuillan
Thanks Rahul.  I see your commit has addressed the remaining issues:
https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=a3863b6c

We are declaring create_indicators.* as new files, so they will carry the
Apache header.

For the record, I attached an Excel spreadsheet with some more notes so
that we remember how we went from the two lists Rahul posted above to the
above commit.

Frank
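
For anyone repeating the comparison of the two lists Rahul posted (quoted
below), a minimal Python sketch of the idea follows. It assumes each
attachment holds one file path per line, which I have not verified, and the
function names are hypothetical.

```python
def read_file_list(path):
    """Read one file path per line, skipping blank lines (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_header_lists(new_files, files_with_header):
    """Split the mismatches between the two lists into the two open questions.

    extra_headers:   files that carry the Apache header but are not new
                     since the grant date.
    missing_headers: files added since the grant date that lack the header.
    """
    new_files = set(new_files)
    files_with_header = set(files_with_header)
    return {
        "extra_headers": sorted(files_with_header - new_files),
        "missing_headers": sorted(new_files - files_with_header),
    }
```

A call such as compare_header_lists(read_file_list("new_files_after_apache.txt"),
read_file_list("files_w_apache_header.txt")) would then list the candidates
for the "Extra headers" and "No header" groups.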

On Mon, Feb 27, 2017 at 5:44 PM, Rahul Iyer  wrote:

> I have attached two files:
>
> new_files_after_apache.txt: New files added since September 15, 2015
> (grant date) till date
> files_w_apache_header.txt: Files that contain the Apache header right now.
>
> Comparing the two lists, there are open questions regarding below files.
>
> Extra headers:
> - sort-module.py has Apache header but was created before grant (recently
> edited and header added). *I'll fix this*.
> - create_indicators.* have headers but were renamed from
> data_preparation.*. *What is the legal guidance with this*?
>
> No header:
> - class_diagram.mp looks like a text file with no header, even though it
> was added just after the grant. I'm not aware of the purpose of this file.
>
>
>
> On Mon, Feb 27, 2017 at 4:42 PM, Frank McQuillan 
> wrote:
>
>> OK, so we need to go back and do the comparison from the original code
>> grant in the fall of 2015 to the  current 1.10 release candidate.
>>
>> On Mon, Feb 27, 2017 at 4:19 PM, Roman Shaposhnik 
>> wrote:
>>
>> > Frank, I'm not sure I understand the question. The criteria needs to
>> hold
>> > for anything that came in via the initial code ingest compared to how
>> the
>> > master of your project looks now.
>> >
>> > Thanks,
>> > Roman.
>> >
>> > On Mon, Feb 27, 2017 at 4:10 PM, Frank McQuillan > >
>> > wrote:
>> > > Roman,
>> > >
>> > > Does this apply retro-actively back to initial grant of the code to
>> > ASF?  Or
>> > > just from the last release 1.9.1?
>> > >
>> > > Frank
>> > >
>> > > On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik <
>> ro...@shaposhnik.org
>> > >
>> > > wrote:
>> > >>
>> > >> Here's the ultimate resolution on the IP issue:
>> > >>* we don't do anything with existing (BSD) files even if we edit
>> them
>> > >>* every new file we create gets an ASF license header
>> > >>
>> > >> More details:
>> > >>
>> > >> https://issues.apache.org/jira/browse/LEGAL-293?
>> > focusedCommentId=15881595&page=com.atlassian.jira.
>> > plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>> > >>
>> > >> Thanks,
>> > >> Roman.
>> > >>
>> > >> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan <
>> fmcquil...@pivotal.io
>> > >
>> > >> wrote:
>> > >> > Thanks Roman for working on this.
>> > >> >
>> > >> > If you feel a final answer will be ready next week, then yes by all
>> > >> > means l
>> > >> > would suggest to the community that we wait and re-spin an RC2 with
>> > the
>> > >> > license headers issue resolved.  Seems less overhead and effort
>> than a
>> > >> > quick follow on release right after 1.10.  Also, there some
>> momentum
>> > >> > going
>> > >> > with the legal discussion, so let's take advantage of that.
>> > >> >
>> > >> > Satoshi (release manager), are you OK pausing the RC2 until we hear
>> > back
>> > >> > from Roman next week?
>> > >> >
>> > >> > Thank you,
>> > >> > Frank
>> > >> >
>> > >> >
>> > >> > On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik <
>> > ro...@shaposhnik.org>
>> > >> > wrote:
>> > >> >
>> > >> >> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan
>> > >> >> 
>> > >> >> wrote:
>> > >> >> > Agree with Rahul re putting up an RC2 with the suggested changes
>> > from
>> > >> >> Roman,
>> > >> >> > including incorporating Ed's comments on copyright year and top
>> > level
>> > >> >> folder
>> > >> >> > naming.  These are really items but let's respond to the RC1
>> > >> >> > reviewers
>> > >> >> the
>> > >> >> > best way we can.
>> > >> >>
>> > >> >> +1 to a respin.
>> > >> >>
>> > >> >> > Regarding the ASF legal issue being discussed, MADLib community
>> is
>> > >> >> > more
>> > >> >> than
>> > >> >> > happy to respond to any guidance from the fine folks at the ASF
>> > >> >> > around
>> > >> >> > headers with appropriate licensing verbage.  We just need to
>> know
>> > >> >> > what
>> > >> >> that
>> > >> >> > guidance is.
>> > >> >>
>> > >> >> Well, if you're ok respinning next week I hope to get you a final
>> > >> >> answer by then.
>> > >> >> Might as well kill two birds with the same RC. Or we can quickly
>> do a
>> > >> >> follow up
>> > >> >> release once the licensing headers dust settles. Up to you guys.
>> > >> >>
>> > >> >> Thanks,
>> > >> >> Roman.
>> > >> >>
>> > >
>> > >
>> >
>>
>
>


file headers work1.xlsx
Description: MS-Excel 2007 spreadsheet


Re: [VOTE] MADlib v1.10-rc1

2017-02-27 Thread Frank McQuillan
OK, so we need to go back and do the comparison from the original code
grant in the fall of 2015 to the current 1.10 release candidate.

On Mon, Feb 27, 2017 at 4:19 PM, Roman Shaposhnik 
wrote:

> Frank, I'm not sure I understand the question. The criteria needs to hold
> for anything that came in via the initial code ingest compared to how the
> master of your project looks now.
>
> Thanks,
> Roman.
>
> On Mon, Feb 27, 2017 at 4:10 PM, Frank McQuillan 
> wrote:
> > Roman,
> >
> > Does this apply retro-actively back to initial grant of the code to
> ASF?  Or
> > just from the last release 1.9.1?
> >
> > Frank
> >
> > On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik  >
> > wrote:
> >>
> >> Here's the ultimate resolution on the IP issue:
> >>* we don't do anything with existing (BSD) files even if we edit them
> >>* every new file we create gets an ASF license header
> >>
> >> More details:
> >>
> >> https://issues.apache.org/jira/browse/LEGAL-293?
> focusedCommentId=15881595&page=com.atlassian.jira.
> plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
> >>
> >> Thanks,
> >> Roman.
> >>
> >> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan  >
> >> wrote:
> >> > Thanks Roman for working on this.
> >> >
> >> > If you feel a final answer will be ready next week, then yes by all
> >> > means l
> >> > would suggest to the community that we wait and re-spin an RC2 with
> the
> >> > license headers issue resolved.  Seems less overhead and effort than a
> >> > quick follow on release right after 1.10.  Also, there some momentum
> >> > going
> >> > with the legal discussion, so let's take advantage of that.
> >> >
> >> > Satoshi (release manager), are you OK pausing the RC2 until we hear
> back
> >> > from Roman next week?
> >> >
> >> > Thank you,
> >> > Frank
> >> >
> >> >
> >> > On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik <
> ro...@shaposhnik.org>
> >> > wrote:
> >> >
> >> >> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan
> >> >> 
> >> >> wrote:
> >> >> > Agree with Rahul re putting up an RC2 with the suggested changes
> from
> >> >> Roman,
> >> >> > including incorporating Ed's comments on copyright year and top
> level
> >> >> folder
> >> >> > naming.  These are really items but let's respond to the RC1
> >> >> > reviewers
> >> >> the
> >> >> > best way we can.
> >> >>
> >> >> +1 to a respin.
> >> >>
> >> >> > Regarding the ASF legal issue being discussed, MADLib community is
> >> >> > more
> >> >> than
> >> >> > happy to respond to any guidance from the fine folks at the ASF
> >> >> > around
> >> >> > headers with appropriate licensing verbage.  We just need to know
> >> >> > what
> >> >> that
> >> >> > guidance is.
> >> >>
> >> >> Well, if you're ok respinning next week I hope to get you a final
> >> >> answer by then.
> >> >> Might as well kill two birds with the same RC. Or we can quickly do a
> >> >> follow up
> >> >> release once the licensing headers dust settles. Up to you guys.
> >> >>
> >> >> Thanks,
> >> >> Roman.
> >> >>
> >
> >
>


Re: [VOTE] MADlib v1.10-rc1

2017-02-27 Thread Frank McQuillan
Roman,

Does this apply retroactively, back to the initial grant of the code to the
ASF?  Or just from the last release, 1.9.1?

Frank

On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik 
wrote:

> Here's the ultimate resolution on the IP issue:
>* we don't do anything with existing (BSD) files even if we edit them
>* every new file we create gets an ASF license header
>
> More details:
>https://issues.apache.org/jira/browse/LEGAL-293?
> focusedCommentId=15881595&page=com.atlassian.jira.
> plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>
> Thanks,
> Roman.
>
> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan 
> wrote:
> > Thanks Roman for working on this.
> >
> > If you feel a final answer will be ready next week, then yes by all
> means l
> > would suggest to the community that we wait and re-spin an RC2 with the
> > license headers issue resolved.  Seems less overhead and effort than a
> > quick follow on release right after 1.10.  Also, there some momentum
> going
> > with the legal discussion, so let's take advantage of that.
> >
> > Satoshi (release manager), are you OK pausing the RC2 until we hear back
> > from Roman next week?
> >
> > Thank you,
> > Frank
> >
> >
> > On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik 
> > wrote:
> >
> >> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan  >
> >> wrote:
> >> > Agree with Rahul re putting up an RC2 with the suggested changes from
> >> Roman,
> >> > including incorporating Ed's comments on copyright year and top level
> >> folder
> >> > naming.  These are really items but let's respond to the RC1 reviewers
> >> the
> >> > best way we can.
> >>
> >> +1 to a respin.
> >>
> >> > Regarding the ASF legal issue being discussed, MADLib community is
> more
> >> than
> >> > happy to respond to any guidance from the fine folks at the ASF around
> >> > headers with appropriate licensing verbage.  We just need to know what
> >> that
> >> > guidance is.
> >>
> >> Well, if you're ok respinning next week I hope to get you a final
> >> answer by then.
> >> Might as well kill two birds with the same RC. Or we can quickly do a
> >> follow up
> >> release once the licensing headers dust settles. Up to you guys.
> >>
> >> Thanks,
> >> Roman.
> >>
>


Re: [VOTE] MADlib v1.10-rc1

2017-02-21 Thread Frank McQuillan
Thanks Roman for working on this.

If you feel a final answer will be ready next week, then yes, by all means I
would suggest to the community that we wait and re-spin an RC2 with the
license headers issue resolved.  That seems like less overhead and effort
than a quick follow-on release right after 1.10.  Also, there is some
momentum going with the legal discussion, so let's take advantage of that.

Satoshi (release manager), are you OK pausing the RC2 until we hear back
from Roman next week?

Thank you,
Frank


On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik 
wrote:

> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan 
> wrote:
> > Agree with Rahul re putting up an RC2 with the suggested changes from
> Roman,
> > including incorporating Ed's comments on copyright year and top level
> folder
> > naming.  These are really items but let's respond to the RC1 reviewers
> the
> > best way we can.
>
> +1 to a respin.
>
> > Regarding the ASF legal issue being discussed, MADLib community is more
> than
> > happy to respond to any guidance from the fine folks at the ASF around
> > headers with appropriate licensing verbage.  We just need to know what
> that
> > guidance is.
>
> Well, if you're ok respinning next week I hope to get you a final
> answer by then.
> Might as well kill two birds with the same RC. Or we can quickly do a
> follow up
> release once the licensing headers dust settles. Up to you guys.
>
> Thanks,
> Roman.
>


Re: [VOTE] MADlib v1.10-rc1

2017-02-21 Thread Frank McQuillan
Agree with Rahul re putting up an RC2 with the suggested changes from
Roman, including incorporating Ed's comments on copyright year and
top-level folder naming.  These are really minor items, but let's respond
to the RC1 reviewers the best way we can.

Regarding the ASF legal issue being discussed, the MADlib community is more
than happy to respond to any guidance from the fine folks at the ASF around
headers with appropriate licensing verbiage.  We just need to know what that
guidance is.

Frank
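
On Ed's note below about checksum validation, here is a minimal Python
sketch of one way to compare a downloaded artifact against a published
checksum without depending on the output format of a particular tool. It
assumes the published value is either in "digest  filename" form or in the
"filename: GROUPED HEX" form that gpg --print-md produces; the function
names are hypothetical, and I have not checked which form our own checksum
files use.

```python
import hashlib
import re


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def parse_published_digest(text, algorithm="sha512"):
    """Extract a lowercase hex digest from either assumed checksum format."""
    expected_len = hashlib.new(algorithm).digest_size * 2
    _, separator, tail = text.partition(":")
    if separator:
        # "<name>: ABCD1234 EF56..." possibly wrapped over several lines.
        candidate = re.sub(r"\s+", "", tail)
    else:
        # "<hex>  <name>" as printed by sha512sum or md5sum.
        fields = text.split()
        candidate = fields[0] if fields else ""
    candidate = candidate.lower()
    if len(candidate) != expected_len or not re.fullmatch(r"[0-9a-f]+", candidate):
        raise ValueError("unrecognized checksum format")
    return candidate


def verify(path, published_text, algorithm="sha512"):
    """True when the file's digest matches the published checksum text."""
    return file_digest(path, algorithm) == parse_published_digest(published_text, algorithm)
```

Passing algorithm="md5" covers the MD5 files in the same way.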


On Tue, Feb 21, 2017 at 10:58 AM, Ed Espino  wrote:

> Orhan,
>
> One more mildly interesting thing I noticed, I can't use the gsha512sum and
> gmd5sum commands to validate the corresponding checksum values easily. Even
> though the commands (see below links) available on the Apache site do
> generate correct checksums, they aren't conducive to a quick validation of
> them.
>
>
>- What Is An MD5 Checksum?
><https://www.apache.org/dev/release-signing.html#md5>
>- What is a SHA checksum?
><https://www.apache.org/dev/release-signing.html#sha-checksum>
>
> For reference, here are the processes and validation steps we use for the
> Apache HAWQ incubator project.  They may be of some use to your project and
> help those validating checksums.
>
>- Create the Release Candidate
><https://cwiki.apache.org/confluence/display/HAWQ/
> Release+Process%3A+Step+by+step+guide#ReleaseProcess:Stepbystepguide-
> CreatetheReleaseCandidate>
>
> Hope this helps,
> -=e
>
> On Thu, Feb 16, 2017 at 12:06 PM, Orhan Kislal  wrote:
>
> > Hi Ed,
> >
> > Thanks for the review. One of the comments from the previous release was
> a
> > preference towards a signature with an Apache id. Since Satoshi-san is
> not
> > an Apache committer yet, I took care of the signing process.
> >
> > Thanks,
> >
> > Orhan Kislal
> >
> > On Thu, Feb 16, 2017 at 11:58 AM, Ed Espino  wrote:
> >
> > > A few MADlib v1.10-rc1 observations from a HAWQ incubator committer.
> > >
> > >- The Copyright year (2016) in the NOTICE file needs to be updated
> to
> > >2017. I believe this can be handled in next release.
> > >- As it still applies, similar to a past comment by Roman ([VOTE]
> > MADlib
> > >v1.9.1-rc2
> > ><https://lists.apache.org/thread.html/
> 981b4c24eaa2ab069b8e18f7aa4bdd
> > > c7a78d3a9dc26bf659af94fcfe@%3Cgeneral.incubator.apache.org%3E>)
> > >- *"* name of the top level folder in the archive is weird. The
> usual
> > >practice is to call the top level folder as - > > ID>*"*
> > > (example: *apache-madlib-src-1.10-incubating* instead of
> > >*incubator-madlib*)
> > >- I'm more curious than anything. Why did Orhan sign the release? I
> > was
> > >expecting the release manager (Satoshi Nagayasu) to have signed the
> > > release.
> > >- Checksums and PGP signature are good.
> > >-  ASF headers check: I spot checked files added (git whatchanged
> > >--diff-filter=A) since the last release. ASF headers look good.
> Nice
> > > Job!
> > >
> > > I was going to try and build but I ran past my allotted time limit for
> > this
> > > review. Hopefully, I can try this soon.
> > >
> > > Regards,
> > > -=ed espino
> > >
> > > On Thu, Feb 16, 2017 at 10:05 AM, Orhan Kislal 
> > wrote:
> > >
> > > > +1
> > > >
> > > > Orhan Kislal
> > > >
> > > > On Thu, Feb 16, 2017 at 9:23 AM, Joe Hellerstein <
> > > hellerst...@berkeley.edu
> > > > >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Sent from a telephone.
> > > > >
> > > > > > On Feb 16, 2017, at 9:17 AM, Frank McQuillan <
> > fmcquil...@pivotal.io>
> > > > > wrote:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Frank McQuillan
> > > > > >
> > > > > >> On Wed, Feb 15, 2017 at 7:27 PM, Satoshi Nagayasu <
> > sn...@uptime.jp>
> > > > > wrote:
> > > > > >>
> > > > > >> Hello MADlib community,
> > > > > >>
> > > > > >> We have created a MADlib 1.10 RC-1, with the artifacts below up
> > for
> > > a
> > > > > vote.
> > > > > >>
> > > > > >

Reminder to vote on MADlib 1.10 release candidate

2017-02-17 Thread Frank McQuillan
Hello,

Gentle reminder that release manager Satoshi-san put up a MADlib 1.10
release candidate and is asking for a vote before Sat 6 pm Pacific Time.

So please vote.

Here are the user and dev threads:

https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201702.mbox/%3CCAA8sozdFbpqigNMdKbsZQtHft3VvP7%2BOO1dcx9X_qBRZiFVzZA%40mail.gmail.com%3E

https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201702.mbox/%3CCAA8sozdFbpqigNMdKbsZQtHft3VvP7%2BOO1dcx9X_qBRZiFVzZA%40mail.gmail.com%3E

Thanks,
Frank


Re: [VOTE] MADlib v1.10-rc1

2017-02-16 Thread Frank McQuillan
+1

Frank McQuillan

On Wed, Feb 15, 2017 at 7:27 PM, Satoshi Nagayasu  wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.10 RC-1, with the artifacts below up for a vote.
>
> This will be the 4th release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new modules (single source shortest path for graph analytics, encode
> categorical variables, K-nearest neighbors)
> * improvements to existing modules (add grouping support to elastic
> net and PCA, add cross validation to elastic net, array input for
> K-means, verbose output option for DT and RF, limit itemset size in
> association rules, various madpack installer improvements)
> * platform updates (PostgreSQL 9.6)
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10
>
> *** Please download, review and vote by Sat Feb 18, 2017 @ 6pm PST ***
>
> We're voting upon the source (tag):  rc/1.10.0-rc1
> https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc1
>
> Source Files:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> 10.0-incubating-rc1/
>
> Commit to be voted upon:
> https://github.com/apache/incubator-madlib/commit/
> ea17530bfe22a1fde173d7fa83508cbcd9924c20
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, can PMC members please be sure to
> indicate "(binding)" with their vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> --
> Satoshi Nagayasu 
>


1.11 feature suggestions

2017-02-15 Thread Frank McQuillan
Release Manager Satoshi-san is putting the final touches on the 1.10 RC,
and he should be sending out an announcement on that shortly for voting by
the community.

While that is happening, I wanted to suggest some ideas for 1.11 .

Based on the recent survey
http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
graph analytics was identified as a desired area of development for Apache
MADlib.

You can also have a look at my recent talk at FOSDEM17 on this topic
https://fosdem.org/2017/schedule/event/graph_analytics_massively_parallel_processing_databases/

So I have created a bunch of 1.11 JIRAs on graph that I am interested in
pursuing.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.11%20ORDER%20BY%20priority%20DESC

If you have other things that you are interested in for 1.11, please by all
means open a JIRA or let the community know or start working on the
software.

I would also suggest we look at a shorter release cycle for 1.11, in the
next couple months or so.

As always, open to suggestions and comments.

Regards,
Frank


Re: Madlib Feature Improvement Proposal: Update SVD with Improved Eigen Function

2017-02-08 Thread Frank McQuillan
Thanks for the question, Aaron.

MADlib does not use the Eigen SVD very much actually, only in single node
situations, so while moving to a better version is a good idea, it probably
won't materially impact operations on large data sets.

For most cases (e.g., PCA) the distributed version of SVD is used
http://madlib.incubator.apache.org/docs/latest/group__grp__svd.html

It is a custom version which is described in Chapter 5 of the MADlib design
document:
http://madlib.incubator.apache.org/design.pdf
using Lanczos bidiagonalization.
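
For readers unfamiliar with the technique, here is a small single-node
NumPy sketch of Golub-Kahan-Lanczos bidiagonalization with full
reorthogonalization. It is only an illustration of the general idea on a
dense in-memory matrix; it is not the MADlib implementation, which is
distributed and is specified in the design document. The function name and
parameters are assumptions for this example.

```python
import numpy as np


def lanczos_bidiag(A, k, seed=0):
    """Run k steps of Golub-Kahan-Lanczos bidiagonalization on A.

    Returns U (m x k), B (k x k, upper bidiagonal) and V (n x k) such that
    A @ V is approximately U @ B. The singular values of B approximate the
    largest singular values of A. Assumes no breakdown (no zero norms).
    """
    m, n = A.shape
    if not 1 <= k <= min(m, n):
        raise ValueError("k must be between 1 and min(A.shape)")
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    alphas = np.zeros(k)      # diagonal of B
    betas = np.zeros(k - 1)   # superdiagonal of B

    v = rng.standard_normal(n)
    V[:, 0] = v / np.linalg.norm(v)
    p = A @ V[:, 0]
    alphas[0] = np.linalg.norm(p)
    U[:, 0] = p / alphas[0]

    for j in range(k - 1):
        # Next right vector, reorthogonalized against all previous ones.
        r = A.T @ U[:, j] - alphas[j] * V[:, j]
        r -= V[:, : j + 1] @ (V[:, : j + 1].T @ r)
        betas[j] = np.linalg.norm(r)
        V[:, j + 1] = r / betas[j]

        # Next left vector, reorthogonalized against all previous ones.
        p = A @ V[:, j + 1] - betas[j] * U[:, j]
        p -= U[:, : j + 1] @ (U[:, : j + 1].T @ p)
        alphas[j + 1] = np.linalg.norm(p)
        U[:, j + 1] = p / alphas[j + 1]

    B = np.diag(alphas) + np.diag(betas, 1)
    return U, B, V
```

The small matrix B can then be handed to a dense SVD routine such as
np.linalg.svd, which is the step where a single-node library like Eigen
would typically come in.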

Now, if we were to improve the performance of the distributed SVD, that
would help with large data sets, which is our focus.  Perhaps you have some
suggestions on that aspect?

Frank

On Wed, Dec 28, 2016 at 10:17 PM, Aaron Gokaslan 
wrote:

> Hello, this is my first time using an email based forum so let me know if there
> is anything else I need to do.
>
> I was reading the most recent survey
>  artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf>
> results and one of the features I really agreed on is more scalable SVD. I
> happened to look into that issue and found an interesting Stack Overflow
> post
>  library-svd-is-slow-compared-to-gsl>
> about a new SVD algorithm that has just been officially added to the latest
> version of Eigen. According to the documentation
>  the new
> algorithm is much more scalable than the previous one. This would obviously
> bump the requirements of Eigen to the latest version, 3.3.1, but the much
> faster SVD algorithm would be worth it. I am interested in helping out
> implement the feature, but I wanted to have a JIRA issue opened and discuss
> how to best proceed as this is my first time contributing to an Apache
> project.
>
> TLDR: New version of Eigen released with more scalable SVD, I would like to
> see it implemented in Madlib.
>
> Aaron Gokaslan
>


Re: Status of on-going PRs

2017-02-02 Thread Frank McQuillan
Looks like all of the PRs for 1.10 have been merged, except
https://github.com/apache/incubator-madlib/pull/75
which will spill over to 2.0.  Thank you to all who contributed to this.

We are still doing some final checking on the E2E functional test suite
across Postgres, Greenplum and Apache HAWQ, so we are not officially at code
freeze yet.  But getting dangerously close.

Frank

On Tue, Jan 31, 2017 at 12:22 PM, Rahul Iyer  wrote:

> Hi Satoshi,
>
> Thanks for compiling this list. Please find my comments inline.
>
> On Tue, Jan 31, 2017 at 3:04 AM, Satoshi Nagayasu  wrote:
>
> > Hi all,
> >
> > As release manager for 1.10, I just did a quick review and created a
> status
> > list of the on-going PRs.
> >
> > https://github.com/apache/incubator-madlib/pulls
> >
> > If you have comments, please let me know. I will update the status.
> >
> > Status of the PRs
> > -
> > Use relative path for installation in GPDB/HAWQ #94
> >   -> Need to be tested with GPDB/HAWQ.
> >
> > Build: Use only major version for GPDB 5, HAWQ 2 #91
> >   -> Need review?
> >
> Testing is complete for both PRs. Requires a review.
>
>
> > Allow encode_categorical_variables() to use the svec type. #93
> >   -> Need more work by the developer (me).
> >
> This would be better merged within the 1.10 release.
> Adding it to the next version would require special handling by upgrade
> since there is a change in argument type (hence requiring drop/replace
> during upgrade).
>
>
> >  K-means: support for array input #89
> >   -> Need more review, or ready for committer?
> >
> This looks ready to merge.
>
> >
> > JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc #81
> >   -> Need more work by the developer.
> >
> > HAWQ2.1: Changes the cmake to assume any HAWQ 2.X system is 2.0 and #79
> >   -> Need review, or ready for committer?
> >
> This is superseded by #91 and will be closed with it.
>
>
> > Include boost::format in MathToolkit_impl.hpp. #76
> >   -> Already merged. The PR can be closed.
> >
> I forgot to close this with the commit message and can only be manually
> closed by the contributor. If not closed soon, I'll close it with a future
> commit.
>
>
> > SVM: Implement c++ functions for training multi-class svm in mini-batch
> #75
> >   -> The doc needs to be updated?
> >
> This requires substantially more work and discussion as the scope of the
> work is not defined. We will have to release without it.
>
>
>
> >
> > Regards,
> > --
> > Satoshi Nagayasu 
> >
>


Re: schema "madlib" does not exist error on Mac OS Sierra

2017-02-02 Thread Frank McQuillan
I don't think the mailing list supports attachments. At least I cannot see
them.  Maybe cut and paste in-line.

Frank

On Tue, Jan 31, 2017 at 7:33 PM, Sankara Subramanian,Karthik Maharajan <
skarthikmahar...@ufl.edu> wrote:

> I guess pasting the images did not work. Please find the screenshots
> attached,
>
> MADlib.png —> Shows successful installation of MADlib.
> postgresql.png —> Shows the error I had mentioned. I execute this query
> after creating the table and inserting 20 rows as in this tutorial page -->
> https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Users
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
> On Jan 31, 2017, at 10:25 PM, Sankara Subramanian,Karthik Maharajan <
> skarthikmahar...@ufl.edu> wrote:
>
> Hi Frank,
> Please find the screenshots below,
>
> This is the error I had mentioned. I execute this query after creating the
> table and inserting 20 rows as in this tutorial page
> -> https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Users
>
>
>
>
> The next screenshot shows that MADlib has been successfully installed.
>
> Please let me know if you need more details.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
> On Jan 31, 2017, at 2:24 PM, Frank McQuillan 
> wrote:
>
> Karthik,
>
> Please attach the output from the installation so we can have a look.
>
> Thanks,
> Frank
>
> On Mon, Jan 30, 2017 at 5:48 PM, Sankara Subramanian,Karthik Maharajan <
> skarthikmahar...@ufl.edu> wrote:
>
> MADlib community,
> I am using Mac OS Sierra 10.12.2. I have installed both postgresql and
> Madlib as per the “Super Quick Start” instruction in this page ->
> https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide.
> The installations were successful. However, when I try to run the sample
> logistic regression query from the tutorial page,
>
> SELECT madlib.logregr_train(
>     'patients',                           -- source table
>     'patients_logregr',                   -- output table
>     'second_attack',                      -- labels
>     'ARRAY[1, treatment, trait_anxiety]', -- features
>     NULL,                                 -- grouping columns
>     20,                                   -- max number of iterations
>     'irls'                                -- optimizer
> );
>
> I am getting the error,
>
> ERROR:  schema "madlib" does not exist
> LINE 1: SELECT madlib.logregr_train(
>
>
> Should I manually create the schema for madlib? Please let me know what I
> am missing here.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
>
>
>
>


Re: Upgrade support

2017-02-01 Thread Frank McQuillan
Orhan, I think this is a reasonable approach.  Supporting upgrades for
older versions is time consuming and probably not worth the effort at this
point. Plus, you have offered a workaround.
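
As a rough illustration of the workaround (the chain 1.x -> 1.9.1 -> 1.10.0
from Orhan's note below), here is a minimal Python sketch. The function
names, the default arguments and the 1.8 cutoff are taken from this thread
or made up for illustration; this is an assumption about how one might
reason about the chain, not how madpack decides anything.

```python
def parse_version(text):
    # "1.9.1" -> (1, 9, 1); comparing tuples of ints avoids the string
    # comparison trap where "1.10" would sort before "1.9".
    return tuple(int(part) for part in text.split("."))


def upgrade_path(installed, target="1.10.0", oldest_direct="1.8",
                 stepping_stone="1.9.1"):
    """Return the list of versions to step through (hypothetical helper)."""
    current = parse_version(installed)
    if current >= parse_version(target):
        return [installed]                      # nothing to do
    if current >= parse_version(oldest_direct):
        return [installed, target]              # tested direct upgrade
    return [installed, stepping_stone, target]  # chain via an older release
```

Under these assumptions, a 1.7.1 install would go 1.7.1 -> 1.9.1 -> 1.10.0,
while 1.8 and later would upgrade directly.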

Frank



On Wed, Feb 1, 2017 at 3:14 PM, Orhan Kislal  wrote:

> Dear MADlib community,
>
> I started working on the upgrade support for our upcoming release (MADlib
> 1.10.0) and made some progress. Historically, MADlib supported upgrades
> from any 1.x version. However, with every version, this task becomes more
> and more time consuming. Note that all upgrades have to be tested on 6
> platforms (the last 2 versions each of Postgres, Greenplum and HAWQ). I
> believe we can drop support for upgrades from versions prior to 1.8, but I
> wanted to consult with you before taking this action. This change will not
> disable upgrade
> for older versions entirely. The upgrade might not give proper error
> messages but it should still work if there are no dependencies. In
> addition, it is possible to follow an upgrade chain 1.x -> 1.9.1 -> 1.10.0.
>
> Please let us know if this change is not reasonable.
>
> Thanks
>
> Orhan Kislal
>


Re: schema "madlib" does not exist error on Mac OS Sierra

2017-01-31 Thread Frank McQuillan
Karthik,

Please attach the output from the installation so we can have a look.

Thanks,
Frank

On Mon, Jan 30, 2017 at 5:48 PM, Sankara Subramanian,Karthik Maharajan <
skarthikmahar...@ufl.edu> wrote:

> MADlib community,
> I am using Mac OS Sierra 10.12.2. I have installed both postgresql and
> Madlib as per the “Super Quick Start” instruction in this page ->
> https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide.
> The installations were successful. However, when I try to run the sample
> logistic regression query from the tutorial page,
>
> SELECT madlib.logregr_train(
>     'patients',                           -- source table
>     'patients_logregr',                   -- output table
>     'second_attack',                      -- labels
>     'ARRAY[1, treatment, trait_anxiety]', -- features
>     NULL,                                 -- grouping columns
>     20,                                   -- max number of iterations
>     'irls'                                -- optimizer
> );
>
> I am getting the error,
>
> ERROR:  schema "madlib" does not exist
> LINE 1: SELECT madlib.logregr_train(
>
>
> Should I manually create the schema for madlib? Please let me know what I
> am missing here.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
>


1.10 release status and release manager

2017-01-27 Thread Frank McQuillan
MADlib community,

We are getting fairly close to completing the software for the 1.10 release
and putting up an RC.

The PR list is getting smaller as we review and complete testing
https://github.com/apache/incubator-madlib/pulls

Satoshi Nagayasu
satoshi.nagay...@gmail.com
https://github.com/snaga
has graciously offered to be the release manager for 1.10.  Thank you very
much Satoshi for your help!

Regards,
Frank


DRAFT Apache MADlib (incubating) podling report for Q416

2017-01-04 Thread Frank McQuillan
Here is the draft report for Jan 2017, covering Q4 activity.

It is posted at http://wiki.apache.org/incubator/January2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Need guidance from Incubator PMC on how to resolve the BSD licensing
switch over to Apache License.  What should be the content of the license
headers for files that were previously BSD licensed and then granted to
ASF?  Related legal-discuss threads:
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/%3ccalgg8z03zhhbfegxoi4fh+vxtf+9m7x6hak9rjkqjapuzi6...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201603.mbox/%3C9D1AF43C-370B-4E58-B0EF-2E29D242F50B%40jaguNET.com%3E
  2. Continue to produce regular Apache (incubating) releases.
  3. Continue to execute and manage the project according to governance
model of the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 1. Yes, please see #1 above and provide guidance.
 2. The next release v1.10 will be the 4th as an incubating project.  After
that, the community would ideally like to move towards top level status.

How has the community developed since the last report?

  1. Some related events in Q4 2016 and upcoming:
* Feb 4, 2017 - Presentation accepted at FOSDEM’17 Graph devroom.
Topic:  Graph Analytics on Massively Parallel Processing Databases (Frank
McQuillan)
* Dec 1, 2016 - MADlib community call.  Topic:  New features in R interface
and MADlib user survey results (hosted by Greg Chase, Orhan Kislal, Frank
McQuillan)
* Nov 16, 2016 - Presentation at PGConf Silicon Valley.  Topic:
 Distributed In-Database Machine Learning with Apache MADlib (incubating)
(Frank McQuillan)
* Nov 14, 2016 - Presentation at Apache Big Data Europe.  Topic:
 Distributed In-Database Machine Learning with Apache MADlib (incubating)
(Roman Shaposhnik)
  2. Material technical conversations on user/dev mailing lists and in the
appropriate JIRAs and pull requests.
  3. New contributors to the project have been working on KNN module and
Python interface.

How has the project developed since the last report?

  1. Active work in progress for 4th ASF release MADlib v1.10 scheduled for
Jan 2017.  Features include: single source shortest path graph algorithm,
completely new module for encoding categorical variables, R interface
update, grouping support in elastic net and PCA, cross validation in
elastic net, verbose output option for decision tree visualization.
  2. Mailing list activity in Q4:  227 postings to dev, 66 postings to user.

Date of last release:

  MADlib v1.9.1 on 9/19/16.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Podcast on machine learning in enterprise

2016-12-14 Thread Frank McQuillan
Here's a podcast I did recently with Jeff Kelly of Pivotal, talking about
machine learning in enterprise, which is where Apache MADlib can play a
meaningful role.

https://blog.pivotal.io/pivotal-insights/features/12-no-machine-learning-wont-lead-to-killer-robots

Frank


Re: New tweeter for MADlib

2016-12-13 Thread Frank McQuillan
https://twitter.com/ApacheMADlib


On Tue, Dec 13, 2016 at 4:22 AM, Luis Macedo  wrote:

> Hey guys!
>
> What is the twitter handle for MADlib? I bet the one I found is not about
> this MADlib!
>
> https://twitter.com/madlib
>
>
> Thanks!
>
>
> *Luis Macedo | Sr Platform Architect | **Pivotal Inc *
>
> *Mobile:* +55 11 97616-6438
> *Pivotal.io <http://pivotal.io>*
> *Take care of the customers and the rest takes care of itself*
>
> 2016-12-12 22:15 GMT-02:00 Frank McQuillan :
>
> > Thanks Bob.  Yes please, tweet away!
> >
> > Frank
> >
> > On Mon, Dec 12, 2016 at 8:56 AM, Greg Chase  wrote:
> >
> > > +1
> > >
> > > This email encrypted by tiny buttons & fat thumbs, beta voice
> > recognition,
> > > and autocorrect on my iPhone.
> > >
> > > > On Dec 11, 2016, at 9:16 PM, Bob Glithero 
> > wrote:
> > > >
> > > > Hello MADlib community,
> > > >
> > > > I'm a newish member of Pivotal responsible for product marketing for
> > > > HDB/HAWQ, and will be spending more time on awareness for MADlib.  If
> > > > there's no objection, I'd like to add myself as a tweeter on behalf
> of
> > > > MADlib.
> > > >
> > > > Thanks!
> > > > Bob Glithero
> > > > Pivotal, Inc.
> > >
> >
>


Re: New tweeter for MADlib

2016-12-12 Thread Frank McQuillan
Thanks Bob.  Yes please, tweet away!

Frank

On Mon, Dec 12, 2016 at 8:56 AM, Greg Chase  wrote:

> +1
>
> This email encrypted by tiny buttons & fat thumbs, beta voice recognition,
> and autocorrect on my iPhone.
>
> > On Dec 11, 2016, at 9:16 PM, Bob Glithero  wrote:
> >
> > Hello MADlib community,
> >
> > I'm a newish member of Pivotal responsible for product marketing for
> > HDB/HAWQ, and will be spending more time on awareness for MADlib.  If
> > there's no objection, I'd like to add myself as a tweeter on behalf of
> > MADlib.
> >
> > Thanks!
> > Bob Glithero
> > Pivotal, Inc.
>


New PCA video posted

2016-12-09 Thread Frank McQuillan
Hi,

The latest MADlib video on Principal Component Analysis (PCA) has been
published on Youtube.com under the Pivotal Open Source Hub channel.  The
link to the video is:

https://www.youtube.com/watch?v=2R-76gimBX4&t=43s

Thank you to Charles Killam for putting this video together.

Frank


Re: Reminder: [VIRTUAL] MADlib Community Call: Pivotal R for MADlib & MADlib user survey results - Thurs, Dec 1, 2016

2016-12-01 Thread Frank McQuillan
Oh, and here is the PivotalR demo that Orhan showed:
https://github.com/apache/incubator-madlib-site/blob/asf-site/community-artifacts/PivotaR-demo-nov-2016.R

Frank

On Thu, Dec 1, 2016 at 10:12 AM, Frank McQuillan 
wrote:

> The MADlib user survey results that I went over are posted here
> http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
>
> Thank you for attending.
>
> Frank
>
> On Thu, Dec 1, 2016 at 8:08 AM, Gregory Chase  wrote:
>
>> The MADlib Community call discussing Pivotal R for Greenplum, HAWQ, and
>> PostgreSQL starts in less than 1 hour.
>>
>> See you at 9AM, Pacific.
>>
>> Join the call <https://pivotal.zoom.us/j/248236262>
>>
>> On Wed, Nov 30, 2016 at 2:40 PM, Gregory Chase  wrote:
>>
>> > Greetings,
>> > This is a reminder about tomorrow's MADlib community call at 9AM
>> Pacific.
>> >
>> > Add to calendar
>> > <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmODdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn&ctz=America/Los_Angeles>
>> >  | Join the call <https://pivotal.zoom.us/j/248236262>
>> >
>> > See you tomorrow!
>> >
>> > -Greg
>> >
>> > On Mon, Nov 28, 2016 at 2:39 PM, Gregory Chase 
>> wrote:
>> >
>> >> Dear MADlib, HAWQ, and Greenplum communities,
>> >>
>> >> Here's a chance for us to get to know our end users better with this
>> >> double header call this Thursday, Dec 1, 2016.
>> >>
>> >> Add to calendar
>> >> <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmODdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn&ctz=America/Los_Angeles>
>> >> | Join the call <https://pivotal.zoom.us/j/248236262>
>> >>
>> >> In the first half of this call, we'll be discussing new features in the
>> >> open source Pivotal R project that complements MADlib, HAWQ, and
>> >> Greenplum.  In the second half, we'll talk about a recent survey of
>> >> MADlib users.
>> >>
>> >> *Talk #1: What's New in Pivotal R*
>> >> Pivotal R is a popular interface for running data science
>> >> investigations using Apache MADlib with Greenplum Database, Apache
>> >> HAWQ, and PostgreSQL.
>> >>
>> >> It allows R developers to work with relational database structures such
>> >> as tables and views and operate on data in the database without having
>> >> to switch to SQL. Pivotal R also provides a wrapper for Apache MADlib so
>> >> that data scientists can directly call the parallel processing functions
>> >> of MADlib. This gives them the full power of in-database processing in
>> >> their familiar R environment.
>> >>
>> >> You can find Pivotal R here:
>> >> https://cran.r-project.org/web/packages/PivotalR/index.html
>> >>
>> >> *Talk #2: Apache MADlib User Survey Results*
>> >> In the second half of this call, we'll be discussing the results of a
>> >> recent user survey of Apache MADlib users. Hear which platform they
>> >> like best: Greenplum, HAWQ, or PostgreSQL. Hear which use cases are the
>> >> most popular.
>> >>
>> >> This survey was recently lauded at ApacheCon EU as a great example of
>> >> user experience research in open source.
>> >>
>> >> You can find the results posted here:
>> >> http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
>> >>
>> >> See you Thursday!
>> >>
>> >> -Greg
>> >>
>> >> --
>> >> Greg Chase
>> >>
>> >> Global Head, Big Data Communities
>> >> http://www.pivotal.io/big-data
>> >>
>> >> Pivotal Software
>> >> http://www.pivotal.io/
>> >>
>> >> 650-215-0477
>> >> @GregChase
>> >> Blog: http://geekmarketing.biz/
>> >>
>> >>
>> >
>> >
>> > --
>> > Greg Chase
>> >
>> > Global Head, Big Data Communities
>> > http://www.pivotal.io/big-data
>> >
>> > Pivotal Software
>> > http://www.pivotal.io/
>> >
>> > 650-215-0477
>> > @GregChase
>> > Blog: http://geekmarketing.biz/
>> >
>> >
>>
>>
>> --
>> Greg Chase
>>
>> Global Head, Big Data Communities
>> http://www.pivotal.io/big-data
>>
>> Pivotal Software
>> http://www.pivotal.io/
>>
>> 650-215-0477
>> @GregChase
>> Blog: http://geekmarketing.biz/
>>
>
>


Re: Reminder: [VIRTUAL] MADlib Community Call: Pivotal R for MADlib & MADlib user survey results - Thurs, Dec 1, 2016

2016-12-01 Thread Frank McQuillan
The MADlib user survey results that I went over are posted here
http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf

Thank you for attending.

Frank

On Thu, Dec 1, 2016 at 8:08 AM, Gregory Chase  wrote:

> The MADlib Community call discussing Pivotal R for Greenplum, HAWQ, and
> PostgreSQL starts in less than 1 hour.
>
> See you at 9AM, Pacific.
>
> Join the call <https://pivotal.zoom.us/j/248236262>
>
> On Wed, Nov 30, 2016 at 2:40 PM, Gregory Chase  wrote:
>
> > Greetings,
> > This is a reminder about tomorrow's MADlib community call at 9AM Pacific.
> >
> > Add to calendar
> > <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmODdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn&ctz=America/Los_Angeles>
> >  | Join the call <https://pivotal.zoom.us/j/248236262>
> >
> > See you tomorrow!
> >
> > -Greg
> >
> > On Mon, Nov 28, 2016 at 2:39 PM, Gregory Chase 
> wrote:
> >
> >> Dear MADlib, HAWQ, and Greenplum communities,
> >>
> >> Here's a chance for us to get to know our end users better with this
> >> double header call this Thursday, Dec 1, 2016.
> >>
> >> Add to calendar
> >> <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmODdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn&ctz=America/Los_Angeles>
> >> | Join the call <https://pivotal.zoom.us/j/248236262>
> >>
> >> In the first half of this call, we'll be discussing new features in the
> >> open source Pivotal R project that complements MADlib, HAWQ, and
> >> Greenplum.  In the second half, we'll talk about a recent survey of
> >> MADlib users.
> >>
> >> *Talk #1: What's New in Pivotal R*
> >> Pivotal R is a popular interface for running data science investigations
> >> using Apache MADlib with Greenplum Database, Apache HAWQ, and
> >> PostgreSQL.
> >>
> >> It allows R developers to work with relational database structures such
> >> as tables and views and operate on data in the database without having
> >> to switch to SQL. Pivotal R also provides a wrapper for Apache MADlib so
> >> that data scientists can directly call the parallel processing functions
> >> of MADlib. This gives them the full power of in-database processing in
> >> their familiar R environment.
> >>
> >> You can find Pivotal R here:
> >> https://cran.r-project.org/web/packages/PivotalR/index.html
> >>
> >> *Talk #2: Apache MADlib User Survey Results*
> >> In the second half of this call, we'll be discussing the results of a
> >> recent user survey of Apache MADlib users. Hear which platform they like
> >> best: Greenplum, HAWQ, or PostgreSQL. Hear which use cases are the most
> >> popular.
> >>
> >> This survey was recently lauded at ApacheCon EU as a great example of
> >> user experience research in open source.
> >>
> >> You can find the results posted here:
> >> http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
> >>
> >> See you Thursday!
> >>
> >> -Greg
> >>
> >> --
> >> Greg Chase
> >>
> >> Global Head, Big Data Communities
> >> http://www.pivotal.io/big-data
> >>
> >> Pivotal Software
> >> http://www.pivotal.io/
> >>
> >> 650-215-0477
> >> @GregChase
> >> Blog: http://geekmarketing.biz/
> >>
> >>
> >
> >
> > --
> > Greg Chase
> >
> > Global Head, Big Data Communities
> > http://www.pivotal.io/big-data
> >
> > Pivotal Software
> > http://www.pivotal.io/
> >
> > 650-215-0477
> > @GregChase
> > Blog: http://geekmarketing.biz/
> >
> >
>
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>


Re: FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon

2016-11-23 Thread Frank McQuillan
I attended FOSDEM last year and can attest to this being a really great
conference for open source developers.

Frank

On Wed, Nov 23, 2016 at 1:11 PM, Roman Shaposhnik 
wrote:

> Hi!
>
> apologies for the extra wide distribution (this exhausts my once
> a year ASF mail-to-all-bigdata-projects quota ;-)) but I wanted
> to suggest that all of you should consider submitting talks
> to FOSDEM 2017 HPC, Bigdata and Data Science DevRoom:
> https://hpc-bigdata-fosdem17.github.io/
>
> It was a great success this year and we hope to make it an even
> bigger success in 2017.
>
> Besides -- FOSDEM is the biggest gathering of open source
> developers on the face of the earth -- don't miss it!
>
> Thanks,
> Roman.
>
> P.S. If you have any questions -- please email me directly and
> see you all in Brussels!
>


Re: Adding KNN to madlib

2016-11-15 Thread Frank McQuillan
Auon,

Thanks for working on kNN for MADlib.   Can you expand a little bit on your
note, and post the interface that you are thinking about and description of
the arguments?  Then people can comment on that.
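
To make that discussion concrete, here is a minimal sketch of the behaviour
described in the note below (brute-force k-NN using Euclidean distance),
written in plain Python rather than SQL. The function name and arguments
are hypothetical and are not a proposed MADlib interface; they only show
which inputs the description implies: a set of stored vectors, a query
vector and k.

```python
import math


def knn(points, query, k):
    """Return the indices of the k stored vectors closest to `query`.

    Hypothetical helper for discussion only: `points` stands in for the
    vector column of the source table, `query` for one row of the other
    input table.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank every stored vector by distance; ties are broken by index so the
    # result is deterministic.
    ranked = sorted(range(len(points)),
                    key=lambda i: (euclidean(points[i], query), i))
    return ranked[:k]
```

Questions this raises for the real interface include whether the distance
function should be an argument, and whether the output should be neighbour
ids, distances, or both.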

Thanks,
Frank

On Tue, Nov 15, 2016 at 9:30 AM, Nandish Jayaram 
wrote:

> Hi Auon,
>
> Great going with your first version of k-NN implementation.
> Some useful links for coding guidelines are at (see Developer
> Documentation):
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> MADlib has something called install-checks for basic testing. You can
> look at any existing module for an example. For instance, check
> out the install check code for k-means at:
> https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules/kmeans/test
>
> I am sure others will pitch in to help you more with your other questions,
> but these are some starters you can consider! Good luck!
>
> NJ
>
> On Mon, Nov 14, 2016 at 10:41 PM, Kazmi,Auon H  wrote:
>
> > Hi,
> >
> > I am a first year Computer Science graduate student at University of
> > Florida working on implementing KNN in Madlib. I am ready with a first
> > version of it but I don't know how to proceed with testing and adding it
> > to the Madlib platform. Also, I am not clear on what standards I have to
> > follow in the final implementation. My current version asks for the table
> > name and column name having vectors in which I have to find the
> > neighbours. The other table given as input holds the vector whose K-NN
> > needs to be found. It assumes the Euclidean distance metric for distance
> > calculation. It would really help if somebody can share ideas on what can
> > be added to this functionality.
> >
> >
> >
> >
> >
> > Regards,
> >
> > Auon Haidar Kazmi
> >
>


Re: Encoding categorical variables

2016-11-08 Thread Frank McQuillan
Here is the JIRA with attached requirements doc.
https://issues.apache.org/jira/browse/MADLIB-1038

Please put your comments in the JIRA.  There are still some outstanding
questions to be puzzled out.

Frank

On Fri, Oct 28, 2016 at 3:04 PM, Frank McQuillan 
wrote:

> Yes thanks Vatsan we have been looking at that.
>
> On Fri, Oct 28, 2016 at 2:39 PM, Srivatsan R  wrote:
>
>> You guys may have already seen this, but linking just in case:
>> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
>>
>> On Fri, Oct 28, 2016 at 1:32 PM, Woo Jae Jung  wrote:
>>
>> > +Vatsan for his thoughts as well!
>> >
>> > On Fri, Oct 28, 2016 at 1:29 PM, Woo Jae Jung  wrote:
>> >
>> >> Also agree that double-quoted column names are not ideal.  In addition
>> to
>> >> the net-new features described in this thread, it'd be nice to see
>> >> non-double-quoted output as default behavior in the
>> >> existing create_indicator_variables() function.
>> >>
>> >> Thanks,
>> >> Woo
>> >>
>> >> On Fri, Oct 28, 2016 at 1:05 PM, Woo Jae Jung 
>> wrote:
>> >>
>> >>> I like the one-hot encoded feature.  Another variant of this idea
>> would
>> >>> be an "all other" variable (distinct from the reference class) that
>> >>> contains occurrences of the less frequent category types.  In both of
>> these
>> >>> scenarios, the threshold for 'less frequent' could be user-supplied.
>> >>>
>> >>> Thanks,
>> >>> Woo
>> >>>
>> >>> On Fri, Oct 28, 2016 at 11:29 AM, Rahul Iyer 
>> >>> wrote:
>> >>>
>> >>>> An alternative to dropping is to assign the less frequent values to
>> the
>> >>>> reference i.e. all one-hot encoded features will be 0.
>> >>>> Also important to note: total runtime will increase with this option
>> >>>> since
>> >>>> we'll have to compute the exact frequency distribution.
>> >>>>
>> >>>> Another suggested change is to call this function 'one_hot_encoding'
>> >>>> since
>> >>>> that is the output here (similar to sklearn's OneHotEncoder
>> >>>> <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>).
>> >>>> We can keep the current name as a deprecated alias till 2.0 is
>> released.
>> >>>>
>> >>>> On Fri, Oct 28, 2016 at 11:17 AM, Frank McQuillan <
>> >>>> fmcquil...@pivotal.io>
>> >>>> wrote:
>> >>>>
>> >>>> > Jarrod,
>> >>>> >
>> >>>> > Just trying to write up detailed requirements.  How would you see
>> >>>> this one
>> >>>> > working?
>> >>>> >
>> >>>> > "2) Option to dummy code only the top n most frequently occurring
>> >>>> values in
>> >>>> > any column"
>> >>>> >
>> >>>> > With 1 column I can picture it: you would drop the rows with the
>> >>>> > less frequently occurring values and end up with a smaller table.
>> >>>> > But what if you are encoding multiple columns?  Would you want a
>> >>>> > per-column specification of n, i.e., top 3 values for column x, top
>> >>>> > 10 values for column y?  If you did this, then your result set might
>> >>>> > include low frequency values for column x (not in top 3) because
>> >>>> > those rows are in the top 10 for column y - this might be confusing.
>> >>>> >
>> >>>> > Frank
>> >>>> >
>> >>>> > On Wed, Oct 19, 2016 at 2:44 PM, Frank McQuillan <
>> >>>> fmcquil...@pivotal.io>
>> >>>> > wrote:
>> >>>> >
>> >>>> >> great, thanks for the additional information
>> >>>> >>
>> >>>> >> Frank
>> >>>> >>
>> >>>> >> On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <
>> jvawd...@pivotal.io
>> >>

Apache MADlib user survey results

2016-11-07 Thread Frank McQuillan
We recently ran a survey asking MADlib users about a wide range of topics
pertaining to this open source project, including desired new features.
Thank you to all who responded.

You are welcome to view the survey results:
http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
and make any comments or suggestions.

Quick summary:

* Received ~40 responses from 27 different companies
* ~50% of respondents have 1 year or less of MADlib use
* Fraud detection is the most common use case
* Regression (various), clustering and random forest are the most commonly
used MADlib algorithms
* Gradient boosting is the most commonly requested new algorithm
* Users prefer new algorithms more than improvements to existing algorithms
by a 2:1 margin
* Improved documentation/examples and better performance are the biggest
concerns
* The most common other tools used by respondents are R, Spark and Python
(and associated libraries)

Frank


Re: Encoding categorical variables

2016-10-28 Thread Frank McQuillan
Yes thanks Vatsan we have been looking at that.

On Fri, Oct 28, 2016 at 2:39 PM, Srivatsan R  wrote:

> You guys may have already seen this, but linking just in case:
> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
>
> On Fri, Oct 28, 2016 at 1:32 PM, Woo Jae Jung  wrote:
>
> > +Vatsan for his thoughts as well!
> >
> > On Fri, Oct 28, 2016 at 1:29 PM, Woo Jae Jung  wrote:
> >
> >> Also agree that double-quoted column names are not ideal.  In addition
> to
> >> the net-new features described in this thread, it'd be nice to see
> >> non-double-quoted output as default behavior in the
> >> existing create_indicator_variables() function.
> >>
> >> Thanks,
> >> Woo
> >>
> >> On Fri, Oct 28, 2016 at 1:05 PM, Woo Jae Jung  wrote:
> >>
> >>> I like the one-hot encoded feature.  Another variant of this idea would
> >>> be an "all other" variable (distinct from the reference class) that
> >>> contains occurrences of the less frequent category types.  In both of
> these
> >>> scenarios, the threshold for 'less frequent' could be user-supplied.
> >>>
> >>> Thanks,
> >>> Woo
> >>>
> >>> On Fri, Oct 28, 2016 at 11:29 AM, Rahul Iyer 
> >>> wrote:
> >>>
> >>>> An alternative to dropping is to assign the less frequent values to
> the
> >>>> reference i.e. all one-hot encoded features will be 0.
> >>>> Also important to note: total runtime will increase with this option
> >>>> since
> >>>> we'll have to compute the exact frequency distribution.
> >>>>
> >>>> Another suggested change is to call this function 'one_hot_encoding'
> >>>> since
> >>>> that is the output here (similar to sklearn's OneHotEncoder
> >>>> <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>).
> >>>> We can keep the current name as a deprecated alias till 2.0 is
> released.
> >>>>
> >>>> On Fri, Oct 28, 2016 at 11:17 AM, Frank McQuillan <
> >>>> fmcquil...@pivotal.io>
> >>>> wrote:
> >>>>
> >>>> > Jarrod,
> >>>> >
> >>>> > Just trying to write up detailed requirements.  How would you see
> >>>> this one
> >>>> > working?
> >>>> >
> >>>> > "2) Option to dummy code only the top n most frequently occurring
> >>>> values in
> >>>> > any column"
> >>>> >
> >>>> > With 1 column I can picture it: you would drop the rows with the
> >>>> > less frequently occurring values and end up with a smaller table.
> >>>> > But what if you are encoding multiple columns?  Would you want a
> >>>> > per-column specification of n, i.e., top 3 values for column x, top
> >>>> > 10 values for column y?  If you did this, then your result set might
> >>>> > include low frequency values for column x (not in top 3) because
> >>>> > those rows are in the top 10 for column y - this might be confusing.
> >>>> >
> >>>> > Frank
> >>>> >
> >>>> > On Wed, Oct 19, 2016 at 2:44 PM, Frank McQuillan <
> >>>> fmcquil...@pivotal.io>
> >>>> > wrote:
> >>>> >
> >>>> >> great, thanks for the additional information
> >>>> >>
> >>>> >> Frank
> >>>> >>
> >>>> >> On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <
> jvawd...@pivotal.io
> >>>> >
> >>>> >> wrote:
> >>>> >>
> >>>> >>> IMO
> >>>> >>>
> >>>> >>> 1) Option to define resulting column names. Please see pdltools
> >>>> >>> implementation - the ability to pass in a function is especially
> >>>> useful (
> >>>> >>> http://pivotalsoftware.github.io/PDLTools/group__grp__
> pivot01.html)
> >>>> >>> 2) Option to dummy code only the top n most frequently occurring
> >>>> values
> >>>> >>> 

Re: Encoding categorical variables

2016-10-28 Thread Frank McQuillan
Jarrod,

Just trying to write up detailed requirements.  How would you see this one
working?

"2) Option to dummy code only the top n most frequently occurring values in
any column"

With 1 column I can picture it: you would drop the rows with the less
frequently occurring values and end up with a smaller table.  But what if
you are encoding multiple columns?  Would you want a per-column specification
of n, i.e., top 3 values for column x, top 10 values for column y?  If you
did this, then your result set might include low frequency values for column
x (not in top 3) because those rows are in the top 10 for column y - this
might be confusing.
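
One way to sidestep the row-dropping problem is to keep every row and only
limit which values get their own indicator column, so that less frequent
values encode as all zeros for that column. Here is a minimal Python sketch
of that idea with a per-column n. The function name, argument shapes and
the "column_value" naming of output columns are assumptions for
illustration only, not an existing or proposed MADlib API.

```python
from collections import Counter


def top_n_one_hot(rows, top_n):
    """One-hot encode only the n most frequent values of each column.

    rows  : list of dicts, one dict per input row
    top_n : dict mapping column name -> n for that column
    No rows are dropped; a value outside the top n yields all zeros
    for that column's indicators.
    """
    kept = {}
    for col, n in top_n.items():
        counts = Counter(row[col] for row in rows)
        # Most frequent first; ties broken by value so output is stable.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        kept[col] = [value for value, _ in ranked[:n]]

    encoded = []
    for row in rows:
        out = {}
        for col, values in kept.items():
            for value in values:
                out[f"{col}_{value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded
```

Note that this needs an exact frequency count per column before encoding,
which is extra work compared with encoding every distinct value.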

Frank

On Wed, Oct 19, 2016 at 2:44 PM, Frank McQuillan 
wrote:

> great, thanks for the additional information
>
> Frank
>
> On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey 
> wrote:
>
>> IMO
>>
>> 1) Option to define resulting column names. Please see pdltools
>> implementation - the ability to pass in a function is especially useful (
>> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
>> 2) Option to dummy code only the top n most frequently occurring values in
>> any column
>> 3) Option to create numeric column names (e.g. pivotcol_val1,
>> pivotcol_val2 ...) instead of values in column names + secondary mapping
>> table
>> 4) Option to exclude original column from results table
>>
>> (1) & (2) are much higher priority than (3) & (4).
>>
>> Agreed that these could also be applied to Pivoting (especially 1).
>>
>>
>>
>> Jarrod Vawdrey
>> Sr. Data Scientist
>> Data Science & Engineering | Pivotal
>> (650) 315-8905
>> https://pivotal.io/
>>
>> On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan 
>> wrote:
>>
>> > Thanks for those suggestions, Jarrod.  They all sound pretty useful -
>> > would you mind taking a crack at numbering them 1,2,3... etc, in the
>> order
>> > of priority as you see it?
>> >
>> > Also it seems like some of these could be applied to the Pivot function
>> as
>> > well, e.g., UDF for column naming.
>> >
>> > Frank
>> >
>> >
>> >
>> > On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey 
>> > wrote:
>> >
>> >> Hey Frank,
>> >>
>> >> How are special character values handled today? It is often not ideal
>> to
>> >> end up with column names that require double quotes to call due to
>> >> downstream scripts.
>> >>
>> >> A couple of features that would be useful
>> >>
>> >> * Option to define resulting column names. Please see pdltools
>> >> implementation - the ability to pass in a function is especially
>> useful (
>> >> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
>> >> * Option to dummy code only the top n most frequently occurring values
>> in
>> >> any column
>> >> * Option to exclude original column from results table
>> >> * Option to create numeric column names (E.g. pivotcol_val1,
>> >> pivotcol_val2 ...) instead of values in column names + secondary
>> mapping
>> >> table
>> >>
>> >> Thank you
>> >>
>> >> Jarrod Vawdrey
>> >> Sr. Data Scientist
>> >> Data Science & Engineering | Pivotal
>> >> (650) 315-8905
>> >> https://pivotal.io/
>> >>
>> >> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <
>> fmcquil...@pivotal.io>
>> >> wrote:
>> >>
>> >>> For the module encoding categorical variables
>> >>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
>> >>> does anyone have any suggestions on improvements that we could make?
>> >>>
>> >>> Here is a video on how encoding categorical variables works for those
>> not
>> >>> familiar with it
>> >>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>> >>>
>> >>
>> >>
>> >
>>
>
>


Re: Proposed improvement to association rules (Apriori) algorithm

2016-10-27 Thread Frank McQuillan
I created a JIRA on this
https://issues.apache.org/jira/browse/MADLIB-1031



On Thu, Oct 27, 2016 at 3:00 PM, Frank McQuillan 
wrote:

> Here is a comment from a MADlib user that I recently heard:
>
> “No apparent way to set an upper bound for itemset size in assoc_rules
> function. This results in it running forever with larger data sets. In the
> R "arules" package, you can set a max itemset size so that it doesn't look
> for unnecessarily large associations.”
> https://cran.r-project.org/web/packages/arules/arules.pdf
>
> Does a single optional parameter make sense to add to
> http://madlib.incubator.apache.org/docs/latest/group__grp__assoc__rules.html
> similar to the maxlen parameter in “arules” ?
>
> Any other considerations here or improvements to make to this algorithm
> at the same time? minlen?
>
> Thanks,
> Frank
>
>
>
>
>
>


Proposed improvement to association rules (Apriori) algorithm

2016-10-27 Thread Frank McQuillan
Here is a comment from a MADlib user that I recently heard:

“No apparent way to set an upper bound for itemset size in assoc_rules
function. This results in it running forever with larger data sets. In the
R "arules" package, you can set a max itemset size so that it doesn't look
for unnecessarily large associations.”
https://cran.r-project.org/web/packages/arules/arules.pdf

Does a single optional parameter make sense to add to
http://madlib.incubator.apache.org/docs/latest/group__grp__assoc__rules.html
similar to the maxlen parameter in “arules” ?

Any other considerations here or improvements to make to this algorithm at
the same time? minlen?
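
A minimal sketch of what such a cap could look like, assuming a plain
level-wise Apriori search in Python.  It is not the assoc_rules
implementation; `frequent_itemsets`, `min_support` and `max_len` are
hypothetical names used only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=None):
    """Level-wise (Apriori-style) search for frequent itemsets.

    max_len caps the itemset size, in the spirit of the maxlen parameter
    in arules; None means no cap.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    size = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        if max_len is not None and size >= max_len:
            break  # stop before generating any larger candidates
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return result


tx = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]
capped = frequent_itemsets(tx, 0.5, max_len=2)
full = frequent_itemsets(tx, 0.5)
```

With the cap in place the search never builds candidates larger than
max_len, which is where the run time grows on larger data sets.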

Thanks,
Frank


Re: Encoding categorical variables

2016-10-19 Thread Frank McQuillan
great, thanks for the additional information

Frank

On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey  wrote:

> IMO
>
> 1) Option to define resulting column names. Please see pdltools
> implementation - the ability to pass in a function is especially useful (
> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> 2) Option to dummy code only the top n most frequently occurring values in
> any column
> 3) Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2
> ...) instead of values in column names + secondary mapping table
> 4) Option to exclude original column from results table
>
> (1) & (2) are much higher priority than (3) & (4).
>
> Agreed that these could also be applied to Pivoting (especially 1).
>
>
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal
> (650) 315-8905
> https://pivotal.io/
>
> On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan 
> wrote:
>
> > Thanks for those suggestions, Jarrod.  They all sound pretty useful -
> > would you mind taking a crack at numbering them 1,2,3... etc, in the
> order
> > of priority as you see it?
> >
> > Also it seems like some of these could be applied to the Pivot function
> as
> > well, e.g., UDF for column naming.
> >
> > Frank
> >
> >
> >
> > On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey 
> > wrote:
> >
> >> Hey Frank,
> >>
> >> How are special character values handled today? It is often not ideal to
> >> end up with column names that require double quotes to call due to
> >> downstream scripts.
> >>
> >> A couple of features that would be useful
> >>
> >> * Option to define resulting column names. Please see pdltools
> >> implementation - the ability to pass in a function is especially useful
> (
> >> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> >> * Option to dummy code only the top n most frequently occurring values
> in
> >> any column
> >> * Option to exclude original column from results table
> >> * Option to create numeric column names (E.g. pivotcol_val1,
> >> pivotcol_val2 ...) instead of values in column names + secondary mapping
> >> table
> >>
> >> Thank you
> >>
> >> Jarrod Vawdrey
> >> Sr. Data Scientist
> >> Data Science & Engineering | Pivotal
> >> (650) 315-8905
> >> https://pivotal.io/
> >>
> >> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan  >
> >> wrote:
> >>
> >>> For the module encoding categorical variables
> >>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
> >>> does anyone have any suggestions on improvements that we could make?
> >>>
> >>> Here is a video on how encoding categorical variables works for those
> not
> >>> familiar with it
> >>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
> >>>
> >>
> >>
> >
>


Re: Encoding categorical variables

2016-10-19 Thread Frank McQuillan
Thanks for those suggestions, Jarrod.  They all sound pretty useful - would
you mind taking a crack at numbering them 1,2,3... etc, in the order of
priority as you see it?

Also it seems like some of these could be applied to the Pivot function as
well, e.g., UDF for column naming.

Frank



On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey  wrote:

> Hey Frank,
>
> How are special character values handled today? It is often not ideal to
> end up with column names that require double quotes to call due to
> downstream scripts.
>
> A couple of features that would be useful
>
> * Option to define resulting column names. Please see pdltools
> implementation - the ability to pass in a function is especially useful (
> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> * Option to dummy code only the top n most frequently occurring values in
> any column
> * Option to exclude original column from results table
> * Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2
> ...) instead of values in column names + secondary mapping table
>
> Thank you
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal
> (650) 315-8905
> https://pivotal.io/
>
> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan 
> wrote:
>
>> For the module encoding categorical variables
>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
>> does anyone have any suggestions on improvements that we could make?
>>
>> Here is a video on how encoding categorical variables works for those not
>> familiar with it
>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>>
>
>


New features in MADlib

2016-10-19 Thread Frank McQuillan
Which features would you like to see in a future version of Apache MADlib?
Could be big or small stuff.

Please let the community know what you think would be valuable to work on.

(If you prefer to complete a short survey form about Apache MADlib, please
let me know & I will send a Survey Monkey link.)

I will collect input from all sources and post survey results (aggregate
and anonymous) to the Apache MADlib website


Thanks,
Frank


Encoding categorical variables

2016-10-14 Thread Frank McQuillan
For the module encoding categorical variables
http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
does anyone have any suggestions on improvements that we could make?

Here is a video on how encoding categorical variables works for those not
familiar with it
https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ


Draft MADlib podling report for Oct 2016

2016-10-03 Thread Frank McQuillan
Here is the draft report for Oct 2016, covering Q3 activity.

It is posted at http://wiki.apache.org/incubator/October2016

Please let me know if you have any comments or suggestions before
submission deadline on Wed Oct 5 and I will update the report.

Thanks,
Frank

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Need guidance from Incubator PMC on how to resolve the BSD licensing
switch over to Apache License.  What should be the content of the license
headers for files that were previously BSD licensed and then granted to
ASF?  Related legal-discuss threads:
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/%3ccalgg8z03zhhbfegxoi4fh+vxtf+9m7x6hak9rjkqjapuzi6...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201603.mbox/%3C9D1AF43C-370B-4E58-B0EF-2E29D242F50B%40jaguNET.com%3E
  2. Continue to produce regular Apache (incubating) releases.
  3. Continue to execute and manage the project according to governance
model required by the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

  Yes, please see #1 above and provide guidance.

How has the community developed since the last report?

  1. Two new committers added to the project:
* Orhan Kislal (9/7/16)
* Nandish Jayaram (9/7/16)
  2. MADlib related events in Q3 2016:
* Jul 27 - MADlib community call.  Topic:  Open discussion on Apache MADlib
project (hosted by Greg Chase, Frank McQuillan)
* Aug 19 - Presentation to Hortonworks.  Topic:  Apache MADlib, Apache HAWQ
(incubating) and Apache Zeppelin (Rahul Iyer, Frank McQuillan)
* Sep 13 - MADlib community call.  Topic:  Deep dive on MADlib 1.9.1
release (hosted by Greg Chase, presentation by Frank McQuillan)
* Sep 21 - Meetup at Hortonworks San Francisco.  Topic:  Future of data -
Apache MADlib and Apache HAWQ (Tushar Pednekar)
* Sep 22 - Meetup at Hortonworks Santa Clara.  Topic:  Future of data -
Apache MADlib and Apache HAWQ (Tushar Pednekar)
  3. Material technical conversations on dev mailing lists and in the
appropriate JIRAs and pull requests.

How has the project developed since the last report?

  1. 3rd ASF release MADlib v1.9.1 released on Sep 19, 2016.  Features
include:  path functions (phase 2), 1-class support vector machines for
novelty detection, prediction metrics, sessionization, pivoting.
  2. Community has started active development on the v1.10 release.
  3. 13 JIRAs created and 5 resolved in last 30 days.

Date of last release:

  MADlib v1.9.1 on 9/19/16.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


New blog published on last release

2016-09-27 Thread Frank McQuillan
Hi,

I just published a new blog on Pivotal.io called "New Tools To Shape Data
In Apache MADlib"
https://blog.pivotal.io/big-data-pivotal/products/new-tools-to-shape-data-in-apache-madlib
based on the last release.

Please have a look and let me know if you have any comments.

Frank


PFA and PMML

2016-09-22 Thread Frank McQuillan
Does anyone on the list have an opinion on

PFA
http://dmg.org/pfa/
vs
PMML
http://dmg.org/pmml/v4-3/GeneralStructure.html
?

MADlib supports PMML export for a number of algorithms
https://cwiki.apache.org/confluence/display/MADLIB/FAQ#FAQ-Q2-3CanIexportmodelsfromMADlibtoPMML
?

but the question is whether to port these to PFA and support PFA going
forward, and not support PMML?

Frank


Next MADlib version number suggestion

2016-09-21 Thread Frank McQuillan
Hello,

I would like to suggest that the next release of MADlib be called v1.10.
Recently I have been referring to it as v1.9.2.

However, MADlib follows 3-digit semantic versioning MAJOR.MINOR.PATCH
http://semver.org/
where:

* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a backwards-compatible
manner, and
* PATCH version when you make backwards-compatible bug fixes.

Since the next release will add functionality, it should be MINOR.
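
The rule above can be sketched in a few lines of Python.  The helper name
`bump` and its `change` argument are hypothetical, for illustration only;
note that under strict 3-digit versioning the proposed v1.10 would be
written 1.10.0.

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a kind of change.

    change: "major" (incompatible API change), "minor" (backwards-compatible
    new functionality) or "patch" (backwards-compatible bug fix).
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # treat "1.10" as "1.10.0"
    major, minor, patch = parts
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


next_minor = bump("1.9.1", "minor")  # new functionality
next_patch = bump("1.9.1", "patch")  # bug fixes only
```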

I actually made a mistake in naming v1.9.1 since it added functionality but
was versioned as if it was a PATCH, which it wasn’t.

I'll update the JIRAs and wiki unless anyone has an objection.

Frank


Apache MADlib (incubating) v1.9.1 Release Announcement - GA

2016-09-20 Thread Frank McQuillan
This is the 3rd Apache release for MADlib.

Features of this release:
* new modules (1-class SVM for novelty detection, prediction metrics,
sessionization, pivoting)
* improvements to existing modules (class weights in SVM, overlapping
patterns in path)
* performance improvements (path)
* platform updates (support for PostgreSQL 9.5 and 9.6)
* bug fixes
* doc improvements

For more information please read the release notes:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

Download the release:
http://madlib.incubator.apache.org/download.html

Thank you to the MADlib community for a very fine release.

Here’s a look at some future features being considered:
https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
Happy to get your input on what you would like to see.  New contributors
are always welcome.

Frank


Re: Spatial model in MADlib (GWR)

2016-09-20 Thread Frank McQuillan
I created a JIRA for GWR
https://issues.apache.org/jira/browse/MADLIB-1023
and marked it for v1.9.2 for now though we can of course change that
depending on progress.





On Wed, Sep 14, 2016 at 9:36 AM, Rahul Iyer  wrote:

> Yes, you'll have to compile MADlib with DEBUG flag. Use `cmake
> -DCMAKE_BUILD_TYPE=Debug ..` to get MADlib symbols.
> You would benefit by compiling Postgres (with debug) as well but may not be
> necessary for what you're doing.
>
> On Tue, Sep 13, 2016 at 10:20 PM, Wang ChenLiang 
> wrote:
>
> > Hi Rahul
> >
> > Thanks for your reply. Should I build MADlib and Postgres with
> > enable-debug and no optimization flags? How can I list source code of
> > MADlib in GDB ? Sorry for asking such silly questions.
> >
> > Best,
> > Chenliang Wang
> >
> > On 09/14/2016 02:18 AM, Rahul Iyer wrote:
> > > Hi Chenliang
> > >
> > > There's some information on debugging in our old wiki page
> > > <https://github.com/madlib/madlib/wiki/Building-MADlib-from-Source#debugging>.
> > > There's no example there but the process is simple once you have the
> > > server process id.
> > >
> > > - Rahul
> > >
> > > On Tue, Sep 13, 2016 at 6:41 AM, Wang ChenLiang 
> > wrote:
> > >
> > >> Hi Frank,
> > >>
> > >> I was on a business trip for several months and began to work on
> > >> MADlib again in the past few days. But I am having trouble debugging
> > >> MADlib with GDB. Could you kindly give me a detailed example of
> > >> debugging MADlib with CodeBlocks or GDB?
> > >>
> > >> Many Thanks !
> > >>
> > >>
> > >> On 03/15/2016 12:31 AM, Frank McQuillan wrote:
> > >>> OK.  Please don't hesitate to ask if you have any questions.
> > >>>
> > >>> Frank
> > >>>
> > >>> On Mon, Mar 14, 2016 at 4:17 AM, chenliang wang  >
> > >> wrote:
> > >>>
> > >>>> Hi, Frank
> > >>>>
> > >>>> Recently, I have been looking at the details of the development guide
> > >>>> and trying to complete the serial algorithm. I plan to implement GWR by
> > >>>> dividing the loop into chunks executed on several nodes. However, I am
> > >>>> not sure whether there are special details that need to be designed for
> > >>>> distributed models in GPDB, because I haven't developed a model in an
> > >>>> MPP architecture. I hope this distributed approach can be implemented
> > >>>> easily.
> > >>>>
> > >>>> Best,
> > >>>> Chenliang Wang
> > >>>>
> > >>>> On 03/10/2016 08:33 AM, Frank McQuillan wrote:
> > >>>>> Hi ChenLiang Wang,
> > >>>>>
> > >>>>> I am checking to see how things are going regarding the GWR model
> > >>>>> for MADlib that you proposed.  Not sure which phase you are at, but
> > >>>>> a suggested next step might be how you plan to implement the GWR
> > >>>>> algorithm in a distributed manner.  That is, how will it run in
> > >>>>> parallel?
> > >>>>>
> > >>>>> (Starting as a new thread since the previous thread fragmented.)
> > >>>>>
> > >>>>> Regards,
> > >>>>> Frank
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>


Re: kNN implementation

2016-09-19 Thread Frank McQuillan
Hi Babak,

I noticed the KNN poster
http://dsr.cise.ufl.edu/wp-content/uploads/2016/05/MADlib_Combined.pptx.pdf
and was wondering if you have plans to make a pull request?

Frank

On Fri, Apr 8, 2016 at 11:35 AM, Xiaocheng Tang  wrote:

> Thanks Babak. We will review the interface and get back to you soon.
>
> Meanwhile, you may go ahead with the implementation, for which I would
> suggest that you describe your approach in a design doc and share it here
> first before actually coding it up. The design doc should include the
> necessary math formulations and relevant references, if there are any.
>
> > On Apr 8, 2016, at 8:19 AM, Babak Alipour 
> wrote:
> >
> > Thank you so much Xiaocheng for your reply.
> >
> > I was actually using the k-means interface and modifying it to fit kNN
> > needs.
> >
> > The function definitions as I have been thinking about are:
> >
> > FUNCTION MADLIB_SCHEMA.knn(
> >    rel_train VARCHAR,
> >    rel_test VARCHAR,
> >    ignore_col_train VARCHAR,
> >    fn_dist VARCHAR,
> >    fn_dist_weighting VARCHAR,
> >    k INTEGER
> > ) RETURNS knn_result
> >
> > @param rel_train   Name of the relation containing the training input
> > points
> >
> > @param rel_test   Name of the relation containing the testing input
> > points {these are the points whose k-nearest neighbors are to be found in
> > rel_train}
> >
> > @param ignore_col_train  Name of column to be ignored (e.g. the Class
> > column containing the classes which can be used for supervised learning
> > elsewhere)
> >
> > @param fn_dist Name of a function with signature DOUBLE PRECISION[] x
> > DOUBLE PRECISION[] -> DOUBLE PRECISION that returns the distance
> > between two points. The default is the \ref
> > squared_dist_norm2(float8[],float8[]) "squared Euclidean distance".
> > {based on k-means example, basically take two points, output their
> > distance}
> > I'd rather use Minkowski distance and add another parameter, P, with a
> > default of 2, so that the function is more generalizable (P=1, Manhattan,
> > P=2, Euclidean, etc.)
> >
> > @param fn_dist_weighting  Name of a function with signature DOUBLE
> > PRECISION -> DOUBLE PRECISION that, for each  distance, returns a
> > weighted distance value. (e.g. None, 1-d and 1/d weighting schemes)
> >
> > @param k number of nearest neighbors
> >
> >
> > @returns a composite value
> >
> > Type knn_result
> >    indexes INTEGER[][],
> >    distances DOUBLE PRECISION[][]
> > - indexes - matrix[n][k] of the k-nearest neighbors' indexes (n is the
> > size of rel_test, i.e. |rel_test|; k is the input parameter that
> > specifies the number of nearest neighbors). Row i holds, for the data
> > point indexed i, its k nearest neighbors in columns 0 to k-1, each
> > column being the index of one neighbor.
> > - distances - array of the k-nearest neighbors' distances
> > In other words, the row numbers of the k-nearest neighbors in rel_train
> > for the i-th data point of rel_test are in matrix[i][0:k-1], and the
> > distances to those data points are in distances[i][0:k-1].
> > For implementation, I'm not quite sure how to store the results so that I
> > can efficiently update the k-nearest points as a pass over the data is
> > being done. Since kNN is non-iterative, I think it is possible to
> > implement kNN completely in SQL without calling external driver functions.
> > It might also be a good idea to store the results in another table instead
> > of returning a type like this, or to put the results in a temp table and
> > then return the result; feedback is always welcome.
> > I'm not specifically using majority voting, because I do not want to
> > limit this implementation to kNN classification, but rather provide a
> > general kNN algorithm. We could then come up with another method that uses
> > this kNN implementation and runs a majority vote for classification or an
> > average for regression.
> >
> >
> > FUNCTION MADLIB_SCHEMA.__init_validate_args  {Similar to the k-means
> > function __seeding_validate_args: k should be positive, k should be less
> > than the number of points, fn_dist should have the correct signature,
> > fn_dist_weighting should have the correct signature}
> >
> >
> >
> > I look forward to the community's feedback as I work on this module.
> >
> >
> > Best regards,
> > Babak
> >
> >
> > On Mon, Apr 4, 2016 at 1:28 AM, Xiaocheng Tang  > wrote:
> >
> >> Hi Babak,
> >>
> >> Thank you for your interest in k-NN!
> >> https://issues.apache.org/jira/browse/MADLIB-927
> >>
> >> The interface of new modules should be consistent with
> >> the existing ones in MADlib. In this case I would suggest
> >> studying the K-means first
> >> https://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> 

Re: Contributing GMM and Perceptron to MADLib

2016-09-19 Thread Frank McQuillan
Hi Aditya,

I noticed the KNN poster
http://dsr.cise.ufl.edu/wp-content/uploads/2016/05/MADlib_Combined.pptx.pdf
and was wondering if you have plans to make a pull request?

Frank


On Mon, Mar 28, 2016 at 9:37 PM, Roman Shaposhnik  wrote:

> Awesome!
>
> On Mon, Mar 28, 2016 at 9:18 PM, Frank McQuillan 
> wrote:
> > Thanks Roman.  I was able to do it just now.
> >
> > Frank
> >
> > On Mon, Mar 28, 2016 at 9:12 PM, Roman Shaposhnik 
> wrote:
> >>
> >> I can help with that -- stay tuned.
> >>
> >> On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan  >
> >> wrote:
> >> > Let me figure out how to do this and add Aditya as the owner of that
> >> > JIRA.
> >> > My initial attempts in ASF infra-land were not quite successful.
> >> >
> >> > Frank
> >> >
> >> > On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer  wrote:
> >> >>
> >> >> @Frank, Roman: I believe Aditya needs to be added as a developer to
> the
> >> >> MADlib project to assign a JIRA to him? Is this only available to the
> >> >> lead/owner?
> >> >>
> >> >> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain 
> >> >> wrote:
> >> >>>
> >> >>> Hi Rahul,
> >> >>>
> >> >>> I didn't have an id, so I created one now.
> >> >>> My id is : Aditya Nain
> >> >>>
> >> >>> Thanks,
> >> >>> Aditya
> >> >>>
> >> >>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer 
> wrote:
> >> >>>
> >> >>> > I can assign this to you, but you need to have an account in
> >> >>> > https://issues.apache.org.
> >> >>> > If you already have an account, then please send your id - I
> wasn't
> >> >>> > able to
> >> >>> > find you just using your name.
> >> >>> >
> >> >>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <
> adityana...@gmail.com>
> >> >>> > wrote:
> >> >>> >
> >> >>> > > Hi Rahul,
> >> >>> > >
> >> >>> > > Thanks for the reply!
> >> >>> > >
> >> >>> > > I am working on implementing Gaussian Mixture Model assuming that
> >> >>> > > the covariance matrix is the same for all the Gaussians.
> >> >>> > > The JIRA which deals with GMM is MADLIB-410:
> >> >>> > >
> >> >>> >
> >> >>> >
> >> >>> > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >> >>> > >
> >> >>> > > Can this be assigned to me, or how do I get it assigned to me?
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Aditya
> >> >>> > >
> >> >>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer 
> >> >>> > > wrote:
> >> >>> > >
> >> >>> > > > Hi Aditya,
> >> >>> > > >
> >> >>> > > > Welcome to the MADlib community!
> >> >>> > > >
> >> >>> > > > Gaussian mixture models are extremely useful and we would heartily
> >> >>> > > > welcome a contribution for it. The SQLEM paper might be
> >> >>> > > > oversimplifying the capabilities of the database (e.g. assuming
> >> >>> > > > there is no array type is unnecessary for PostgreSQL). You could
> >> >>> > > > speed things up (both dev time and execution time) by writing some
> >> >>> > > > of the functions in C++. K-means is an example of how clustering is
> >> >>> > > > implemented.
> >> >>> > > > IMO, assuming the same covariance matrix is reasonable. We could
> >> >>> > > > extend
> >> >>> 

New Apache MADlib contributor: Nandish Jayaram

2016-09-15 Thread Frank McQuillan
Dear MADlib dev community,

The Project Management Committee (PMC) for Apache MADlib has asked
Nandish Jayaram to become a committer and we are pleased to announce that he
has accepted.

Here are some of his contributions:

- New features (Sessionization, initial stages of one-class SVM)
- Expansion of existing modules (Path)
- Bug fixes (path, elastic net, decision tree)
- Infrastructure projects

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process.  This should enable better
productivity.  Being a PMC member enables one to assist with the management
of the project and to help guide its direction.

Welcome Nandish!

Regards,
Frank


New Apache MADlib contributor: Orhan Kislal

2016-09-15 Thread Frank McQuillan
Dear MADlib dev community,

The Project Management Committee (PMC) for Apache MADlib has asked
Orhan Kislal to become a committer and we are pleased to announce that he
has accepted.

Here are some of Orhan’s recent contributions:

- Release manager for 1.9alpha, 1.9 and 1.9.1
- Worked on:
  - New features (prediction metrics, pivoting)
  - Expansion of existing modules (PCA)
  - Upgrade support
  - Release related tasks
  - Bug fixes (kmeans, random forest, elastic net etc.)

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process.  This should enable better
productivity.  Being a PMC member enables one to assist with the management
of the project and to help guide its direction.

Welcome Orhan!

Regards,
Frank


1.9.1 release status

2016-09-14 Thread Frank McQuillan
Hello,

Quick update on 1.9.1 status.  We are still working through some licensing
header questions on the release candidate with the Apache IPMC, which has
delayed posting the GA release.  The software has been done for quite some
time; we just need to clear these last licensing hurdles.

Thanks for your patience.

Frank


Re: [VIRTUAL] MADlib Community Meeting TODAY, September 13, 9AM Pacific: Deep Dive into MADlib 1.9.1

2016-09-13 Thread Frank McQuillan
Thanks for attending

Links mentioned in today’s MADlib community meeting

Release notes 1.9.1
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

Jupyter notebooks for 1.9.1 demos
https://github.com/madlib/madlib-examples

Previous community call describing path functions in more detail (just
skimmed over this today)
https://www.youtube.com/watch?v=vFJSeSvQT94&index=4&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ

Frank

On Tue, Sep 13, 2016 at 8:07 AM, Gregory Chase  wrote:

> Reminder the MADlib meeting starts in about an hour, and we have a new
> meeting platform, with either 100% streaming or dial-in option
>
> Join from PC, Mac, Linux, iOS or Android:
> https://pivotal.zoom.us/j/923158161
>
> To join phone conference (or you can stream from above):
>
> Or iPhone one-tap (US Toll):  +16465588656,923158161# or
> +14086380968,923158161#
>
> Or Telephone:
> Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
> Meeting ID: 923 158 161
> International numbers available (scroll to bottom):
> https://pivotal.zoom.us/zoomconference?m=w6tQZeVQnm1XJz5xZ8gX94j8OEH_ENCn
>
> I know we have at least one person from China possibly joining today. I'm
> sorry we don't have a Chinese dial-in number. I recommend using either
> closest or US toll.
>
>
> On Mon, Sep 12, 2016 at 4:50 PM, Gregory Chase  wrote:
>
> > Dear MADlib, HAWQ, and Greenplum communities,
> > This is a reminder that tomorrow at 9AM, we'll be talking about the new
> > release of MADlib 1.9.1.
> >
> > Also, we're changing our meeting platform to Zoom.  Directions on how to
> > join are below.  Please be aware that you'll need to download the Zoom
> > client if its your first time meeting with Zoom.
> >
> > We also have a dial-in for the first time ever, or you can also join and
> > hear streaming audio via your computer:
> >
> > Join from PC, Mac, Linux, iOS or Android:
> > https://pivotal.zoom.us/j/923158161
> >
> > Or iPhone one-tap (US Toll):  +16465588656,923158161# or
> > +14086380968,923158161#
> >
> > Or Telephone:
> > Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
> > Meeting ID: 923 158 161
> > International numbers available:
> > https://pivotal.zoom.us/zoomconference?m=w6tQZeVQnm1XJz5xZ8gX94j8OEH_ENCn
> >
> > On Tue, Sep 6, 2016 at 3:12 PM, Gregory Chase  wrote:
> >
> >> Dear MADlib, HAWQ, and Greenplum Communities,
> >> The Apache MADlib (incubating) project is about to release MADlib 1.9.1.
> >>
> >> To celebrate, we are organizing the next MADlib Virtual Community
> Meeting
> >> next Tuesday, September 13 at 9AM Pacific.
> >> Join here  | Add to
> >> your calendar
> >>
> >> We'll be taking a deep dive into the new capabilities of 1.9.1
> including:
> >>
> >> New functions for:
> >> One class SVM
> >> Prediction Metrics
> >> Sessionization
> >> Pivot
> >>
> >> We've also enhanced existing SVM use cases to assign weights to multiple
> >> classes, and greatly improved the path function.  Finally we've updated
> >> support for PostgreSQL 9.5 and 9.6
> >>
> >> After discussing the new capabilities, we'll demo novelty detection,
> path
> >> functions, prediction metrics, and sessionization.
> >>
> >> Release notes can be found here:
> >> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
> >>
> >> Examples of new capabilities can be found here:
> >> https://github.com/madlib/madlib-examples
> >>
> >> See you next Tuesday!
> >>
> >> Join here 

Re: Improving video representation on MADlib pages

2016-09-06 Thread Frank McQuillan
I posted the images here
https://cwiki.apache.org/confluence/display/MADLIB/Misc
for people to have a look.

I think the layout looks pretty good the way you are proposing it.

Frank


On Tue, Sep 6, 2016 at 6:22 PM, Roman Shaposhnik 
wrote:

> Enjoy your admin karma, but remember -- with great power comes git blame
> ;-)
>
> Thanks,
> Roman.
>
> On Tue, Sep 6, 2016 at 6:20 PM, Greg Chase  wrote:
> > gregchase
> >
> > On Tue, Sep 6, 2016 at 6:18 PM, Roman Shaposhnik 
> > wrote:
> >
> >> Greg, what's your wiki ID? I can give you karma.
> >>
> >> Thanks,
> >> Roman.
> >>
> >> On Tue, Sep 6, 2016 at 6:15 PM, Greg Chase  wrote:
> >> > Oops
> >> >
> >> > Attachments got cut off, and I don't have access to the wiki to post
> them
> >> > there.
> >> >
> >> > @Frank -> I will forward you the mockups.  Can you post them on the
> wiki?
> >> >
> >> > Thanks,
> >> >
> >> > -Greg
> >> >
> >> > On Tue, Sep 6, 2016 at 6:12 PM, Gregory Chase 
> wrote:
> >> >
> >> >> Dear MADlib developers,
> >> >> Now that we have a couple of nice new videos about the origin of MADlib
> >> >> and the MADlib community, let's make the pages that host them look better.
> >> >>
> >> >> Attached are a couple of mockups that remove the big video bars and
> put
> >> >> the video closer to main text.
> >> >>
> >> >> For the product page: http://madlib.incubator.apache.org/product.html
> >> >>
> >> >> and the community page: http://madlib.incubator.apache.org/community.html
> >> >>
> >> >> These are fairly simple design changes, so I'll invoke lazy consensus.
> >> >>
> >> >> If there are no specific critiques, I will have this posted in a couple
> >> >> of days.
> >> >>
> >> >> --
> >> >> Greg Chase
> >> >>
> >> >> Global Head, Big Data Communities
> >> >> http://www.pivotal.io/big-data
> >> >>
> >> >> Pivotal Software
> >> >> http://www.pivotal.io/
> >> >>
> >> >> 650-215-0477
> >> >> @GregChase
> >> >> Blog: http://geekmarketing.biz/
> >> >>
> >> >>
> >>
>


Subject: [RESULT] [VOTE] MADlib v1.9.1-rc2

2016-09-06 Thread Frank McQuillan
Hello,

Thank you to all community members who voted.

Below is the tally of the votes:

+1 (binding):

none


+1 (non binding):

Xixuan (Aaron) Feng
Greg Chase
Rahul Iyer
Xiaocheng Tang
Orhan Kislal
Woo Jae Jung
Srivatsan Ramanujam
Nandish Jayaram
Satoshi Nagayasu
Frank McQuillan


0, -1 or other votes:

none


I will post an email vote request to gene...@incubator.apache.org and
indicate to the ASF incubator principals that the MADlib community has
endorsed the release of the v1.9.1-rc2 artifacts.

Regards,
Frank
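
For anyone who wants to reproduce the tally above, here is a minimal sketch of counting votes by value and binding status. The tuple layout and the function name tally_votes are assumptions made for illustration; this is not a tool the project uses.

```python
from collections import Counter


def tally_votes(votes):
    """Count votes keyed by (value, is_binding).

    `votes` is an iterable of (voter, value, is_binding) tuples; this
    layout is a hypothetical choice for the sketch.
    """
    counts = Counter()
    for _voter, value, is_binding in votes:
        counts[(value, is_binding)] += 1
    return counts


# The ten non-binding +1 votes listed in the tally above.
votes = [
    (name, "+1", False)
    for name in (
        "Xixuan (Aaron) Feng", "Greg Chase", "Rahul Iyer", "Xiaocheng Tang",
        "Orhan Kislal", "Woo Jae Jung", "Srivatsan Ramanujam",
        "Nandish Jayaram", "Satoshi Nagayasu", "Frank McQuillan",
    )
]
counts = tally_votes(votes)
print("+1 (binding):    ", counts[("+1", True)])
print("+1 (non binding):", counts[("+1", False)])
```

A Counter returns 0 for keys it has never seen, so the empty binding category needs no special handling.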


Re: [VIDEO] Origin of Apache MADlib (incubating)

2016-09-06 Thread Frank McQuillan
Thanks for posting, Greg.

On Tue, Sep 6, 2016 at 3:49 PM, Gregory Chase  wrote:

> Dear MADlib Community,
> The very first committer to MADlib, Joe Hellerstein, was kind enough to
> tell us the story about the origin of Apache MADlib in this video:
>
> https://www.youtube.com/watch?v=DGPZwpB92Aw&index=10&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>
> Please enjoy.
>
> -Greg
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>


Re: [VOTE] MADlib v1.9.1-rc2

2016-09-06 Thread Frank McQuillan
Gentle reminder to vote on this release.  Voting ends at 6 pm Pacific time
today.

Thanks,
Frank

On Fri, Sep 2, 2016 at 10:26 AM, Frank McQuillan 
wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.9.1 RC-2, with the artifacts below up for a
> vote.
>
> This release candidate replaces RC-1.  The only difference between RC-1
> and RC-2 is
> that some '._' files were sneaked in by OSX during the packaging.
> These have been removed.
>
> This will be the 3rd release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new modules (1-class SVM for novelty detection, prediction metrics,
> sessionization, pivoting)
> * improvements to existing modules (class weights in SVM, overlapping
> patterns in path)
> * performance improvements (path)
> * platform updates (PostgreSQL 9.5 and 9.6)
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
>
> *** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST ***
>
> We're voting upon the source (tag):  rc/1.9.1-rc2
>
> Source Files:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc2
>
> Commit to be voted upon:
> https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, can PMC members please be sure to indicate
> "(binding)" with their vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Thank you,
> Frank McQuillan
>
>


[VOTE] MADlib v1.9.1-rc2

2016-09-02 Thread Frank McQuillan
Hello MADlib community,

We have created a MADlib 1.9.1 RC-2, with the artifacts below up for a vote.

This release candidate replaces RC-1.  The only difference between RC-1 and
RC-2 is
that some '._' files were sneaked in by OSX during the packaging.
These have been removed.
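
For reviewers who want to confirm this on their own copy, here is a minimal sketch that lists any entries in a source tarball whose file name starts with '._' (the AppleDouble companions reported against RC-1). It uses only Python's standard tarfile module; the function name find_appledouble_members is an illustrative assumption, not part of the MADlib release tooling.

```python
import tarfile
from pathlib import PurePosixPath


def find_appledouble_members(tarball_path):
    """Return archive member names whose basename starts with '._'."""
    # "r:*" lets tarfile detect the compression (gzip, bzip2, ...) itself.
    with tarfile.open(tarball_path, "r:*") as archive:
        return [
            name
            for name in archive.getnames()
            if PurePosixPath(name).name.startswith("._")
        ]
```

Calling find_appledouble_members("apache-madlib-1.9.1-incubating-source.tar.gz") on a clean tarball should return an empty list. On macOS, setting COPYFILE_DISABLE=1 in the environment when running tar is a commonly used way to keep these files out of an archive in the first place.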

This will be the 3rd release for Apache MADlib (incubating).

The main goals of this release are:
* new modules (1-class SVM for novelty detection, prediction metrics,
sessionization, pivoting)
* improvements to existing modules (class weights in SVM, overlapping
patterns in path)
* performance improvements (path)
* platform updates (PostgreSQL 9.5 and 9.6)
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

*** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST ***

We're voting upon the source (tag):  rc/1.9.1-rc2

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc2

Commit to be voted upon:
https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
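
As a rough sketch of how a reviewer might check a signature against that KEYS file, the snippet below wraps two standard GnuPG invocations (gpg --import and gpg --verify). It assumes gpg is installed and that a detached signature file sits next to the artifact; the function names and any file names passed in are hypothetical.

```python
import subprocess


def gpg_verify_commands(keys_file, signature_file, artifact_file):
    """Build the gpg calls: import the KEYS file, then verify the signature."""
    return [
        ["gpg", "--import", keys_file],
        ["gpg", "--verify", signature_file, artifact_file],
    ]


def verify_release(keys_file, signature_file, artifact_file):
    """Run each command in order; raises CalledProcessError if gpg fails."""
    for command in gpg_verify_commands(keys_file, signature_file, artifact_file):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution makes it easy to print the commands and run them by hand instead.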

To help in tallying the vote, can PMC members please be sure to indicate
"(binding)" with their vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Thank you,
Frank McQuillan


Re: [VOTE] MADlib v1.9.1-rc1

2016-09-02 Thread Frank McQuillan
Thanks.  I will re-send the [VOTE] request.

On Fri, Sep 2, 2016 at 9:25 AM, Rahul Iyer  wrote:

> New RC uploaded with source files at
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc2/
>
> Everything else remains the same.
>
> On Fri, Sep 2, 2016 at 8:59 AM, Frank McQuillan 
> wrote:
>
> > I think that is the safest approach, to create a new RC.
> >
> > Let us cancel the vote on RC-1 and when RC-2 is posted, I will call for a
> > new vote
> >
> > Thank you Satoshi for catching this.
> >
> > Frank
>


Re: [VOTE] MADlib v1.9.1-rc1

2016-09-02 Thread Frank McQuillan
> > > > error: expected unqualified-id before numeric constant
> > > > make[2]: *** [src/ports/postgres/9.6/CMakeFiles/madlib_postgresql_9_6.dir/__/__/__/modules/tsa/._arima.cpp.o] Error 1
> > > > make[1]: *** [src/ports/postgres/9.6/CMakeFiles/madlib_postgresql_9_6.dir/all] Error 2
> > > > make: *** [all] Error 2
> > > > [snaga@localhost build]$
> > > >
> > > > And I found that the tarball contains some binary files (?) which
> > > > seems built on Mac OS X.
> > > > I guess this is the reason for the build failure.
> > > >
> > > > [snaga@localhost madlib]$ tar ztvf apache-madlib-1.9.1-incubating-source.tar.gz | grep _arima
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/ports/postgres/modules/tsa/._arima.py_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/ports/postgres/modules/tsa/._arima.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/ports/postgres/modules/tsa/._arima_forecast.py_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/ports/postgres/modules/tsa/test/._arima.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/ports/postgres/modules/tsa/test/._arima_train.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31 apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.hpp
> > > > [snaga@localhost madlib]$ file apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp
> > > > apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp: AppleDouble encoded Macintosh file
> > > > [snaga@localhost madlib]$
> > > >
> > > > Is this intended? Or should it be fixed?
> > > >
> > > > Regards,
> > > >
> > > >
> > > > 2016-09-02 4:17 GMT+09:00 Frank McQuillan :
> > > >> Hello MADlib community,
> > > >>
> > > >> We have created a MADlib 1.9.1 release candidate, with the artifacts
> > > below
> > > >> up for a vote.
> > > >>
> > > >> This will be the 3rd release for Apache MADlib (incubating).
> > > >>
> > > >> The main goals of this release are:
> > > >> * new modules (1-class SVM for novelty detection, prediction
> metrics,
> > > >> sessionization, pivoting)
> > > >> * improvements to existing modules (class weights in SVM,
> overlapping
> > > >> patterns in path)
> > > >> * performance improvements (path)
> > > >> * platform updates (PostgreSQL 9.5 and 9.6)
> > > >> * bug fixes
> > > >> * doc improvements
> > > >>
> > > >> For more information including release notes, please see:
> > > >> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
> > > >>
> > > >> *** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST
> ***
> > > >>
> > > >> We're voting upon the source (tag):  rc/1.9.1-rc1
> > > >>
> > > >> Source Files:
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc1/
> > > >>
> > > >> Commit to be voted upon:
> > > > >> https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892
> > > >>
> > > >> KEYS file containing PGP Keys we use to sign the release:
> > > >> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> > > >>
> > > >> To help in tallying the vote, can PMC members please be sure to
> > indicate
> > > >> "(binding)" with their vote.
> > > >>
> > > >> [ ] +1  approve
> > > >> [ ] +0  no opinion
> > > >> [ ] -1  disapprove (and reason why)
> > > >>
> > > >> Thank you,
> > > >> Frank McQuillan
> > > >
> > > >
> > > >
> > > > --
> > > > Satoshi Nagayasu 
> > >
> > >
> > >
> > > --
> > > Satoshi Nagayasu 
> > >
> >
> >
> >
> > --
> >
> > -
> > Rahul Iyer
> > Principal software engineer | Predictive Analytics
> >
> > *Pivotal**A new platform for a new era*
> >
>


New interface to ASF mailing lists

2016-09-01 Thread Frank McQuillan
Roman let me know about this new interface to ASF mailing lists.

It is still in beta, but it seems to be working quite nicely:
https://lists.apache.org/
or
https://lists.apache.org/list.html?d...@madlib.apache.org

In case this is more user friendly for you.

Frank


Jupyter notebooks for v1.9.1 demos

2016-09-01 Thread Frank McQuillan
I posted some Jupyter notebooks with small data sets at
https://github.com/madlib/madlib-examples
to try out v1.9.1 features.

Many of these examples are used in the user docs for v1.9.1.

As you can see in the other thread from today, v1.9.1 RC is up for [VOTE]
so please vote.

Thanks,
Frank


[VOTE] MADlib v1.9.1-rc1

2016-09-01 Thread Frank McQuillan
Hello MADlib community,

We have created a MADlib 1.9.1 release candidate, with the artifacts below
up for a vote.

This will be the 3rd release for Apache MADlib (incubating).

The main goals of this release are:
* new modules (1-class SVM for novelty detection, prediction metrics,
sessionization, pivoting)
* improvements to existing modules (class weights in SVM, overlapping
patterns in path)
* performance improvements (path)
* platform updates (PostgreSQL 9.5 and 9.6)
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

*** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST ***

We're voting upon the source (tag):  rc/1.9.1-rc1

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc1/

Commit to be voted upon:
https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, can PMC members please be sure to indicate
"(binding)" with their vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Thank you,
Frank McQuillan


Re: v1.9.1?

2016-08-30 Thread Frank McQuillan
Hi Satoshi,

It has not been released yet; we are still putting the final touches on the
release.

We are hoping to put up a release candidate in the next day or 2.

Then I will send a note out to this list for the community to review it.

Thanks,
Frank

On Mon, Aug 29, 2016 at 11:34 PM, Satoshi Nagayasu  wrote:

> Hi,
>
> Has 1.9.1 already been released?
> Or still working on the release process?
>
> I would like to know it because I'm going to share v1.9.1 with
> our local PostgreSQL people.
>
> I found the release note has been updated on the git repo,
> but the 1.9.1 info, including the tar ball, has not yet come up
> on the web.
>
> Regards,
> --
> Satoshi Nagayasu 
>

