Hi Frank,

Thank you for your response to my feedback and posting the pdf. :-)
Looking forward to using the new release!

Thanks,
Rashmi

On Mon, Apr 30, 2018 at 10:51 AM, Frank McQuillan <[email protected]> wrote:

> Hi Rashmi,
>
> I attached the completed user docs for balanced data sets to the JIRA
> https://issues.apache.org/jira/browse/MADLIB-1168
> for your review.
>
> The doc is called "MADlib_Balanced Sampling.pdf"
>
> Your idea of posting the updated user docs for the upcoming release is a
> good one.
>
> Frank
>
> On Fri, Apr 27, 2018 at 6:56 PM, Srivatsan Ramanujam <[email protected]>
> wrote:
>
> > Built from source and tested on Mac. (High Sierra - 10.13.3, cmake version
> > 3.11.0-rc2, Postgres 9.6.4)
> >
> > +1 (binding)
> >
> >
> > On Fri, Apr 27, 2018 at 6:09 PM, Jingyi Mei <[email protected]> wrote:
> >
> > > Hi Rashmi,
> > >
> > > Thanks for the comments and feedback!
> > >
> > > The release page with a page-not-found error should not be there since we
> > > haven't made the actual release yet. We just removed the link in that page
> > > and it will be added again after the community has voted and we have an
> > > official release.
> > >
> > > Concerning the documentation links for new features, it is definitely a
> > > great idea to add them in the release notes and also vote email! Thanks for
> > > the recommendation and we will see if we can make it better in this release.
> > >
> > > Cheers,
> > > Jingyi Mei
> > >
> > > On Fri, Apr 27, 2018 at 3:19 PM, Rashmi Raghu <[email protected]> wrote:
> > >
> > >> Installed on Postgres 9.6 on MacOS using dmg.
> > >> Checked out the new additions to the summary function. Looks good. My
> > >> vote: +1 (binding).
> > >>
> > >> Some comments aside from the vote:
> > >>
> > >>    - I followed this link in the email:
> > >>    https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14
> > >>    and then from there clicked on
> > >>    https://dist.apache.org/repos/dist/release/madlib/1.14/
> > >>    which gives a page-not-found error.
> > >>    - I didn't see a link to documentation associated with this release -
> > >>    it would be useful to have that also available (let me know if it was in
> > >>    the email and I missed it or if it is not standard practice). For instance,
> > >>    I wanted to briefly look at the new balanced datasets module and it would
> > >>    have been easy to look it up in the web version of the docs. I did find the
> > >>    docs through the function call e.g. madlib.balance_sample('usage') but that
> > >>    requires knowing roughly what function name to look for (not hard in this
> > >>    case but I can imagine other situations where it might not be
> > >>    straightforward)
> > >>
> > >> Great to see all the new features and bug fixes!
> > >>
> > >> Thanks,
> > >> Rashmi
> > >>
> > >>
> > >> On Fri, Apr 27, 2018 at 1:40 PM, Orhan Kislal <[email protected]> wrote:
> > >>
> > >>> Tested on PG 10.3 (src and dmg). Looks good. +1 (binding)
> > >>>
> > >>> Thanks for preparing the release Jingyi,
> > >>>
> > >>> Orhan Kislal
> > >>>
> > >>> On Fri, Apr 27, 2018 at 11:44 AM, Frank McQuillan <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Hi Jingyi,
> > >>>>
> > >>>> Thanks for posting the artifacts and sending out the vote.
> > >>>>
> > >>>> My findings:
> > >>>>
> > >>>> Installation and IC passed on postgres 9.6.7
> > >>>>
> > >>>> Also I tested a cpl of the new features (personalized page rank and
> > >>>> mini-batch preprocessor)
> > >>>> and they worked OK for me with a small sample data set.
> > >>>>
> > >>>> +1 (binding)
> > >>>>
> > >>>> On Thu, Apr 26, 2018 at 2:57 PM, Jingyi Mei <[email protected]> wrote:
> > >>>>
> > >>>> > Hello Apache MADlib dev community,
> > >>>> >
> > >>>> > This is the vote for Apache MADlib 1.14 Release (RC1). It provides the
> > >>>> > source release tarball and convenience binaries. This is the third
> > >>>> > Apache MADlib release as an Apache Top Level Project (TLP).
> > >>>> >
> > >>>> > The vote will run for at least 72 working hours and will close on
> > >>>> > Tuesday, May 1st, 2018 @ 6pm PDT. A minimum of 3 binding +1 votes and
> > >>>> > more binding +1 than binding -1 are required to pass.
> > >>>> >
> > >>>> > The main goals of this release are:
> > >>>> >
> > >>>> > New features:
> > >>>> >
> > >>>> >    - New module - Balanced datasets: A sampling module to balance
> > >>>> >    classification datasets by resampling using various techniques
> > >>>> >    including undersampling, oversampling, uniform sampling or
> > >>>> >    user-defined proportion sampling (MADLIB-1168)
> > >>>> >    - Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor
> > >>>> >    function necessary to create batches from the data (MADLIB-1200,
> > >>>> >    MADLIB-1206, MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
> > >>>> >    - k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
> > >>>> >    - Summary: Added additional stats: number of positive, negative, zero
> > >>>> >    values and 95% confidence intervals for the mean (MADLIB-1167)
> > >>>> >    - Encode categorical: Updated to produce lower-case column names when
> > >>>> >    possible (MADLIB-1202)
> > >>>> >    - MLP: Added support for already one-hot encoded categorical dependent
> > >>>> >    variable in a classification task (MADLIB-1222)
> > >>>> >    - Pagerank: Added option for personalized vertices that allows higher
> > >>>> >    weightage for a subset of vertices which will have a higher jump
> > >>>> >    probability as compared to other vertices and a random surfer is more
> > >>>> >    likely to jump to these personalization vertices (MADLIB-1084)
> > >>>> >
> > >>>> > Bug fixes:
> > >>>> >
> > >>>> >    - Fixed issue with invalid calls of construct_array that led to
> > >>>> >    problems in Postgresql 10 (MADLIB-1185)
> > >>>> >    - Added newline between file concatenation during PGXN install
> > >>>> >    (MADLIB-1194)
> > >>>> >    - Fixed upgrade issues in knn (MADLIB-1197)
> > >>>> >    - Added fix to ensure RF variable importance are always non-negative
> > >>>> >    - Fixed inconsistency in LDA output and improved usability
> > >>>> >    (MADLIB-1160, MADLIB-1201)
> > >>>> >    - Fixed MLP and RF predict for models trained in earlier versions to
> > >>>> >    ensure missing optional parameters are given appropriate default values
> > >>>> >    (MADLIB-1207)
> > >>>> >    - Fixed a scenario in DT where no features exist due categorical
> > >>>> >    columns with single level being dropped led to the database crashing
> > >>>> >    - Fixed step size initialization in MLP based on learning rate policy
> > >>>> >    (MADLIB-1212)
> > >>>> >    - Fixed PCA issue that leads to failure when grouping column is a TEXT
> > >>>> >    type (MADLIB-1215)
> > >>>> >    - Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
> > >>>> >    - Fixed and simplified initialization of model coefficients in MLP
> > >>>> >    - Removed source table dependency for predicting regression models in
> > >>>> >    MLP (MADLIB-1223)
> > >>>> >    - Print loss of first iteration in MLP (MADLIB-1228)
> > >>>> >    - Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
> > >>>> >    - Fixed RF issue that showed up when var_importance=True with no
> > >>>> >    continuous features (MADLIB-1219)
> > >>>> >    - Fixed DT/RF issue for null_as_category=True and grouping enabled
> > >>>> >    (MADLIB-1217)
> > >>>> >
> > >>>> > Other:
> > >>>> >
> > >>>> >    - Reduced install-check runtime for PCA, DT, RF, elastic net
> > >>>> >    (MADLIB-1216)
> > >>>> >    - Added CentOS 7 PostgreSQL 9.6/10 docker files
> > >>>> >
> > >>>> > For additional information, please see:
> > >>>> > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14
> > >>>> >
> > >>>> > Here are the release artifact details:
> > >>>> >
> > >>>> > Source release tag to be voted on: rc/1.14-rc1, located here:
> > >>>> > https://git-wip-us.apache.org/repos/asf?p=madlib.git;a=tag;h=refs/tags/rc/1.14-rc1
> > >>>> >
> > >>>> > Source release tarball can be retrieved from the following locations:
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.sha512
> > >>>> >
> > >>>> > Convenience binary packages can be retrieved from the following
> > >>>> > locations:
> > >>>> >
> > >>>> > macOS: 10.* PostgreSQL 9.6 & 10.2
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.sha512
> > >>>> >
> > >>>> > CentOS* GPDB 4.3.5+
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.sha512
> > >>>> >
> > >>>> > CentOS 6 &* GPDB 5.3.0, PostgreSQL 9.6 & 10.2
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.sha512
> > >>>> >
> > >>>> > The PGP KEYS file used to validate the signature of the release artifacts
> > >>>> > is available here:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/KEYS
> > >>>> >
> > >>>> > To help in tallying the vote, PMC members please be sure to indicate
> > >>>> > “(binding)” with the vote.
> > >>>> >
> > >>>> > [ ] +1 approve
> > >>>> > [ ] +0 no opinion
> > >>>> > [ ] -1 disapprove (and reason why)
> > >>>> >
> > >>>> > Regards,
> > >>>> > Jingyi Mei
> > >>>> >
> > >>>> > Pivotal R&D Advanced Analytics
> > >>>> >
> > >>>>
> > >>>
> > >>
> > >> --
> > >> Rashmi Raghu, Ph.D.
> > >> Pivotal Data Science
> > >>
> > >
> >
>

--
Rashmi Raghu, Ph.D.
Pivotal Data Science
