Hello Apache MADlib dev community,

This is the vote for the Apache MADlib 1.14 release (RC1). It provides the source release tarball and convenience binaries. This is the third Apache MADlib release as an Apache Top Level Project (TLP).
The vote will run for at least 72 working hours and will close on Tuesday, May 1st, 2018 @ 6pm PDT. To pass, the vote requires a minimum of 3 binding +1 votes and more binding +1 votes than binding -1 votes.

The main goals of this release are:

New features:
- New module: Balanced datasets, a sampling module that balances classification datasets by resampling with various techniques, including undersampling, oversampling, uniform sampling, or user-defined proportion sampling (MADLIB-1168)
- Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor function needed to create batches from the data (MADLIB-1200, MADLIB-1206, MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
- k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
- Summary: Added statistics for the number of positive, negative, and zero values, and 95% confidence intervals for the mean (MADLIB-1167)
- Encode categorical: Updated to produce lower-case column names when possible (MADLIB-1202)
- MLP: Added support for a categorical dependent variable that is already one-hot encoded in a classification task (MADLIB-1222)
- PageRank: Added an option for personalization vertices. These vertices receive a higher jump probability than the other vertices, so a random surfer is more likely to jump to them (MADLIB-1084)

Bug fixes:
- Fixed invalid calls to construct_array that caused problems in PostgreSQL 10 (MADLIB-1185)
- Added a newline between concatenated files during PGXN install (MADLIB-1194)
- Fixed upgrade issues in k-NN (MADLIB-1197)
- Ensured that RF variable importance values are always non-negative
- Fixed inconsistency in LDA output and improved usability (MADLIB-1160, MADLIB-1201)
- Fixed MLP and RF predict for models trained in earlier versions, so that missing optional parameters are given appropriate default values (MADLIB-1207)
- Fixed a DT scenario that crashed the database when no features remained because categorical columns with a single level were dropped
- Fixed step size initialization in MLP based on the learning rate policy (MADLIB-1212)
- Fixed a PCA failure when the grouping column is of TEXT type (MADLIB-1215)
- Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
- Fixed and simplified initialization of model coefficients in MLP
- Removed the source table dependency for predicting regression models in MLP (MADLIB-1223)
- Added printing of the loss for the first iteration in MLP (MADLIB-1228)
- Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
- Fixed an RF issue that occurred when var_importance=True with no continuous features (MADLIB-1219)
- Fixed a DT/RF issue when null_as_category=True and grouping is enabled (MADLIB-1217)

Other:
- Reduced install-check runtime for PCA, DT, RF, and elastic net (MADLIB-1216)
- Added Docker files for CentOS 7 with PostgreSQL 9.6/10

For additional information, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14

Here are the release artifact details:

Source release tag to be voted on: rc/1.14-rc1, located here:
https://git-wip-us.apache.org/repos/asf?p=madlib.git;a=tag;h=refs/tags/rc/1.14-rc1

The source release tarball can be retrieved from the following locations:
Package: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz
PGP Signature: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.asc
SHA512 Hash: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.sha512

Convenience binary packages can be retrieved from the following locations:

macOS 10.*: PostgreSQL 9.6 & 10.2
Package: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg
PGP Signature: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.asc
SHA512 Hash: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.sha512

CentOS*: GPDB 4.3.5+
Package: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm
PGP Signature: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.asc
SHA512 Hash: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.sha512

CentOS 6 &*: GPDB 5.3.0, PostgreSQL 9.6 & 10.2
Package: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm
PGP Signature: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.asc
SHA512 Hash: https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.sha512

The PGP KEYS file used to validate the signature of the release artifacts is available here:
https://dist.apache.org/repos/dist/dev/madlib/KEYS

To help tally the vote, PMC members, please be sure to indicate “(binding)” with your vote.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Regards,
Jingyi Mei
Pivotal R&D Advanced Analytics
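For voters checking the downloaded artifacts, here is a minimal Python sketch of the SHA512 hash check. It assumes the `.sha512` file is either in `sha512sum` style (`<hex>  <filename>`) or in `gpg --print-md SHA512` style (`<filename>: <hex in space-separated groups>`); the exact format of the published files is an assumption. The helper names `expected_sha512`, `actual_sha512`, and `verify` are hypothetical. The PGP signature is checked separately with GnuPG, for example `gpg --import KEYS` followed by `gpg --verify apache-madlib-1.14-src.tar.gz.asc apache-madlib-1.14-src.tar.gz`.

```python
import hashlib
import re
from pathlib import Path

# A token made only of hex digits (upper or lower case).
HEX_TOKEN = re.compile(r"^[0-9a-fA-F]+$")


def expected_sha512(checksum_text: str) -> str:
    """Extract the SHA-512 digest from the text of a .sha512 file.

    Assumption: the file is sha512sum-style or gpg --print-md-style.
    Pure-hex tokens are concatenated; filenames and other tokens are ignored.
    """
    tokens = checksum_text.split()
    digest = "".join(t for t in tokens if HEX_TOKEN.match(t)).lower()
    if len(digest) != 128:  # SHA-512 is 64 bytes = 128 hex characters
        raise ValueError(f"expected 128 hex characters, found {len(digest)}")
    return digest


def actual_sha512(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the artifact in chunks so large packages are not read into memory at once."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(artifact: Path, checksum_file: Path) -> bool:
    """Return True if the artifact's SHA-512 matches the published digest."""
    return actual_sha512(artifact) == expected_sha512(checksum_file.read_text())
```

Usage, after downloading the tarball and its `.sha512` file into the current directory: `verify(Path("apache-madlib-1.14-src.tar.gz"), Path("apache-madlib-1.14-src.tar.gz.sha512"))`. Concatenating only pure-hex tokens lets the same parser accept both checksum layouts without guessing which one was used.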