+1 approve
....
For reference, here are my notes for the Apache MADlib v1.14-rc1
review. There are two minor observations (** PLEASE REVIEW **) below.
Please let me know if additional information would help.
Regards,
-=e
--------------------------------------------------------------------------------
 __  __    _    ____  _ _ _
|  \/  |  / \  |  _ \| (_) |__
| |\/| | / _ \ | | | | | | '_ \
| |  | |/ ___ \| |_| | | | |_) |
|_|  |_/_/   \_\____/|_|_|_.__/
--------------------------------------------------------------------------------
Observations:
o Reviewing Jira status:
The outstanding jira list can be seen with the following:
https://issues.apache.org/jira/projects/MADLIB/versions/12342305
0 Warnings
34 Issues in version
33 Issues done
0 Issues in progress
1 Issues to do
** PLEASE REVIEW **
One open Jira issue remains with a v1.14 fix version:
https://issues.apache.org/jira/browse/MADLIB-1048
o PGP signature verified
o SHA512 checksum verified
o Release notes reviewed and look good
o Copyright is good in NOTICE file
o Apache RAT passed (mvn apache-rat:check)
v1.14 - [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0
generated: 0 approved: 118 licence.
o Docs review (generation only, no content review):
- Generated design.pdf
- Generated html user docs
o Source validation
- Mac OS X
ProductName: Mac OS X
ProductVersion: 10.13.4
BuildVersion: 17E202
PostgreSQL 10.3 & 9.6.8 (built from source)
MADlib v1.14-rc1 (built from source)
** PLEASE REVIEW **
For the Mac dev environment, I used the latest Homebrew to set up the
environment. I originally used the Homebrew version of Boost
(1.67.0) and encountered the following compilation errors:
[ 86%] Building CXX object
src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/elastic_net/elastic_net_binomial_fista.cpp.o
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/crf/viterbi.cpp:12:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' fileIn file included from
not/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/elastic_net/elastic_net_binomial_fista.cpp
:found2
:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' file not found#include
<boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/convex/lmf_igd.cpp:9:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' file not found
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/crf/linear_crf.cpp:11:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hppIn
file included from
:/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/convex/linear_svm_igd.cpp88::910:
: fatal
error/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:
:88:10
: fatal error'boost/tr1/array.hpp': file not found
'boost/tr1/array.hpp' file not found
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/assoc_rules/assoc_rules.cpp:11:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' file not found
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/convex/utils_regularization.cpp:6:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' file not found
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
In file included from
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/modules/convex/mlp_igd.cpp:26:
/Users/espino/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/src/ports/postgres/dbconnector/dbconnector.hpp:88:10:
fatal error:
'boost/tr1/array.hpp' file not found
#include <boost/tr1/array.hpp>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
1 error generated.
1 error generated.
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/crf/viterbi.cpp.o]
Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/elastic_net/elastic_net_binomial_fista.cpp.o]
Error 1
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/convex/lmf_igd.cpp.o]
Error 1
1 error generated.
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/assoc_rules/assoc_rules.cpp.o]
Error 1
1 error generated.
1 error generated.
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/convex/linear_svm_igd.cpp.o]
Error 1
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/crf/linear_crf.cpp.o]
Error 1
1 error generated.
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/convex/utils_regularization.cpp.o]
Error 1
1 error generated.
make[2]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/__/__/__/modules/convex/mlp_igd.cpp.o]
Error 1
make[1]: ***
[src/ports/postgres/10/CMakeFiles/madlib_postgresql_10.dir/all] Error 2
make: *** [all] Error 2
✘-2
~/workspace/madlib/madlib-v1.14-rc1/apache-madlib-1.14-src/build
12:16 $
I uninstalled the Homebrew version of Boost and allowed the
build process to download a compatible Boost.
With Boost downloaded through the build process, install-check
passed for the two DB platforms.
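For context on the failure above: Boost removed the TR1 library in release 1.65, which would explain why Homebrew's Boost 1.67.0 has no boost/tr1/array.hpp. Below is a minimal Python sketch of checking candidate include directories for that header before starting a build. The function names and any directories passed in are hypothetical and are not part of the MADlib build system.

```python
from pathlib import Path

# Header that dbconnector.hpp includes in the log above.
TR1_HEADER = Path("boost") / "tr1" / "array.hpp"


def has_boost_tr1(include_dir):
    """Return True if this include directory ships boost/tr1/array.hpp."""
    return (Path(include_dir) / TR1_HEADER).is_file()


def first_usable_boost(include_dirs):
    """Return the first directory that still provides the TR1 header.

    Returns None when no candidate has it, in which case letting the
    build download its own Boost (as done above) is the fallback.
    """
    for candidate in include_dirs:
        if has_boost_tr1(candidate):
            return Path(candidate)
    return None
```

Calling first_usable_boost() with a list of include directories (for example a Homebrew prefix, which is an assumption about the local setup) returns None when none of them ships the TR1 header.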
- CentOS release 6.9 (Final)
Run on Google Cloud Platform (GCP)
image: centos-6-v20180401
Dev packages installed to build PostgreSQL and MADlib from
source:
bison
cmake
flex
gcc
gcc-c++
patch
python-devel
readline-devel
zlib-devel
PostgreSQL 10.3 (built from source)
Greenplum Database:
greenplum-db-4.3.24.0-rhel5-x86_64.zip (pre-built, downloaded from PivNet)
greenplum-db-5.7.0-rhel6-x86_64.zip (pre-built, downloaded from PivNet)
MADlib v1.14-rc1 (built from source)
install-check passed for the three DB platforms.
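
For anyone repeating the SHA512 check noted above, here is a minimal Python sketch of one way to do it. It is not necessarily how the check was performed for this review. The layout of the .sha512 file is an assumption: the parser accepts either a "digest  filename" line or a "filename: GROUPED HEX" layout, and the function names are illustrative only.

```python
import hashlib
import string
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha512_file(text):
    """Extract the lower-case hex digest from the text of a .sha512 file.

    Assumed layouts (the real file may differ):
      1. "<128 hex chars>  <filename>"
      2. "<filename>: <hex in space-separated groups, possibly multi-line>"
    """
    tokens = text.split()
    if tokens and len(tokens[0]) == 128 and all(
        c in string.hexdigits for c in tokens[0]
    ):
        return tokens[0].lower()
    # Layout 2: everything after the first colon, with whitespace removed.
    _, _, rest = text.partition(":")
    return "".join(rest.split()).lower()


def verify(artifact, checksum_file):
    """Return True if the artifact's digest matches the checksum file."""
    expected = parse_sha512_file(Path(checksum_file).read_text())
    return sha512_of(artifact) == expected
```

With the tarball and its .sha512 file downloaded into the same directory, verify("apache-madlib-1.14-src.tar.gz", "apache-madlib-1.14-src.tar.gz.sha512") would return True on a match. The PGP signature still needs a separate check against the KEYS file.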
On Thu, Apr 26, 2018 at 2:57 PM, Jingyi Mei <[email protected]> wrote:
> Hello Apache MADlib dev community,
>
> This is the vote for Apache MADlib 1.14 Release (RC1). It provides the
> source release tarball and convenience binaries. This is the third
> Apache MADlib release as an Apache Top Level Project (TLP).
>
> The vote will run for at least 72 working hours and will close on
> Tuesday, May 1st, 2018 @ 6pm PDT. A minimum of 3 binding +1 votes and
> more binding +1 than binding -1 are required to pass.
>
> The main goals of this release are:
>
> New features:
>
> - New module - Balanced datasets: A sampling module to balance
> classification datasets by resampling using various techniques
> including undersampling, oversampling, uniform sampling or
> user-defined proportion sampling (MADLIB-1168)
> - Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor
> function necessary to create batches from the data (MADLIB-1200,
> MADLIB-1206, MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
> - k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
> - Summary: Added additional stats: number of positive, negative, zero
> values and 95% confidence intervals for the mean (MADLIB-1167)
> - Encode categorical: Updated to produce lower-case column names when
> possible (MADLIB-1202)
> - MLP: Added support for already one-hot encoded categorical dependent
> variable in a classification task (MADLIB-1222)
> - Pagerank: Added an option for personalized vertices, which gives a
> subset of vertices a higher jump probability than other vertices, so
> that a random surfer is more likely to jump to these personalization
> vertices (MADLIB-1084)
>
> Bug fixes:
>
> - Fixed issue with invalid calls of construct_array that led to problems
> in PostgreSQL 10 (MADLIB-1185)
> - Added newline between file concatenation during PGXN install
> (MADLIB-1194)
> - Fixed upgrade issues in knn (MADLIB-1197)
> - Added fix to ensure RF variable importance values are always non-negative
> - Fixed inconsistency in LDA output and improved usability (MADLIB-1160,
> MADLIB-1201)
> - Fixed MLP and RF predict for models trained in earlier versions to
> ensure missing optional parameters are given appropriate default values
> (MADLIB-1207)
> - Fixed a scenario in DT where dropping categorical columns with a
> single level left no features and caused the database to crash
> - Fixed step size initialization in MLP based on learning rate policy
> (MADLIB-1212)
> - Fixed PCA issue that leads to failure when grouping column is a TEXT
> type (MADLIB-1215)
> - Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
> - Fixed and simplified initialization of model coefficients in MLP
> - Removed source table dependency for predicting regression models in
> MLP (MADLIB-1223)
> - Print loss of first iteration in MLP (MADLIB-1228)
> - Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
> - Fixed RF issue that showed up when var_importance=True with no
> continuous features (MADLIB-1219)
> - Fixed DT/RF issue for null_as_category=True and grouping enabled
> (MADLIB-1217)
>
> Other:
>
> - Reduced install-check runtime for PCA, DT, RF, elastic net
> (MADLIB-1216)
> - Added CentOS 7 PostgreSQL 9.6/10 docker files
>
> For additional information, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14
>
> Here are the release artifact details:
>
> Source release tag to be voted on: rc/1.14-rc1, located here:
> https://git-wip-us.apache.org/repos/asf?p=madlib.git;a=tag;h=refs/tags/rc/1.14-rc1
>
> Source release tarball can be retrieved from the following locations:
>
> Package:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz
> PGP Signature:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.asc
> SHA512 Hash:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.sha512
>
> Convenience binary packages can be retrieved from the following
> locations:
>
> macOS: 10.* PostgreSQL 9.6 & 10.2
>
> Package:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg
> PGP Signature:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.asc
> SHA512 Hash:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.sha512
>
> CentOS* GPDB 4.3.5+
>
> Package:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm
> PGP Signature:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.asc
> SHA512 Hash:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.sha512
>
> CentOS 6 &* GPDB 5.3.0, PostgreSQL 9.6 & 10.2
>
> Package:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm
> PGP Signature:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.asc
> SHA512 Hash:
> https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.sha512
>
> The PGP KEYS file used to validate the signature of the release artifacts
> is available here:
> https://dist.apache.org/repos/dist/dev/madlib/KEYS
>
> To help in tallying the vote, PMC members please be sure to indicate
> “(binding)” with the vote.
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Regards,
> Jingyi Mei
>
> Pivotal R&D Advanced Analytics
>
>