[GitHub] madlib pull request #320: Fix false positive for Postgres 10+

2018-09-12 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/320#discussion_r217260823
  
--- Diff: src/ports/postgres/cmake/PostgreSQLUtils.cmake ---
@@ -82,8 +82,7 @@ function(determine_target_versions OUT_VERSIONS)
 set(VERSION "4.3ORCA")
 endif()
 elseif(${PORT_UC} STREQUAL "POSTGRESQL" AND
-(${${PORT_UC}_VERSION_MAJOR} EQUAL 10 OR
-${${PORT_UC}_VERSION_PATCH} GREATER 10))
--- End diff --

The intention was to check for `...VERSION_MAJOR >= 10`. Hence the two 
checks - there was, however a typo with the second version input as `...PATCH` 
instead of `...MAJOR`. The correction should be to replace the `_PATCH` with 
`_MAJOR`. 


---


[GitHub] madlib pull request #316: Build: Disable AppendOnly if available

2018-09-12 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/316#discussion_r217260530
  
--- Diff: src/ports/postgres/modules/utilities/control.py_in ---
@@ -158,6 +159,61 @@ class MinWarning(ContextDecorator):
  format(oldMsgLevel=self.oldMsgLevel))
 
 
+class AOControl(ContextDecorator):
+
+"""
+@brief: A wrapper that enables/disables the AO storage option
+"""
+
+def __init__(self, enable=False):
+self.to_enable = enable
+self.was_ao_enabled = False
+self.guc_exists = True
+self.storage_options_dict = dict()
+
+def _parse_gp_default_storage_options(self, 
gp_default_storage_options_str):
+""" Parse comma separated key=value pairs
+
+Example:
+ 
appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row
+"""
+self.storage_options_dict = 
extract_keyvalue_params(gp_default_storage_options_str)
+self.storage_options_dict['appendonly'] = bool(
+strtobool(self.storage_options_dict['appendonly']))
+
+def _join_gp_defaut_storage_options(self):
+return ','.join(['{0}={1}'.format(k, v)
+for k, v in self.storage_options_dict.iteritems()])
+
+def __enter__(self):
+try:
+_storage_options_str = plpy.execute(
+"show 
gp_default_storage_options")[0]["gp_default_storage_options"]
+self._parse_gp_default_storage_options(_storage_options_str)
+
+# Set APPENDONLY=False after backing up existing value
+self.was_ao_enabled = self.storage_options_dict['appendonly']
+self.storage_options_dict['appendonly'] = self.to_enable
+plpy.execute("set gp_default_storage_options={0}".
+ format(self._join_gp_defaut_storage_options()))
+except plpy.SPIError:
+self.guc_exists = False
+finally:
+return self
+
+def __exit__(self, *args):
+if args and args[0]:
+# an exception was raised in code. We return False so that any
+# exception is re-raised after exit. The transaction will not
+# commit leading to reset of parameter value.
--- End diff --

The GUC value should also reset if transaction does not commit. Ideally, 
this context manager should be called at the highest level. However, as you 
mention, it's possible it's not used correctly, with the raised exception 
caught and ignored. I have updated the code to perform the reset before 
re-raising the exception. 


---


[GitHub] madlib pull request #316: Build: Disable AppendOnly if available

2018-09-12 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/316#discussion_r217260262
  
--- Diff: src/ports/postgres/modules/utilities/control.py_in ---
@@ -158,6 +159,61 @@ class MinWarning(ContextDecorator):
  format(oldMsgLevel=self.oldMsgLevel))
 
 
+class AOControl(ContextDecorator):
+
+"""
+@brief: A wrapper that enables/disables the AO storage option
+"""
+
+def __init__(self, enable=False):
+self.to_enable = enable
+self.was_ao_enabled = False
+self.guc_exists = True
+self.storage_options_dict = dict()
+
+def _parse_gp_default_storage_options(self, 
gp_default_storage_options_str):
+""" Parse comma separated key=value pairs
+
+Example:
+ 
appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row
+"""
+self.storage_options_dict = 
extract_keyvalue_params(gp_default_storage_options_str)
+self.storage_options_dict['appendonly'] = bool(
+strtobool(self.storage_options_dict['appendonly']))
+
+def _join_gp_defaut_storage_options(self):
+return ','.join(['{0}={1}'.format(k, v)
+for k, v in self.storage_options_dict.iteritems()])
+
+def __enter__(self):
+try:
+_storage_options_str = plpy.execute(
+"show 
gp_default_storage_options")[0]["gp_default_storage_options"]
+self._parse_gp_default_storage_options(_storage_options_str)
+
+# Set APPENDONLY=False after backing up existing value
--- End diff --

Yes, the comment is incorrect. I will change it to "Set APPENDONLY to input 
value"


---


[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-12 Thread iyerr3
Github user iyerr3 commented on the issue:

https://github.com/apache/madlib/pull/319
  
@fmcquillan99 Yes

> So this PR only affects GP 6+ ?
> 
> It means that GP 4.3.x, GP 5 and all supported PG versions will continue 
to work as is, and use Eigen vectorization, if the underlying infra supports it?


---


[GitHub] madlib issue #320: Fix false positive for Postgres 10+

2018-09-12 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/320
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/678/



---


[GitHub] madlib pull request #320: Fix false positive for Postgres 10+

2018-09-12 Thread d
GitHub user d opened a pull request:

https://github.com/apache/madlib/pull/320

Fix false positive for Postgres 10+

We used to mistake 9.3.24 as a higher version than Postgres 10 and stop
matching it to the correct "port". This patch fixes that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/d/madlib jz-detect-pg10

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #320


commit bfa929b14b757f1ffc0d6bdf1259f476292eb274
Author: Jesse Zhang 
Date:   2018-09-13T01:53:35Z

Fix false positive for Postgres 10+

We used to mistake 9.3.24 as a higher version than Postgres 10 and stop
matching it to the correct "port". This patch fixes that.




---


[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/319
  
So this PR only affects GP 6+ ?

It means that GP 4.3.x, GP 5 and all supported PG versions will continue to 
work as is, and use Eigen vectorization, if the underlying infra supports it?


---


[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

2018-09-12 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/318#discussion_r217227453
  
--- Diff: src/madpack/diff_udf.sql ---
@@ -11,71 +11,13 @@ RETURNS text AS $$
 $$ LANGUAGE plpythonu;
 
 
-CREATE OR REPLACE FUNCTION get_functions(schema_name text)
+CREATE OR REPLACE FUNCTION get_functions(table_name text, schema_name text,
+ type_filter text)
 RETURNS VOID AS
 $$
 import plpy
 plpy.execute("""
-CREATE TABLE functions_madlib_new_version AS
-SELECT
-"schema", "name", filter_schema("retype", 'madlib') retype,
-filter_schema("argtypes", 'madlib') argtypes, "type"
-FROM
-(
-
-SELECT n.nspname as "schema",
--- End diff --

Why was this piece of code removed ? 


---


[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-12 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/319
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/677/



---


[GitHub] madlib pull request #319: Allocator: Remove 16-byte alignment for pointers i...

2018-09-12 Thread iyerr3
GitHub user iyerr3 opened a pull request:

https://github.com/apache/madlib/pull/319

Allocator: Remove 16-byte alignment for pointers in GP6

Findings:
1. MADlib performs a 16-byte alignment for pointers returned by palloc.
2. Postgres prepends a small (16 byte usually) header before every
pointer which includes
a. the memory context and
b. the size of the memory allocation.
3. Greenplum 6+ tweaks that scheme a little: instead of the memory context,
the header tracks a "shared header" which points to another struct with
richer information (aside from the memory context).
4. Postgres calls MemoryContextContains both with the final func
for an aggregate and finalize function for a windowed aggregate.
5. Currently Postgres always concludes that the datum from MADlib is
allocated outside of the context, and makes an extra copy. In
Greenplum, MemoryContextContains needs to dereference the shared header.
This is a problem since the pointer has been shifted and the function is
misinterpreting the header.

In this commit, we disable the pointer alignment for GPDB 6+ to avoid
failure in this check. Further, we also have to disable vectorization in
Eigen since it does not work when pointers are not 16-byte aligned.

Co-authored-by: Jesse Zhang 
Co-authored-by: Nandish Jayaram 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib bugfix/pointer_alignment_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #319


commit 5cd1b6b56f93b177df3cda93f18cbf76535b3a07
Author: Rahul Iyer 
Date:   2018-09-12T23:59:59Z

Allocator: Remove 16-byte alignment for pointers in GP6

Findings:
1. MADlib performs a 16-byte alignment for pointers returned by palloc.
2. Postgres prepends a small (16 byte usually) header before every
pointer which includes
a. the memory context and
b. the size of the memory allocation.
3. Greenplum 6+ tweaks that scheme a little: instead of the memory context,
the header tracks a "shared header" which points to another struct with
richer information (aside from the memory context).
4. Postgres calls MemoryContextContains both with the final func
for an aggregate and finalize function for a windowed aggregate.
5. Currently Postgres always concludes that the datum from MADlib is
allocated outside of the context, and makes an extra copy. In
Greenplum, MemoryContextContains needs to dereference the shared header.
This is a problem since the pointer has been shifted and the function is
misinterpreting the header.

In this commit, we disable the pointer alignment for GPDB 6+ to avoid
failure in this check. Further, we also have to disable vectorization in
Eigen since it does not work when pointers are not 16-byte aligned.

Co-authored-by: Jesse Zhang 
Co-authored-by: Nandish Jayaram 




---


[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

2018-09-12 Thread kaknikhil
Github user kaknikhil commented on the issue:

https://github.com/apache/madlib/pull/318
  
I tested the utility with the new changes but I am still reviewing the 
code.Here are my observations so far:

1. I noticed that there were a few indentation changes. I am assuming that 
this will not affect upgrade testing.
1. We will run this utility before tagging the new release. So for ex if 
the previous release was 1.15 and the new release is 1.16, then the utility 
will be run when the version is 1.16-dev on the master branch. In this case, 
how should the utility be run? Should it be run against the master branch for 
the latest release? This is important because the branch name is used for the 
description message `Changelist for MADlib version old_branch/tag to 
new_branch/tag`



---


[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

2018-09-12 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/318#discussion_r217152555
  
--- Diff: src/madpack/create_changelist.py ---
@@ -0,0 +1,229 @@
+#!/usr/bin/python
+# 
--
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# 
--
+
+# Create changelist for any two branches/tags
+
+# Prequisites:
+# The old version has to be installed in the "madlib_old_vers" schema
+# The new version has to be installed in the "madlib" (default) schema
+# Two branches/tags must exist locally (run 'git fetch' to ensure you have 
the latest version)
+# The current branch does not matter
+
+# Usage (must be executed in the src/madpack directory):
+# python create_changelist.py
+
+# Example (should be equivalent to changelist_1.13_1.14.yaml):
+# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
+
+import sys
+import os
+
+database = sys.argv[1]
+old_vers = sys.argv[2]
+new_vers = sys.argv[3]
+ch_filename = sys.argv[4]
+
+if os.path.exists(ch_filename):
+print "{0} already exists".format(ch_filename)
+raise SystemExit
+
+err1 = os.system("""psql {0} -l > /dev/null""".format(database))
+if err1 != 0:
+print "Database {0} does not exist".format(old_vers)
+raise SystemExit
+
+err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > 
/dev/null
+ """.format(database))
+if err1 != 0:
+print "Schema {0} does not exist".format(old_vers)
+raise SystemExit
+
+err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
+ """.format(database))
+if err1 != 0:
+print "Schema {0} does not exist".format(new_vers)
+raise SystemExit
+
--- End diff --

Can we add another error check to ensure that the 
`madlib_old_vers.schema()` is the same as `old_vers branch/tag`? This does 
assume that the branch/tag will always be a release tag.


---


[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

2018-09-12 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/318#discussion_r217143101
  
--- Diff: src/madpack/create_changelist.py ---
@@ -0,0 +1,229 @@
+#!/usr/bin/python
+# 
--
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# 
--
+
+# Create changelist for any two branches/tags
+
+# Prequisites:
+# The old version has to be installed in the "madlib_old_vers" schema
+# The new version has to be installed in the "madlib" (default) schema
+# Two branches/tags must exist locally (run 'git fetch' to ensure you have 
the latest version)
+# The current branch does not matter
+
+# Usage (must be executed in the src/madpack directory):
+# python create_changelist.py
+
+# Example (should be equivalent to changelist_1.13_1.14.yaml):
+# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
+
+import sys
+import os
+
+database = sys.argv[1]
+old_vers = sys.argv[2]
+new_vers = sys.argv[3]
+ch_filename = sys.argv[4]
+
+if os.path.exists(ch_filename):
+print "{0} already exists".format(ch_filename)
+raise SystemExit
+
+err1 = os.system("""psql {0} -l > /dev/null""".format(database))
+if err1 != 0:
+print "Database {0} does not exist".format(old_vers)
+raise SystemExit
+
+err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > 
/dev/null
+ """.format(database))
+if err1 != 0:
+print "Schema {0} does not exist".format(old_vers)
--- End diff --

This error message prints the branch name instead of the schema name. Same 
with the previous and next error checks. Maybe we should change the var name to 
be more descriptive


---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-09-12 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/676/



---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-09-12 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/675/



---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-09-12 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/674/



---