[GitHub] madlib pull request #320: Fix false positive for Postgres 10+
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/320#discussion_r217260823 --- Diff: src/ports/postgres/cmake/PostgreSQLUtils.cmake --- @@ -82,8 +82,7 @@ function(determine_target_versions OUT_VERSIONS) set(VERSION "4.3ORCA") endif() elseif(${PORT_UC} STREQUAL "POSTGRESQL" AND -(${${PORT_UC}_VERSION_MAJOR} EQUAL 10 OR -${${PORT_UC}_VERSION_PATCH} GREATER 10)) --- End diff -- The intention was to check for `...VERSION_MAJOR >= 10`. Hence the two checks - there was, however a typo with the second version input as `...PATCH` instead of `...MAJOR`. The correction should be to replace the `_PATCH` with `_MAJOR`. ---
[GitHub] madlib pull request #316: Build: Disable AppendOnly if available
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/316#discussion_r217260530 --- Diff: src/ports/postgres/modules/utilities/control.py_in --- @@ -158,6 +159,61 @@ class MinWarning(ContextDecorator): format(oldMsgLevel=self.oldMsgLevel)) +class AOControl(ContextDecorator): + +""" +@brief: A wrapper that enables/disables the AO storage option +""" + +def __init__(self, enable=False): +self.to_enable = enable +self.was_ao_enabled = False +self.guc_exists = True +self.storage_options_dict = dict() + +def _parse_gp_default_storage_options(self, gp_default_storage_options_str): +""" Parse comma separated key=value pairs + +Example: + appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row +""" +self.storage_options_dict = extract_keyvalue_params(gp_default_storage_options_str) +self.storage_options_dict['appendonly'] = bool( +strtobool(self.storage_options_dict['appendonly'])) + +def _join_gp_defaut_storage_options(self): +return ','.join(['{0}={1}'.format(k, v) +for k, v in self.storage_options_dict.iteritems()]) + +def __enter__(self): +try: +_storage_options_str = plpy.execute( +"show gp_default_storage_options")[0]["gp_default_storage_options"] +self._parse_gp_default_storage_options(_storage_options_str) + +# Set APPENDONLY=False after backing up existing value +self.was_ao_enabled = self.storage_options_dict['appendonly'] +self.storage_options_dict['appendonly'] = self.to_enable +plpy.execute("set gp_default_storage_options={0}". + format(self._join_gp_defaut_storage_options())) +except plpy.SPIError: +self.guc_exists = False +finally: +return self + +def __exit__(self, *args): +if args and args[0]: +# an exception was raised in code. We return False so that any +# exception is re-raised after exit. The transaction will not +# commit leading to reset of parameter value. --- End diff -- The GUC value should also reset if transaction does not commit. Ideally, this context manager should be called at the highest level. However, as you mention, it's possible it's not used correctly, with the raised exception caught and ignored. I have updated the code to perform the reset before re-raising the exception. ---
[GitHub] madlib pull request #316: Build: Disable AppendOnly if available
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/316#discussion_r217260262 --- Diff: src/ports/postgres/modules/utilities/control.py_in --- @@ -158,6 +159,61 @@ class MinWarning(ContextDecorator): format(oldMsgLevel=self.oldMsgLevel)) +class AOControl(ContextDecorator): + +""" +@brief: A wrapper that enables/disables the AO storage option +""" + +def __init__(self, enable=False): +self.to_enable = enable +self.was_ao_enabled = False +self.guc_exists = True +self.storage_options_dict = dict() + +def _parse_gp_default_storage_options(self, gp_default_storage_options_str): +""" Parse comma separated key=value pairs + +Example: + appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row +""" +self.storage_options_dict = extract_keyvalue_params(gp_default_storage_options_str) +self.storage_options_dict['appendonly'] = bool( +strtobool(self.storage_options_dict['appendonly'])) + +def _join_gp_defaut_storage_options(self): +return ','.join(['{0}={1}'.format(k, v) +for k, v in self.storage_options_dict.iteritems()]) + +def __enter__(self): +try: +_storage_options_str = plpy.execute( +"show gp_default_storage_options")[0]["gp_default_storage_options"] +self._parse_gp_default_storage_options(_storage_options_str) + +# Set APPENDONLY=False after backing up existing value --- End diff -- Yes, the comment is incorrect. I will change it to "Set APPENDONLY to input value" ---
[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/319 @fmcquillan99 Yes > So this PR only affects GP 6+ ? > > It means that GP 4.3.x, GP 5 and all supported PG versions will continue to work as is, and use Eigen vectorization, if the underlying infra supports it? ---
[GitHub] madlib issue #320: Fix false positive for Postgres 10+
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/320 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/678/ ---
[GitHub] madlib pull request #320: Fix false positive for Postgres 10+
GitHub user d opened a pull request: https://github.com/apache/madlib/pull/320 Fix false positive for Postgres 10+ We used to mistake 9.3.24 as a higher version than Postgres 10 and stop matching it to the correct "port". This patch fixes that. You can merge this pull request into a Git repository by running: $ git pull https://github.com/d/madlib jz-detect-pg10 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/madlib/pull/320.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #320 commit bfa929b14b757f1ffc0d6bdf1259f476292eb274 Author: Jesse Zhang Date: 2018-09-13T01:53:35Z Fix false positive for Postgres 10+ We used to mistake 9.3.24 as a higher version than Postgres 10 and stop matching it to the correct "port". This patch fixes that. ---
[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/319 So this PR only affects GP 6+ ? It means that GP 4.3.x, GP 5 and all supported PG versions will continue to work as is, and use Eigen vectorization, if the underlying infra supports it? ---
[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/318#discussion_r217227453 --- Diff: src/madpack/diff_udf.sql --- @@ -11,71 +11,13 @@ RETURNS text AS $$ $$ LANGUAGE plpythonu; -CREATE OR REPLACE FUNCTION get_functions(schema_name text) +CREATE OR REPLACE FUNCTION get_functions(table_name text, schema_name text, + type_filter text) RETURNS VOID AS $$ import plpy plpy.execute(""" -CREATE TABLE functions_madlib_new_version AS -SELECT -"schema", "name", filter_schema("retype", 'madlib') retype, -filter_schema("argtypes", 'madlib') argtypes, "type" -FROM -( - -SELECT n.nspname as "schema", --- End diff -- Why was this piece of code removed ? ---
[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/319 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/677/ ---
[GitHub] madlib pull request #319: Allocator: Remove 16-byte alignment for pointers i...
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/319 Allocator: Remove 16-byte alignment for pointers in GP6 Findings: 1. MADlib performs a 16-byte alignment for pointers returned by palloc. 2. Postgres prepends a small (16 byte usually) header before every pointer which includes a. the memory context and b. the size of the memory allocation. 3. Greenplum 6+ tweaks that scheme a little: instead of the memory context, the header tracks a "shared header" which points to another struct with richer information (aside from the memory context). 4. Postgres calls MemoryContextContains both with the final func for an aggregate and finalize function for a windowed aggregate. 5. Currently Postgres always concludes that the datum from MADlib is allocated outside of the context, and makes an extra copy. In Greenplum, MemoryContextContains needs to dereference the shared header. This is a problem since the pointer has been shifted and the function is misinterpreting the header. In this commit, we disable the pointer alignment for GPDB 6+ to avoid failure in this check. Further, we also have to disable vectorization in Eigen since it does not work when pointers are not 16-byte aligned. Co-authored-by: Jesse Zhang Co-authored-by: Nandish Jayaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/madlib/madlib bugfix/pointer_alignment_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/madlib/pull/319.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #319 commit 5cd1b6b56f93b177df3cda93f18cbf76535b3a07 Author: Rahul Iyer Date: 2018-09-12T23:59:59Z Allocator: Remove 16-byte alignment for pointers in GP6 Findings: 1. MADlib performs a 16-byte alignment for pointers returned by palloc. 2. Postgres prepends a small (16 byte usually) header before every pointer which includes a. the memory context and b. the size of the memory allocation. 3. Greenplum 6+ tweaks that scheme a little: instead of the memory context, the header tracks a "shared header" which points to another struct with richer information (aside from the memory context). 4. Postgres calls MemoryContextContains both with the final func for an aggregate and finalize function for a windowed aggregate. 5. Currently Postgres always concludes that the datum from MADlib is allocated outside of the context, and makes an extra copy. In Greenplum, MemoryContextContains needs to dereference the shared header. This is a problem since the pointer has been shifted and the function is misinterpreting the header. In this commit, we disable the pointer alignment for GPDB 6+ to avoid failure in this check. Further, we also have to disable vectorization in Eigen since it does not work when pointers are not 16-byte aligned. Co-authored-by: Jesse Zhang Co-authored-by: Nandish Jayaram ---
[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation
Github user kaknikhil commented on the issue: https://github.com/apache/madlib/pull/318 I tested the utility with the new changes but I am still reviewing the code.Here are my observations so far: 1. I noticed that there were a few indentation changes. I am assuming that this will not affect upgrade testing. 1. We will run this utility before tagging the new release. So for ex if the previous release was 1.15 and the new release is 1.16, then the utility will be run when the version is 1.16-dev on the master branch. In this case, how should the utility be run? Should it be run against the master branch for the latest release? This is important because the branch name is used for the description message `Changelist for MADlib version old_branch/tag to new_branch/tag` ---
[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/318#discussion_r217152555 --- Diff: src/madpack/create_changelist.py --- @@ -0,0 +1,229 @@ +#!/usr/bin/python +# -- +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# -- + +# Create changelist for any two branches/tags + +# Prequisites: +# The old version has to be installed in the "madlib_old_vers" schema +# The new version has to be installed in the "madlib" (default) schema +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version) +# The current branch does not matter + +# Usage (must be executed in the src/madpack directory): +# python create_changelist.py + +# Example (should be equivalent to changelist_1.13_1.14.yaml): +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml + +import sys +import os + +database = sys.argv[1] +old_vers = sys.argv[2] +new_vers = sys.argv[3] +ch_filename = sys.argv[4] + +if os.path.exists(ch_filename): +print "{0} already exists".format(ch_filename) +raise SystemExit + +err1 = os.system("""psql {0} -l > /dev/null""".format(database)) +if err1 != 0: +print "Database {0} does not exist".format(old_vers) +raise SystemExit + +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null + """.format(database)) +if err1 != 0: +print "Schema {0} does not exist".format(old_vers) +raise SystemExit + +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null + """.format(database)) +if err1 != 0: +print "Schema {0} does not exist".format(new_vers) +raise SystemExit + --- End diff -- Can we add another error check to ensure that the `madlib_old_vers.schema()` is the same as `old_vers branch/tag`? This does assume that the branch/tag will always be a release tag. ---
[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/318#discussion_r217143101 --- Diff: src/madpack/create_changelist.py --- @@ -0,0 +1,229 @@ +#!/usr/bin/python +# -- +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# -- + +# Create changelist for any two branches/tags + +# Prequisites: +# The old version has to be installed in the "madlib_old_vers" schema +# The new version has to be installed in the "madlib" (default) schema +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version) +# The current branch does not matter + +# Usage (must be executed in the src/madpack directory): +# python create_changelist.py + +# Example (should be equivalent to changelist_1.13_1.14.yaml): +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml + +import sys +import os + +database = sys.argv[1] +old_vers = sys.argv[2] +new_vers = sys.argv[3] +ch_filename = sys.argv[4] + +if os.path.exists(ch_filename): +print "{0} already exists".format(ch_filename) +raise SystemExit + +err1 = os.system("""psql {0} -l > /dev/null""".format(database)) +if err1 != 0: +print "Database {0} does not exist".format(old_vers) +raise SystemExit + +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null + """.format(database)) +if err1 != 0: +print "Schema {0} does not exist".format(old_vers) --- End diff -- This error message prints the branch name instead of the schema name. Same with the previous and next error checks. Maybe we should change the var name to be more descriptive ---
[GitHub] madlib issue #316: Build: Disable AppendOnly if available
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/316 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/676/ ---
[GitHub] madlib issue #316: Build: Disable AppendOnly if available
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/316 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/675/ ---
[GitHub] madlib issue #316: Build: Disable AppendOnly if available
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/316 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/674/ ---