Re: [HACKERS] pl/python improvements
On Dec 23, 2010, at 3:38 AM, Jan Urbański wrote: Oh, didn't know that. I see that it does some more fancy things, like defining a inheritance hierarchy for these exceptions and adding some more into the mix. Right, there were some cases that appeared to benefit from larger buckets than what the existing code classes provided. Also, some of the exceptions in there are strictly for py-postgresql/client-side things. The names I used are not really invented, they're just plpgsql condition names from http://www.postgresql.org/docs/current/static/errcodes-appendix.html with underscores changed to camel case. Also, since they're autogenerated from utils/errcodes.h they don't have any hierarchy, they just all inherit from SPIError. For the backend setting, I think this is quite appropriate. However, for pg-python, I had mixed feelings about this as I wanted to be able to leverage py-postgresql's hierarchy, but still have the projects independent. I ended up punting on this one by using a single error class, and forcing the user to compare the codes. =( Sticking Error to every one of them will result in things like SubstringErrorError, so I'm not really sold on that. There was some creativity applied to the names in postgresql.exceptions to accommodate for things like that. (Like no redundant Error) Basically I think more PL/Python users will be familiar with condition names as you use them in pl/pgsql than with the names from py-postgresql. I think that's fair assumption. In fact, I think that might make a good TODO for py-postgresql/pg-python. Provide a plpgsql-code-name to exception class mapping. cheers, jwp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On Thu, Dec 23, 2010 at 04:08, Jan Urbański wulc...@wulczer.org wrote: * providing custom exceptions for SPI errors, so you can catch only UniqueViolations and not have to muck around with SQLCODE py-postgresql already has a mapping from error codes to Python exceptions. I think it makes sense to re-use that, instead of inventing new names. https://github.com/jwp/py-postgresql/blob/v1.1/postgresql/exceptions.py It also follows the Python convention of ending exception classes with Error, so instead of UniqueViolation they have UniqueError, instead of InvalidTextRepresentation, they have TextRepresentationError Meanwhile the code is available at https://github.com/wulczer/postgres. You will find 10 branches there, 9 correspond to these features, and the plpython branch is the sum of them all. I tried building the plpython branch, but got an unrelated error. I didn't investigate further for now... make[3]: Entering directory `/home/marti/src/postgresql-py/src/backend/bootstrap' gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -I. -I. -I../../../src/include -D_GNU_SOURCE -c -o bootparse.o bootparse.c bootparse.y: In function ‘boot_yyparse’: bootparse.y:224:16: error: too few arguments to function ‘heap_create’ ../../../src/include/catalog/heap.h:37:17: note: declared here bootparse.y:249:16: error: too few arguments to function ‘heap_create_with_catalog’ ../../../src/include/catalog/heap.h:48:12: note: declared here make[3]: *** [bootparse.o] Error 1 Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On 23/12/10 12:16, Marti Raudsepp wrote: On Thu, Dec 23, 2010 at 04:08, Jan Urbański wulc...@wulczer.org wrote: * providing custom exceptions for SPI errors, so you can catch only UniqueViolations and not have to muck around with SQLCODE py-postgresql already has a mapping from error codes to Python exceptions. I think it makes sense to re-use that, instead of inventing new names. https://github.com/jwp/py-postgresql/blob/v1.1/postgresql/exceptions.py It also follows the Python convention of ending exception classes with Error, so instead of UniqueViolation they have UniqueError, instead of InvalidTextRepresentation, they have TextRepresentationError Oh, didn't know that. I see that it does some more fancy things, like defining a inheritance hierarchy for these exceptions and adding some more into the mix. The names I used are not really invented, they're just plpgsql condition names from http://www.postgresql.org/docs/current/static/errcodes-appendix.html with underscores changed to camel case. Also, since they're autogenerated from utils/errcodes.h they don't have any hierarchy, they just all inherit from SPIError. Sticking Error to every one of them will result in things like SubstringErrorError, so I'm not really sold on that. Basically I think more PL/Python users will be familiar with condition names as you use them in pl/pgsql than with the names from py-postgresql. Meanwhile the code is available at https://github.com/wulczer/postgres. You will find 10 branches there, 9 correspond to these features, and the plpython branch is the sum of them all. I tried building the plpython branch, but got an unrelated error. I didn't investigate further for now... make[3]: Entering directory `/home/marti/src/postgresql-py/src/backend/bootstrap' gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -I. -I. -I../../../src/include -D_GNU_SOURCE -c -o bootparse.o bootparse.c bootparse.y: In function ‘boot_yyparse’: bootparse.y:224:16: error: too few arguments to function ‘heap_create’ ../../../src/include/catalog/heap.h:37:17: note: declared here bootparse.y:249:16: error: too few arguments to function ‘heap_create_with_catalog’ ../../../src/include/catalog/heap.h:48:12: note: declared here make[3]: *** [bootparse.o] Error 1 I'm pretty sure it'll go away it you do make maintainer-clean or git clean -dfx after switching to the plpython branch. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On 08/12/10 22:41, Peter Eisentraut wrote: On tis, 2010-12-07 at 23:56 +0100, Jan Urbański wrote: Peter suggested having a mail/patch per feature and the way I intend to do that is instead of having a dozen branches, have one and after I'm done rebase it interactively to produce incremental patches that apply to master, each one implementing one feature. Fair enough if you want to do it that way, but I'd encourage you to just send in any self-contained features/changes that you have finished. I finished work on the PL/Python improvements. There 9 separate patches, with some interdependencies. The features implemented are: * general refactoring, dropping the global error state, using dynahash instead of PyDicts for procedure caching, etc * a validator function * executing SPI calls in subtransactions * starting explicit subtransactions to have atomic behaviour of multiple SPI calls * providing custom exceptions for SPI errors, so you can catch only UniqueViolations and not have to muck around with SQLCODE * invalidate functions accepting composite types if the type changes after the functions has been defined * traceback support * table functions (ability to return RECORD and SETOF RECORD) * using custom parsers for datatypes + an example application with dict-hstore parsing The dependencies are: refactor - validator - SPI in subxacts - explicit subxacts - custom exceptions for SPI - invalidate composites - tracebacks - table functions - custom parsers so everything depends on the refactoring patch, and the SPI changes are incremental. I will generate these patches and send them in in separate threads/add them to the commitfest app before Christmas (I hope). Meanwhile the code is available at https://github.com/wulczer/postgres. You will find 10 branches there, 9 correspond to these features, and the plpython branch is the sum of them all. From now on I will stop rebasing these branches, and instead just commit fixes to the appropriate branch and merge upwards (from refactor to everything, from spi-in-subxacts to explicit-subxacts and to custom-spi-exceptions, etc). I'll also try to independently maintain the plpython branch with the sum to make sure nothing has been left out as the feature branches get merged into mainline. Here's to hoping that this will not turn into a conflict hell. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On tis, 2010-12-07 at 23:56 +0100, Jan Urbański wrote: Peter suggested having a mail/patch per feature and the way I intend to do that is instead of having a dozen branches, have one and after I'm done rebase it interactively to produce incremental patches that apply to master, each one implementing one feature. Fair enough if you want to do it that way, but I'd encourage you to just send in any self-contained features/changes that you have finished. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pl/python improvements
Hi, no, no patch(es) yet. I'm going through plpython.c trying as best I can to improve things there. I'll have a patch (or patches) ready for the January commitfest, but I thought I'd open up a discussion already to spare me having to redo features because the way I attacked the problem is a dead end. Be warned, this ended up being a very long email... The code is on https://github.com/wulczer/postgres, in the plpython branch. I'll be rebasing it regularly, so don't be surprised by commit hashes changing. Currently the improvements are: * added a validator, so syntax errors are caught on function creation, and you don't pay compile overhead when first calling the function * use HTABs instead of Python dictionaries to cache procedures, maintaing two separate HTABs for procedures and triggers (this way you can use the procedure's OID as the hash key) * get rid of the global PLy_error_in_progress, hopefuly things that cause errors are reported immediately * use palloc(TopMemoryContext) instead of malloc() for things that need to live until the end of the session * execute SPI calls in a subtransaction, report errors back to Python as exceptions that can be caught etc. * do not prefix error messages with PL/Python, just use the error string as errmsg * if a function has composite type inputs, check if the underlying types have not changed when calling the function and recompile it if necessary (fixes a TODO item) * support tracebacks and make them look just like Python tracebacks (as much as possible that is) * fix plpy.error and plpy.fatal to actually raise a plpy.Error/plpy.Fatal exception, that can be caught etc. * remove volatile modifiers from variables (AIUI they are completely not necessary anywhere in plpython.c) * general small refactorings, removing unnecessary code, splitting big procedures into smaller ones Planned improvements include: * explicit subxact starting, so you can issue consecutive plpy.execute calls atomically * split plpython.c into smaller modules, like plpython_traceback.c, plpython_type_conversion.c, plpython_plpy.c etc. * variadic argument handling (is this possible for a non-plpgsql pl?) * multiple OUT parameters support (a TODO item, I think table function support meant to say that) * add a DB-API interface on top of SPI (a TODO item) Not sure if I'll make it in time with all of them, I have put them in more-or-less priority order. Now that you're excited, the questions. Q1 To check for composite types changing after the procedure I/O functions have been cached, I store the relid of the base relation for the type (typeidTypeRelid(desc-tdtypeid), where desc is the type's TupleDesc), as well as the xmin and tid of the relation's tuple from pg_class. Then when the function is invoked, for each composite parameter I fetch the relation's tuple from pg_class again and compare xmin and tid. This means that functions with composite parameters are slower, because there's a syscache search for each parameter for each invocation. I roughly measured the slowdown for a function with one composite param to be 4%. OTOH this works now: CREATE TABLE employee ( name text, basesalary integer, bonus integer ); INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10); CREATE OR REPLACE FUNCTION test_composite_table_input(e employee) RETURNS integer AS $$ return e['basesalary'] + e['bonus'] $$ LANGUAGE plpythonu; SELECT name, test_composite_table_input(employee.*) FROM employee; ALTER TABLE employee DROP bonus; ALTER TABLE employee ADD bonus integer; UPDATE employee SET bonus = 10; -- will recompile the function and recache I/O functions SELECT name, test_composite_table_input(employee.*) FROM employee; PL/Perl has the same problem, BTW. The question is: am I going the right way checking if the composite type is up-to-date? The TODO item says: (is this possible, easy, quick?). My answer is: I guess, kinda, dunno. Q2 The format of error messages changes, because the PL/Python prefix gets removed (I want raise plpy.Error(foo) to result in ERROR: plpy.ERROR: foo, not in PL/Python: plpy.ERROR: foo) and because of other things. Like for instance since plpy.error(msg) now just raises a plpy.Error exception, you will get ERROR: plpy.Error: msg and not just ERROR: msg. This potentially breaks clients parsing error messages, but I don't buy that argument really. Still, is it acceptable? Q3 Tracebacks! The way I implemented them (based on the patch linked from the TODO item) they look like this: CREATE FUNCTION nested_error() RETURNS text AS 'def fun1(): plpy.error(boom) def fun2(): fun1() def fun3(): fun2() fun3() return not reached ' LANGUAGE plpythonu; SELECT nested_error(); ERROR: plpy.Error: boom DETAIL: Traceback (most recent call last): PL/Python function nested_error, line 10, in module fun3() PL/Python function nested_error, line 8, in fun3 fun2() PL/Python function nested_error, line 5, in fun2
Re: [HACKERS] pl/python improvements
On Tuesday 07 December 2010 20:17:57 Jan Urbański wrote: Hi, no, no patch(es) yet. I'm going through plpython.c trying as best I can to improve things there. I'll have a patch (or patches) ready for the January commitfest, but I thought I'd open up a discussion already to spare me having to redo features because the way I attacked the problem is a dead end. Be warned, this ended up being a very long email... Nice to see improvements there. Currently the improvements are: * execute SPI calls in a subtransaction, report errors back to Python as exceptions that can be caught etc. Youre doing that unconditionally? I think the performance impact of this will be too severe... Subtransactions can get quite expensive - especially in the face of HS. * remove volatile modifiers from variables (AIUI they are completely not necessary anywhere in plpython.c) I havent read the code recently but they might be needed for local variables changed inside a PG_TRY catching an error and also accessed in the CATCH block or afterwards if youre not rethrowing. See the man page of longjmp. * explicit subxact starting, so you can issue consecutive plpy.execute calls atomically That likely obsoletes my comment from above. * split plpython.c into smaller modules, like plpython_traceback.c, plpython_type_conversion.c, plpython_plpy.c etc. * variadic argument handling (is this possible for a non-plpgsql pl?) * multiple OUT parameters support (a TODO item, I think table function support meant to say that) * add a DB-API interface on top of SPI (a TODO item) I am not hot on that as dbapi simply is too unspecified in too many areas (prepared statements anyone). Thanks, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On 07/12/10 21:33, Andres Freund wrote: On Tuesday 07 December 2010 20:17:57 Jan Urbański wrote: * execute SPI calls in a subtransaction, report errors back to Python as exceptions that can be caught etc. Youre doing that unconditionally? I think the performance impact of this will be too severe... Subtransactions can get quite expensive - especially in the face of HS. See http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg163004.html :( * explicit subxact starting, so you can issue consecutive plpy.execute calls atomically That likely obsoletes my comment from above. Not really, I'm afraid :( You can pay the subxact overhead once with explicit subxact starting, but you always have to pay it... with plpy.subxact(): plpy.execute(select 1) plpy.execute(select 2) plpy.execute(select 3) will start only 1 subxact, as opposed to 3. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On tis, 2010-12-07 at 20:17 +0100, Jan Urbański wrote: no, no patch(es) yet. I'm going through plpython.c trying as best I can to improve things there. I'll have a patch (or patches) ready for the January commitfest, but I thought I'd open up a discussion already to spare me having to redo features because the way I attacked the problem is a dead end. Be warned, this ended up being a very long email... This all sounds very nice (especially since I had many of the same items on my todo list to tackle after Christmas or so). But I think this would be massively simpler if you created a separate patch/branch/email thread for each feature or change. The code is on https://github.com/wulczer/postgres, in the plpython branch. I'll be rebasing it regularly, so don't be surprised by commit hashes changing. I think rebasing published repositories isn't encouraged. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On 12/07/2010 04:50 PM, Peter Eisentraut wrote: The code is on https://github.com/wulczer/postgres, in the plpython branch. I'll be rebasing it regularly, so don't be surprised by commit hashes changing. I think rebasing published repositories isn't encouraged. Indeed. See http://progit.org/book/ch3-6.html for example, which says: *Do not rebase commits that you have pushed to a public repository.* If you follow that guideline, you’ll be fine. If you don’t, people will hate you, and you’ll be scorned by friends and family. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python improvements
On 07/12/10 23:00, Andrew Dunstan wrote: On 12/07/2010 04:50 PM, Peter Eisentraut wrote: The code is on https://github.com/wulczer/postgres, in the plpython branch. I'll be rebasing it regularly, so don't be surprised by commit hashes changing. I think rebasing published repositories isn't encouraged. Indeed. See http://progit.org/book/ch3-6.html for example, which says: *Do not rebase commits that you have pushed to a public repository.* If you follow that guideline, you’ll be fine. If you don’t, people will hate you, and you’ll be scorned by friends and family. If someone is interested in merging from that branch, I can refrain from rebasing (just tell me). But I'm really using it just as a patch queue and pushing to github is a way having an online backup for the changes, so it's not really a public repository. It's meant for people who want to try out the code as it stands today, or see the changes that have been made and don't care about having to rebase their checkout themselves. I find developing that way much easier than merging from the master branch. If I get conflicts I get them on a single commit of mine, not on the merge commit, which makes them easier to resolve. See https://github.com/diaspora/diaspora/wiki/Git-Workflow for an example of that kind of workflow. Peter suggested having a mail/patch per feature and the way I intend to do that is instead of having a dozen branches, have one and after I'm done rebase it interactively to produce incremental patches that apply to master, each one implementing one feature. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers