On 11-11-23 01:58 PM, Jan Urbański wrote:
On 20/11/11 19:14, Steve Singer wrote:
On 11-10-15 07:28 PM, Jan Urbański wrote:
Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently, when trying to process a large table in PL/Python you have
to slurp it all into memory (that's what plpy.execute does).
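
To give an idea of the intended usage, here's a rough sketch (the table
and column names are made up):

    # plpy.execute materializes the entire result set in memory at once
    rv = plpy.execute("SELECT num FROM largetable")

    # plpy.cursor fetches rows incrementally through an SPI cursor
    cur = plpy.cursor("SELECT num FROM largetable")
    while True:
        rows = cur.fetch(100)        # next batch of up to 100 rows
        if not rows:
            break
        for row in rows:
            plpy.notice(row["num"])
    cur.close()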

J

I found a few bugs (see my testing section below) that will need fixing,
plus a few questions about the code.

Responding now to all questions and attaching a revised patch based on your comments.


I've looked over the revised version of the patch and it now seems fine.

Ready for committer.



Do we like the name plpy.cursor, or would we rather call it something like
plpy.execute_cursor(...), plpy.cursor_open(...), or plpy.create_cursor(...)?
Since we will be mostly stuck with this API once we release 9.2, it's worth
gathering some opinions. I like cursor(), but if anyone disagrees now is the time.

We use plpy.subtransaction() to create Subxact objects, so I thought plpy.cursor() would be most appropriate.
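
The parallel is roughly this (a sketch; the table and column names are made up):

    with plpy.subtransaction():               # subtransaction context manager from plpy
        plpy.execute("INSERT INTO t VALUES (1)")

    cur = plpy.cursor("SELECT i FROM t")      # cursor object, created the same way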

This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that, and I don't see anything that precludes someone else
from adding it later if they see a need.

My idea is to add keyword arguments to plpy.cursor() that let you request a scrollable cursor, and after that to provide a move() method.
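
Something along these lines, to be clear that none of this is in the current
patch (the keyword argument and the move() semantics are hypothetical):

    cur = plpy.cursor("SELECT i FROM largetable", scroll=True)  # hypothetical keyword argument
    first = cur.fetch(10)
    cur.move(-10)        # hypothetical wrapper around SPI_cursor_move
    again = cur.fetch(10)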

The patch includes documentation updates that describe the new feature.
The Database Access page doesn't provide an API-style list of database
access functions like the PL/Perl page at
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html
does. I think the organization of the Perl page is clearer than the
Python one, and we should think about doing some documentation
refactoring. That should be done as a separate patch and shouldn't be a
barrier to committing this one.

Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet summoned the energy to overhaul them.

In PLy_cursor_plan, line 4080:
+ PG_TRY();
+ {
+     Portal portal;
+     char *volatile nulls;
+     volatile int j;

I am probably not seeing a code path, or misunderstanding something
about the setjmp/longjmp usage, but I don't see why nulls and j need to be
volatile here.

It looked like you could drop volatile there (and in PLy_spi_execute_plan, which this code was copied from; did I mention there's quite some code duplication in PL/Python?), but digging through git I found this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000

which added the original volatile qualification, so I guess there's a reason.

Line 444:
PLy_cursor(PyObject *self, PyObject *args)
+ {
+     char *query;
+     PyObject *plan;
+     PyObject *planargs = NULL;
+
+     if (PyArg_ParseTuple(args, "s", &query))
+         return PLy_cursor_query(query);
+

Should query be freed with PyMem_Free()?

No, with the "s" format PyArg_ParseTuple returns a pointer into the Python string object's internal buffer, which Python owns and which must not be freed. I checked that repeatedly creating a cursor with a plan argument does not leak memory, and that adding PyMem_Free there promptly leads to a segfault.


I tested with both Python 2.6 and Python 3 on a Linux system.

[test cases demonstrating bugs]

Turns out it's a really bad idea to store pointers to Portal structures, because they get invalidated by the subtransaction abort hooks.

I switched to storing the cursor name and looking it up in the appropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause a ValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction.

Not sure about the wording of the error message, though.
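
The scenario is roughly this (a simplified sketch of what the regression
tests do, not the exact code):

    try:
        with plpy.subtransaction():
            cur = plpy.cursor("SELECT 1")
            raise Exception("force the subtransaction to roll back")
    except Exception:
        pass

    cur.fetch(1)    # raises ValueError: the cursor was created in an aborted subtransaction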

Thanks again for the review!

Cheers,
Jan



