On 11-10-15 07:28 PM, Jan Urbański wrote:
Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).

J

I found a few bugs (see my testing section below) that will need fixing + a few questions about the code


Overview & Feature Review
-----------
This patch adds cursor support to plpython.  SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once.  This is a good thing.  Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch.    This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach.   I think the feature is worth having


Usability Review
----------------------
 The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on.  I like cursor() but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move.  The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.


Documentation Review
---------------------

The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring.  That should be done as a seperate patch and
shouldn't be a barrier to committing this one.




Code Review
----------------

in PLy_cursor_plan line 4080
+     PG_TRY();
+     {
+         Portal        portal;
+         char       *volatile nulls;
+         volatile int j;
+
+         if (nargs > 0)
+             nulls = palloc(nargs * sizeof(char));
+         else
+             nulls = NULL;
+
+         for (j = 0; j < nargs; j++)
+         {
+             PyObject   *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.

line 444
 PLy_cursor(PyObject *self, PyObject *args)
+ {
+     char            *query;
+     PyObject        *plan;
+     PyObject           *planargs = NULL;
+
+     if (PyArg_ParseTuple(args, "s", &query))
+         return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()



Testing
-----------

I tested both python 2.6 and 3 on a Linux system



create or replace function x() returns text as $$
cur=None
try:
  with plpy.subtransaction():
    cur=plpy.cursor('select generate_series(1,1000)')
    rows=cur.fetch(10);
    plpy.execute('select f()')

except plpy.SPIError:
  rows=cur.fetch(10);
  return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault

The below test gives me a strange error message:

create or replace function x1() returns text as $$
plan=None
try:
  with plpy.subtransaction():
    plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
    plan=plpy.prepare('select * FROM z')
    plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
    cur=plpy.cursor(plan)
    rows=cur.fetch(10)
    return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();

test=# select x1()
test-# ;
ERROR:  TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT:  Traceback (most recent call last):
      PL/Python function "x1", line 9, in <module>
        cur=plpy.cursor(plan)
    PL/Python function "x1"
STATEMENT:  select x1()


I was expecting an error from the function just a bit more useful one.


Performance
-------------------
I did not do any specific performance testing but I don't see this patch as having any impact to performance



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to