Re: [PyGreSQL] pg.DB() pseudo-cursor to reduce RAM use

Christoph Zwerschke Sun, 22 Apr 2018 05:03:09 -0700

On 26.03.2018 at 22:01, Justin Pryzby wrote:
> I have one patch ready for evaluation.
>
> 1: implement fake/pseudo cursors in pg (same as what's in pgdb);
> I proposed not to support any "move()" function in pg, since
> there's no "fetch" (besides getItem which is newly introduced
> at your suggestion). This patch is solid.
>
> I also have additional patches which are "in progress" and less
> solid. I'll prepare them after your evaluation of (1).

Thanks again for pushing this forward, and sorry for taking so long to
reply. Let's concentrate on patch 1 first, which is essentially about
adding the sequence and iterator protocols to the classic pg query. We
should really get this into version 5.1, as it makes a lot of sense
syntactically, even though we don't gain much memory-wise: the result
set is still always read completely into memory (due to the way libpq
works). We can talk about server-side cursors later, in version 5.2
or 6.
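To illustrate what "sequence and iterator protocols" would mean from the
caller's side, here is a minimal pure-Python sketch. The class name
QuerySketch is hypothetical and this is not the C implementation from
the patch; it only assumes, as described above, that all rows are
already held in memory.

```python
class QuerySketch:
    """Hypothetical stand-in for a query result that is fully in memory."""

    def __init__(self, rows):
        # Assumption: the complete result has already been fetched.
        self._rows = [tuple(row) for row in rows]

    def __len__(self):
        # Sequence protocol: number of rows in the result.
        return len(self._rows)

    def __getitem__(self, index):
        # Sequence protocol: access a single row by position.
        # This sketch deliberately rejects negative indices.
        if not isinstance(index, int):
            raise TypeError("row index must be an integer")
        if not 0 <= index < len(self._rows):
            raise IndexError("query row index out of range")
        return self._rows[index]

    def __iter__(self):
        # Iterator protocol: yield the rows one by one.
        return iter(self._rows)


q = QuerySketch([(1, "a"), (2, "b"), (3, "c")])
print(len(q))
print(q[0])
for row in q:
    print(row)
```

The memory use is the same as with a plain list of rows; the gain is
only that len(), indexing and for-loops work on the query directly.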


Some remarks:

* This doesn't work with Python 3 yet, but it's probably easy to fix.
  For instance, Py_TPFLAGS_HAVE_ITER does not exist in Python 3. We
  could provide it via py3c.h (it would simply be 0 in Python 3).

* I don't think max_row and num_fields should be stored in the
  queryObject, since they can easily be accessed as
  PQntuples(obj->result) and PQnfields(obj->result), which are only
  cheap lookups. Better to have a single source of truth. Storing
  col_types might be reasonable, but:

* testSetByteaEscaped and testSetDecimalPoint fail because the
  col_types depend on global settings. If you change these settings
  after executing the query and before getting the result, the changes
  are not reflected. Not sure whether we should just adapt the tests,
  or not store col_types and get them dynamically (see point above).

* TestDictIteratorQueries.testIterate fails, but this is easy to fix
  (the test is the problem, not the implementation).


* We need some tests for raising IndexErrors (this works already).

* We should also implement sq_contains (__contains__) and __reversed__
  to make the sequence protocol more complete.


* We might want to deprecate ntuples() since it's the same as len() now.

* To be consistent with the old method names (dictresult and
  namedresult), dictIter and namedIter should become dictiter and
  namediter. If we really want nicer names, then they should be
  named_iter and dict_iter according to PEP8, but then we should also
  provide get_result, dict_result and named_result as aliases and
  deprecate the old names. Not sure if we want that.

* Would it make sense to make the result row type configurable by
  providing get/set_result_type() methods on the module, connection and
  query level? We could then get rid of the 6 different accessor
  methods for the result list/iter and simply call list() or iter() on
  the query (or just iterate over it, or access items directly). Not
  sure whether this is a silly or clever idea, what do you think?
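To make the last idea more concrete, here is a rough pure-Python sketch
of how it could look. Everything in it is hypothetical: the functions
get_result_type/set_result_type, the class TypedQuerySketch and the
type names "tuple", "dict" and "named" do not exist in PyGreSQL, and
the connection level is left out for brevity.

```python
from collections import namedtuple

# Hypothetical module-level default; not part of PyGreSQL.
_VALID_TYPES = ("tuple", "dict", "named")
_default_result_type = "tuple"


def set_result_type(result_type):
    """Set the (hypothetical) module-wide default row type."""
    global _default_result_type
    if result_type not in _VALID_TYPES:
        raise ValueError("unknown result type: %r" % (result_type,))
    _default_result_type = result_type


def get_result_type():
    """Return the (hypothetical) module-wide default row type."""
    return _default_result_type


class TypedQuerySketch:
    """Hypothetical query result with a configurable row type."""

    def __init__(self, fields, rows):
        self._fields = tuple(fields)
        self._rows = [tuple(row) for row in rows]
        self._result_type = None  # None means: use the module default

    def set_result_type(self, result_type):
        if result_type not in _VALID_TYPES:
            raise ValueError("unknown result type: %r" % (result_type,))
        self._result_type = result_type

    def get_result_type(self):
        # The query-level setting wins over the module-level default.
        return self._result_type or get_result_type()

    def _make_row(self, row):
        # The row type is looked up on every access, not stored per row.
        kind = self.get_result_type()
        if kind == "dict":
            return dict(zip(self._fields, row))
        if kind == "named":
            return namedtuple("Row", self._fields)(*row)
        return row

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._make_row(self._rows[index])

    def __iter__(self):
        return (self._make_row(row) for row in self._rows)


q = TypedQuerySketch(("id", "name"), [(1, "a"), (2, "b")])
print(list(q))               # rows as tuples (module default)
set_result_type("dict")
print(q[0])                  # row as dict (changed module default)
q.set_result_type("named")
print(q[1].name)             # row as named tuple (query-level setting)
```

With something like this, list(q), iter(q) and q[i] would cover what
the separate accessor methods do today.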

I'll clean up and implement your patch for 5.1, but want to clarify the
above points first.


-- Christoph
_______________________________________________
PyGreSQL mailing list
https://mail.vex.net/mailman/listinfo.cgi/pygresql
