Re: [sqlalchemy] rs.cursor.fetchall is skipping the first record when stream_results is True
Hi Michael, Thank you very much for your response :) The problem is that numpy requires a list of tuples, just what the low-level cursor provides. I just wanted to avoid the conversion to dict and back again to a tuple. Thank you very much! 2013/4/29 Michael Bayer mike...@zzzcomputing.com On Apr 29, 2013, at 9:49 AM, Pau Tallada tall...@pic.es wrote: Hi, I have some very big datasets that we want to process in batches of reasonable size. In order to do that, we enable the 'stream_results' execution option (available in Postgresql) and we use the fetchmany method to retrieve the records in batches. With those records we must build a numpy array, but its constructor complains about the RowProxy wrapper. So I tried using the cursor directly to retrieve the selected rows, but the cursor skips the first one if 'stream_results' is enabled. whoa OK I just looked at what you're doing here, wondering if I was crazy or not. You're doing rs.cursor.fetchall() on the second run. You can't do that here; the mechanism of stream results requires that rows must be buffered, so in this case you're bypassing that mechanism. I copied your test without seeing that detail. If numpy doesn't recognize the dict interface of a RowProxy, then iterate dicts by passing it (dict(row) for row in result). -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out. -- -- Pau Tallada Crespí Dep. d'Astrofísica i Cosmologia Port d'Informació Científica (PIC) Tel: +34 93 586 8233 -- -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Are sqlalchemy queries a generator?
On Fri, Apr 26, 2013 at 9:09 PM, Claudio Freire klaussfre...@gmail.com wrote: On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com wrote: All attributes have to be expire-able and act as proxies for a database connection so I'm not really sure where to go with that.I'm not too thrilled about proposals to build in various alternate performance behaviors as the library starts to try to act in many different ways that the vast majority of users aren't even aware of, it increases complexity internally, produces vast amounts of new use cases to test and maintain, etc.I'm always willing to look at patches that are all winning, of course, so if you have some way to speed things up without breaking usage contracts and without major new complexity/brittleness I'd love to look at a pull request. I know, it's just a probe to see what kind of a speedup could be obtained by not having that getter's interference. You know... simply implementing InstrumentedAttribute in C could do the trick... In fact... I'm gonna try that... feel free! though you might be surprised, a C function that just calls out to all the same Python operations anyway is often only negligibly faster, not enough to make the extra complexity worth it. also if you're looking to help with C, I'd love to get the C extensions out in the Py3K version, we have a patch that's fallen out of date at http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and testing. Will look into that. The point of the C function is to be able to quickly bypass all that _supports_population and function call overheads. The getter is dead-simple, so its cost is dominated by CPython function call overheads, that are readily removable by re-implementing in C. It can reliably and quickly detect when instance_dict returns __dict__, too. Alright, I've got a POC C extension working (gotta profile it yet), although SQLAlchemy's weird injection of instance_dict forced me to some ugly hacks: class InstrumentedAttribute(QueryableAttribute): Class bound instrumented attribute which adds descriptor methods. def __set__(self, instance, value): self.impl.set(instance_state(instance), instance_dict(instance), value, None) def __delete__(self, instance): self.impl.delete(instance_state(instance), instance_dict(instance)) try: from sqlalchemy.cinstrumented import InstrumentedGetter __get__ = InstrumentedGetter(globals()) __get__.__name__ = '__get__' del InstrumentedGetter except ImportError: def __get__(self, instance, owner): if instance is None: return self dict_ = instance_dict(instance) if self._supports_population and self.key in dict_: return dict_[self.key] else: return self.impl.get(instance_state(instance),dict_) Thing is, doing the whole class in C makes no sense for set and delete, but it also complicates linking its instance_dict and instance_state to SA.attribute's. This way looks ugly, but it reacts immediately to changing those globals, so it does seem like the better option. Opinions (while I profile)? -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Are sqlalchemy queries a generator?
On Apr 30, 2013, at 6:26 PM, Claudio Freire klaussfre...@gmail.com wrote: On Fri, Apr 26, 2013 at 9:09 PM, Claudio Freire klaussfre...@gmail.com wrote: On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com wrote: All attributes have to be expire-able and act as proxies for a database connection so I'm not really sure where to go with that.I'm not too thrilled about proposals to build in various alternate performance behaviors as the library starts to try to act in many different ways that the vast majority of users aren't even aware of, it increases complexity internally, produces vast amounts of new use cases to test and maintain, etc.I'm always willing to look at patches that are all winning, of course, so if you have some way to speed things up without breaking usage contracts and without major new complexity/brittleness I'd love to look at a pull request. I know, it's just a probe to see what kind of a speedup could be obtained by not having that getter's interference. You know... simply implementing InstrumentedAttribute in C could do the trick... In fact... I'm gonna try that... feel free! though you might be surprised, a C function that just calls out to all the same Python operations anyway is often only negligibly faster, not enough to make the extra complexity worth it. also if you're looking to help with C, I'd love to get the C extensions out in the Py3K version, we have a patch that's fallen out of date at http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and testing. Will look into that. The point of the C function is to be able to quickly bypass all that _supports_population and function call overheads. The getter is dead-simple, so its cost is dominated by CPython function call overheads, that are readily removable by re-implementing in C. It can reliably and quickly detect when instance_dict returns __dict__, too. Alright, I've got a POC C extension working (gotta profile it yet), although SQLAlchemy's weird injection of instance_dict forced me to some ugly hacks: class InstrumentedAttribute(QueryableAttribute): Class bound instrumented attribute which adds descriptor methods. def __set__(self, instance, value): self.impl.set(instance_state(instance), instance_dict(instance), value, None) def __delete__(self, instance): self.impl.delete(instance_state(instance), instance_dict(instance)) try: from sqlalchemy.cinstrumented import InstrumentedGetter __get__ = InstrumentedGetter(globals()) __get__.__name__ = '__get__' del InstrumentedGetter except ImportError: def __get__(self, instance, owner): if instance is None: return self dict_ = instance_dict(instance) if self._supports_population and self.key in dict_: return dict_[self.key] else: return self.impl.get(instance_state(instance),dict_) Thing is, doing the whole class in C makes no sense for set and delete, but it also complicates linking its instance_dict and instance_state to SA.attribute's. This way looks ugly, but it reacts immediately to changing those globals, so it does seem like the better option. Opinions (while I profile)? I'd want to see the whole thing, like what's up with that globals() call, etc. The instance_dict is shuttled around everywhere so that we aren't constantly pulling it from the given object; we have a system in place whereby the fact that instance_dict is object.__dict__ is not necessarily a given, and you can actually use some other system of getting at __dict__. It saved us on a huge number of function calls at some point to expand it out like that, as inconvenient as it is. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Are sqlalchemy queries a generator?
On Tue, Apr 30, 2013 at 8:06 PM, Michael Bayer mike...@zzzcomputing.com wrote: try: from sqlalchemy.cinstrumented import InstrumentedGetter __get__ = InstrumentedGetter(globals()) __get__.__name__ = '__get__' del InstrumentedGetter except ImportError: def __get__(self, instance, owner): if instance is None: return self dict_ = instance_dict(instance) if self._supports_population and self.key in dict_: return dict_[self.key] else: return self.impl.get(instance_state(instance),dict_) Thing is, doing the whole class in C makes no sense for set and delete, but it also complicates linking its instance_dict and instance_state to SA.attribute's. This way looks ugly, but it reacts immediately to changing those globals, so it does seem like the better option. Opinions (while I profile)? I'd want to see the whole thing, like what's up with that globals() call, etc. The instance_dict is shuttled around everywhere so that we aren't constantly pulling it from the given object; we have a system in place whereby the fact that instance_dict is object.__dict__ is not necessarily a given, and you can actually use some other system of getting at __dict__. It saved us on a huge number of function calls at some point to expand it out like that, as inconvenient as it is. I realize that. That globals call is exactly for that. Well, without going into the C code, the python code does: dict_ = instance_dict(instance) That under the hood means: dict_ = globals()['instance_dict'](instance) Well, globals there is bound syntactically in python code, but C code has no globals, and injecting instance_dict into cinstrumented's module dict sounded like extra complexity for no reason. Importing attributes from cinstrumented is also no good, since at the point the InstrumentedGetter is constructed, attribute isn't on sys.modules, and doing it on invocation would have meant extra overhead. So, that globals call is to mimic python's syntactic binding to the module's global dict, at import time, and be able to query the dict and find instance_dict no matter how it's modified later. Afterward, the getter works more or less like this (in C): def __get__(self, instance, owner): if instance is None: return self if self.cached_instance_dict is not None \ and self.cached_instance_dict is instance_dict \ and self.cached_supports_population \ and hasattr(instance, '__dict__'): return instance.__dict__[self.key] else: self.cached_supports_population = self._supports_population self.cached_instance_dict = None dict_ = instance_dict(instance) if dict_ is instance.__dict__: self.cached_instance_dict = instance_dict return self.impl.get(instance_state(instance), dict_) Well, in spite of being more complicated, those self.cache_blah things are really fast since they just compare pointers in C, and, more importantly, entity.column will invoke this code from CPython's eval loop (C) directly to the descriptor's getter (C), in no way incurring python's frame allocation overhead. I'm attaching the C module in case it clarifies. I'm not entirely sure about the garbage collection part yet... so it's not final. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out. /* instrumented.c Copyright (C) 2013 Claudio Freire klaussfre...@gmail.com This module is part of SQLAlchemy and is released under the MIT License: http://www.opensource.org/licenses/mit-license.php */ #include Python.h #if PY_VERSION_HEX 0x0205 !defined(PY_SSIZE_T_MIN) typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN typedef Py_ssize_t (*lenfunc)(PyObject *); #define PyInt_FromSsize_t(x) PyInt_FromLong(x) typedef intargfunc ssizeargfunc; #endif #if PY_VERSION_HEX = 0x0300 #define PyString_InternFromString PyUnicode_InternFromString #endif PyObject *get_string = NULL; PyObject *uget_string = NULL; /*** * Structs * ***/ typedef struct { PyObject_HEAD /* Where to get instance_dict from */ PyObject* globals; /* Name to which it was bound */ PyObject* name; /* non-reference, just a pointer for identity comparison */ void *cached_instance_dict; /* Only valid if cached_instance_dict != NULL and equal to global instance_dict */ int cached_supports_population; } InstrumentedGetter; /** * InstrumentedGetter * **/