Re: [sqlalchemy] rs.cursor.fetchall is skipping the first record when stream_results is True

2013-04-30 Thread Pau Tallada
Hi Michael,

Thank you very much for your response :)

The problem is that numpy requires a list of tuples, just what the
low-level cursor provides.
I just wanted to avoid the conversion to dict and back again to a tuple.

Thank you very much!

2013/4/29 Michael Bayer mike...@zzzcomputing.com


 On Apr 29, 2013, at 9:49 AM, Pau Tallada tall...@pic.es wrote:

  Hi,
 
  I have some very big datasets that we want to process in batches of
 reasonable size.
  In order to do that, we enable the 'stream_results' execution option
 (available in Postgresql) and we use the fetchmany method to retrieve the
 records in batches.
  With those records we must build a numpy array, but its constructor
 complains about the RowProxy wrapper.
 
  So I tried using the cursor directly to retrieve the selected rows, but
 the cursor skips the first one if 'stream_results' is enabled.

 whoa OK I just looked at what you're doing here, wondering if I was
 crazy or not.   You're doing rs.cursor.fetchall() on the second run.
  You can't do that here; the mechanism of stream results requires that
 rows must be buffered, so in this case you're bypassing that mechanism.
  I copied your test without seeing that detail.

 If numpy doesn't recognize the dict interface of a RowProxy, then iterate
 dicts by passing it (dict(row) for row in result).



 --
 You received this message because you are subscribed to the Google Groups
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
 For more options, visit https://groups.google.com/groups/opt_out.





-- 
--
Pau Tallada Crespí
Dep. d'Astrofísica i Cosmologia
Port d'Informació Científica (PIC)
Tel: +34 93 586 8233
--

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-30 Thread Claudio Freire
On Fri, Apr 26, 2013 at 9:09 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways 
 that the vast majority of users aren't even aware of, it increases 
 complexity internally, produces vast amounts of new use cases to test 
 and maintain, etc.I'm always willing to look at patches that are all 
 winning, of course, so if you have some way to speed things up without 
 breaking usage contracts and without major new complexity/brittleness 
 I'd love to look at a pull request.

 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...


 In fact... I'm gonna try that...

 feel free!  though you might be surprised, a C function that just calls out 
 to all the same Python operations anyway is often only negligibly faster, 
 not enough to make the extra complexity worth it.

 also if you're looking to help with C, I'd love to get the C extensions out 
 in the Py3K version, we have a patch that's fallen out of date at 
 http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and 
 testing.

 Will look into that. The point of the C function is to be able to
 quickly bypass all that _supports_population and function call
 overheads. The getter is dead-simple, so its cost is dominated by
 CPython function call overheads, that are readily removable by
 re-implementing in C. It can reliably and quickly detect when
 instance_dict returns __dict__, too.

Alright, I've got a POC C extension working (gotta profile it yet),
although SQLAlchemy's weird injection of instance_dict forced me to
some ugly hacks:


class InstrumentedAttribute(QueryableAttribute):
Class bound instrumented attribute which adds descriptor methods.

def __set__(self, instance, value):
self.impl.set(instance_state(instance),
instance_dict(instance), value, None)

def __delete__(self, instance):
self.impl.delete(instance_state(instance), instance_dict(instance))

try:
from sqlalchemy.cinstrumented import InstrumentedGetter
__get__ = InstrumentedGetter(globals())
__get__.__name__ = '__get__'
del InstrumentedGetter
except ImportError:
def __get__(self, instance, owner):
if instance is None:
return self

dict_ = instance_dict(instance)
if self._supports_population and self.key in dict_:
return dict_[self.key]
else:
return self.impl.get(instance_state(instance),dict_)

Thing is, doing the whole class in C makes no sense for set and
delete, but it also complicates linking its instance_dict and
instance_state to SA.attribute's.

This way looks ugly, but it reacts immediately to changing those
globals, so it does seem like the better option.

Opinions (while I profile)?

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-30 Thread Michael Bayer

On Apr 30, 2013, at 6:26 PM, Claudio Freire klaussfre...@gmail.com wrote:

 On Fri, Apr 26, 2013 at 9:09 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways 
 that the vast majority of users aren't even aware of, it increases 
 complexity internally, produces vast amounts of new use cases to test 
 and maintain, etc.I'm always willing to look at patches that are 
 all winning, of course, so if you have some way to speed things up 
 without breaking usage contracts and without major new 
 complexity/brittleness I'd love to look at a pull request.
 
 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...
 
 
 In fact... I'm gonna try that...
 
 feel free!  though you might be surprised, a C function that just calls 
 out to all the same Python operations anyway is often only negligibly 
 faster, not enough to make the extra complexity worth it.
 
 also if you're looking to help with C, I'd love to get the C extensions out 
 in the Py3K version, we have a patch that's fallen out of date at 
 http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and 
 testing.
 
 Will look into that. The point of the C function is to be able to
 quickly bypass all that _supports_population and function call
 overheads. The getter is dead-simple, so its cost is dominated by
 CPython function call overheads, that are readily removable by
 re-implementing in C. It can reliably and quickly detect when
 instance_dict returns __dict__, too.
 
 Alright, I've got a POC C extension working (gotta profile it yet),
 although SQLAlchemy's weird injection of instance_dict forced me to
 some ugly hacks:
 
 
 class InstrumentedAttribute(QueryableAttribute):
Class bound instrumented attribute which adds descriptor methods.
 
def __set__(self, instance, value):
self.impl.set(instance_state(instance),
instance_dict(instance), value, None)
 
def __delete__(self, instance):
self.impl.delete(instance_state(instance), instance_dict(instance))
 
try:
from sqlalchemy.cinstrumented import InstrumentedGetter
__get__ = InstrumentedGetter(globals())
__get__.__name__ = '__get__'
del InstrumentedGetter
except ImportError:
def __get__(self, instance, owner):
if instance is None:
return self
 
dict_ = instance_dict(instance)
if self._supports_population and self.key in dict_:
return dict_[self.key]
else:
return self.impl.get(instance_state(instance),dict_)
 
 Thing is, doing the whole class in C makes no sense for set and
 delete, but it also complicates linking its instance_dict and
 instance_state to SA.attribute's.
 
 This way looks ugly, but it reacts immediately to changing those
 globals, so it does seem like the better option.
 
 Opinions (while I profile)?

I'd want to see the whole thing, like what's up with that globals() call, etc.  
 The instance_dict is shuttled around everywhere so that we aren't constantly 
pulling it from the given object; we have a system in place whereby the fact 
that instance_dict is object.__dict__ is not necessarily a given, and you can 
actually use some other system of getting at __dict__.   It saved us on a huge 
number of function calls at some point to expand it out like that, as 
inconvenient as it is.


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-30 Thread Claudio Freire
On Tue, Apr 30, 2013 at 8:06 PM, Michael Bayer mike...@zzzcomputing.com wrote:
try:
from sqlalchemy.cinstrumented import InstrumentedGetter
__get__ = InstrumentedGetter(globals())
__get__.__name__ = '__get__'
del InstrumentedGetter
except ImportError:
def __get__(self, instance, owner):
if instance is None:
return self

dict_ = instance_dict(instance)
if self._supports_population and self.key in dict_:
return dict_[self.key]
else:
return self.impl.get(instance_state(instance),dict_)

 Thing is, doing the whole class in C makes no sense for set and
 delete, but it also complicates linking its instance_dict and
 instance_state to SA.attribute's.

 This way looks ugly, but it reacts immediately to changing those
 globals, so it does seem like the better option.

 Opinions (while I profile)?

 I'd want to see the whole thing, like what's up with that globals() call, 
 etc.   The instance_dict is shuttled around everywhere so that we aren't 
 constantly pulling it from the given object; we have a system in place 
 whereby the fact that instance_dict is object.__dict__ is not necessarily a 
 given, and you can actually use some other system of getting at __dict__.   
 It saved us on a huge number of function calls at some point to expand it out 
 like that, as inconvenient as it is.

I realize that. That globals call is exactly for that.

Well, without going into the C code, the python code does:

dict_ = instance_dict(instance)

That under the hood means:

dict_ = globals()['instance_dict'](instance)

Well, globals there is bound syntactically in python code, but C code
has no globals, and injecting instance_dict into cinstrumented's
module dict sounded like extra complexity for no reason.

Importing attributes from cinstrumented is also no good, since at the
point the InstrumentedGetter is constructed, attribute isn't on
sys.modules, and doing it on invocation would have meant extra
overhead.

So, that globals call is to mimic python's syntactic binding to the
module's global dict, at import time, and be able to query the dict
and find instance_dict no matter how it's modified later.

Afterward, the getter works more or less like this (in C):

def __get__(self, instance, owner):
if instance is None:
   return self
if self.cached_instance_dict is not None \
   and self.cached_instance_dict is instance_dict \
   and self.cached_supports_population \
   and hasattr(instance, '__dict__'):
return instance.__dict__[self.key]
else:
   self.cached_supports_population = self._supports_population
   self.cached_instance_dict = None
   dict_ = instance_dict(instance)
   if dict_ is instance.__dict__:
   self.cached_instance_dict = instance_dict
   return self.impl.get(instance_state(instance), dict_)

Well, in spite of being more complicated, those self.cache_blah things
are really fast since they just compare pointers in C, and, more
importantly, entity.column will invoke this code from CPython's eval
loop (C) directly to the descriptor's getter (C), in no way incurring
python's frame allocation overhead.

I'm attaching the C module in case it clarifies.

I'm not entirely sure about the garbage collection part yet... so it's
not final.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


/*
instrumented.c
Copyright (C) 2013 Claudio Freire klaussfre...@gmail.com

This module is part of SQLAlchemy and is released under
the MIT License: http://www.opensource.org/licenses/mit-license.php
*/

#include Python.h

#if PY_VERSION_HEX  0x0205  !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
typedef Py_ssize_t (*lenfunc)(PyObject *);
#define PyInt_FromSsize_t(x) PyInt_FromLong(x)
typedef intargfunc ssizeargfunc;
#endif

#if PY_VERSION_HEX = 0x0300
#define PyString_InternFromString PyUnicode_InternFromString
#endif


PyObject *get_string = NULL;
PyObject *uget_string = NULL;

/***
 * Structs *
 ***/

typedef struct {
PyObject_HEAD

/* Where to get instance_dict from */
PyObject* globals;

/* Name to which it was bound */
PyObject* name;

/* non-reference, just a pointer for identity comparison */
void *cached_instance_dict;

/* Only valid if cached_instance_dict != NULL and equal to global instance_dict */
int cached_supports_population;
} InstrumentedGetter;

/**
 * InstrumentedGetter *
 **/