[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query

2010-07-19 Thread Adrian Price-Whelan
Hi,

I was just wondering at first whether there was a known issue with
ARRAY types, but here is the query:
spectra =
session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
== spPlateFilename).all()

It should return ~1000 objects totaling about 120MB of data. In
Python, this query takes 10 minutes to complete, but as a SQL query
(copying and pasting the echoed command) it takes a few seconds:
SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'

autoflush and autocommit are both set to False.

It seems like a straightforward query so I'm confused as to what could
be getting hung up.

Thanks for any insight,

Adrian

On Jul 16, 10:24 pm, Michael Bayer mike...@zzzcomputing.com wrote:
 You absolutely need to turn on echoing and locate the specific SQL query 
 which causes the issue.   Queries can take excessive time for a very wide 
 variety of reasons.
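For reference, echoing can be switched on either with `create_engine(..., echo=True)` or through Python's standard logging; a minimal sketch (the `sqlalchemy.engine` logger name is the documented one, the rest is illustrative):

```python
import logging

# Turning on statement echoing without touching engine creation:
# SQLAlchemy emits every SQL statement on the "sqlalchemy.engine" logger,
# so raising it to INFO is equivalent to create_engine(..., echo=True).
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```

With echoing on, each statement is printed as it is issued, which makes it easy to copy the slow query out and time it separately.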

 On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:



  Hello --

  I'm working with a database populated with data originally stored in files 
  that are ~150MB each. We are dealing with a lot of data 
  that is being stored in the database using the 'ARRAY' datatype, 
  specifically numeric[]. After loading some of the data into the database I 
  tried performing a query to get back some data, and comparing it with code 
  that reads directly from the file system - but the database query took ~50 
  times longer to complete. For instance, to retrieve 100 records that 
  contain a few 4000 element arrays each using the code that reads the 
  filesystem it took less than a second, but the query on the database took 
  around 25 seconds to complete. Has anyone else had issues with array types 
  slowing down queries or does this sound more like another issue?

  Thanks!
  Adrian

  --
  You received this message because you are subscribed to the Google Groups 
  sqlalchemy group.
  To post to this group, send email to sqlalch...@googlegroups.com.
  To unsubscribe from this group, send email to 
  sqlalchemy+unsubscr...@googlegroups.com.
  For more options, visit this group 
  at http://groups.google.com/group/sqlalchemy?hl=en.




[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query

2010-07-19 Thread Adrian Price-Whelan
 does it take a few seconds to fully fetch all the results and it only gets 
 1000 rows ?  or is that just to get the initial result?

I'm not sure what you mean by this - the query does return 1000 rows.

 also if any of the individual columns are very large BLOBs or perhaps very 
 large PG arrays that would add to the overhead of a fetch.

There definitely are columns of PG arrays with ~4000 elements each, so going
back to my first email it seems like the culprit here could be the ARRAYs.

Thanks for your help,

Adrian

On Jul 19, 10:10 am, Michael Bayer mike...@zzzcomputing.com wrote:
 On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:

  Hi,

  I was just wondering at first whether there was a known issue with
  ARRAY types, but here is the query:
  spectra =
  session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
  == spPlateFilename).all()

  It should return ~1000 objects equaling about 120MB of data. In
  Python, this query takes 10 minutes to complete,  but as a SQL query
  (copying and pasting the echo'd command) it takes a few seconds:
  SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
  spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
  spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
  spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
  spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
  spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
  spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
  WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'

  autoflush and autocommit are both set to False.

  It seems like a straightforward query so I'm confused as to what could
  be getting hung up.

 does it take a few seconds to fully fetch all the results and it only gets 
 1000 rows ?  or is that just to get the initial result? these are different 
 things.   also if any of the individual columns are very large BLOBs or 
 perhaps very large PG arrays that would add to the overhead of a fetch.   You 
 can also try writing a DBAPI-only script that runs the query, as well as 
 running engine.execute(myquery.statement) and fetching rows that way to see 
 if some in-object process is the factor (which is unlikely).
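A DBAPI-only timing script along those lines might look like the following; sqlite3 stands in here for psycopg2 purely so the sketch is self-contained (the real test would use psycopg2.connect() against the Postgres database, and the table and column names are illustrative):

```python
import sqlite3
import time

# Stand-in DBAPI connection; swap for psycopg2.connect(...) in the real test.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, vals TEXT)")
cur.executemany("INSERT INTO spectrum (vals) VALUES (?)",
                [("0.0," * 4000,)] * 1000)
conn.commit()

t0 = time.time()
cur.execute("SELECT pk, vals FROM spectrum")
first = cur.fetchone()              # time to the first row...
rows = cur.fetchall()               # ...versus time to fetch everything
elapsed = time.time() - t0
print("%d rows fetched in %.3fs" % (len(rows) + 1, elapsed))
```

If the DBAPI-level fetchall() is fast while the ORM query is slow, the overhead lies in result processing rather than in the database itself.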

  Thanks for any insight,

  Adrian

  On Jul 16, 10:24 pm, Michael Bayer mike...@zzzcomputing.com wrote:
  You absolutely need to turn on echoing and locate the specific SQL query 
  which causes the issue.   Queries can take excessive time for a very wide 
  variety of reasons.

  On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:

  Hello --

  I'm working with a database populated with data originally stored in files 
  that are ~150MB each. We are dealing with a lot of 
  data that is being stored in the database using the 'ARRAY' datatype, 
  specifically numeric[]. After loading some of the data into the database 
  I tried performing a query to get back some data, and comparing it with 
  code that reads directly from the file system - but the database query 
  took ~50 times longer to complete. For instance, to retrieve 100 records 
  that contain a few 4000 element arrays each using the code that reads the 
  filesystem it took less than a second, but the query on the database took 
  around 25 seconds to complete. Has anyone else had issues with array 
  types slowing down queries or does this sound more like another issue?

  Thanks!
  Adrian






Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query

2010-07-19 Thread Adrian Price-Whelan
Here is some more detailed information trying the query multiple ways:

Piping the command into psql and writing to a tmp file takes 12 seconds (tmp 
file is 241MB):

time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, 
spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, 
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS 
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = 
spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp

real    0m12.052s
user    0m2.501s
sys     0m0.274s

engine.execute on the same query took ~6 seconds:
spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS 
spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, 
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS 
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = 
spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
'spPlate-3586-55181.fits';")

spectra = 
session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == 
"spPlate-3665-55247.fits").all() clocked in at 489 seconds

Thanks, 
Adrian

On Jul 19, 2010, at 12:24 PM, David Gardner wrote:

 Try running that query directly against the database and see how long that 
 takes.
 Also try running EXPLAIN on that query to make sure it is using your indexes 
 properly.
 Since you are only using a single filter, make sure that 
 spectrum_header.filename has an index, and make sure your foreign key column 
 spectrum.spectrum_header_pk is indexed as well.
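Concretely, the suggested checks might look like this in psql (the index names are illustrative):

```sql
-- Indexes on the filter column and on the join/foreign-key column:
CREATE INDEX spectrum_header_filename_idx ON spectrum_header (filename);
CREATE INDEX spectrum_spectrum_header_pk_idx ON spectrum (spectrum_header_pk);

-- Then confirm the planner actually uses them:
EXPLAIN ANALYZE
SELECT spectrum.pk
FROM spectrum
JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';
```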
 
 On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
 does it take a few seconds to fully fetch all the results and it only gets 
 1000 rows ?  or is that just to get the initial result?
 
 I'm not sure what you mean by this - the query does return 1000 rows.
 
   
 also if any of the individual columns are very large BLOBs or perhaps very 
 large PG arrays that would add to the overhead of a fetch.
 
 There definitely are columns of PG arrays with ~4000 elements each, so going
 back to my first email it seems like the culprit here could be the ARRAYs.
 
 Thanks for your help,
 
 Adrian
 
 On Jul 19, 10:10 am, Michael Bayer mike...@zzzcomputing.com wrote:
   
 On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
 
 Hi,
   
 
 I was just wondering at first whether there was a known issue with
 ARRAY types, but here is the query:
 spectra =
 session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
 == spPlateFilename).all()
   
 
 It should return ~1000 objects equaling about 120MB of data. In
 Python, this query takes 10 minutes to complete,  but as a SQL query
 (copying and pasting the echo'd command) it takes a few seconds:
 SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
 spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
 spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
 spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
 spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
 spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
 spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
 WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
   
 
 autoflush and autocommit are both set to False.
   
 
 It seems like a straightforward query so I'm confused as to what could
 be getting hung up.
   
 does it take a few seconds to fully fetch all the results and it only gets 
 1000 rows ?  or is that just to get the initial result? these are different 
 things.   also if any of the individual columns are very large BLOBs or 
 perhaps very large PG arrays that would add to the overhead of a fetch.   
 You can also try writing a DBAPI-only script that runs the query, as well 
 as running engine.execute(myquery.statement) and fetching rows that way to 
 see if some in-object process is the factor (which is unlikely).
 
 Thanks for any insight,
   
 
 Adrian
   
 
 On Jul 16, 10:24 pm, Michael Bayer mike...@zzzcomputing.com wrote:
   
 You absolutely need to turn on echoing and locate the specific SQL query 
 which causes the issue.   Queries can take excessive time for a very wide 
 variety of reasons.
 
 
 On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
 
 
 Hello --
   
 
  I'm working with a database populated with data originally stored in files 
  that are ~150MB each. We are dealing with a lot of 
 data that is being stored in the database using the 'ARRAY' datatype

[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query

2010-07-19 Thread Adrian Price-Whelan
Also, as a follow-up - inserting these ARRAYs into the database is
very slow as well, slower than expected that is. Granted, it is
looping over 1000 objects and inserting a few 4000 element arrays for
each object, but doing one big SQL query takes considerably less time
than session.add(object), session.commit().

I thought that might be another clue, thanks!
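For what it's worth, that slow-insert symptom often comes from issuing one INSERT (or one commit) per object rather than a single multi-row statement. A minimal sketch of the fast path, using the stdlib sqlite3 module as a stand-in for the real Postgres connection (table and column names are illustrative):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, vals TEXT)")
rows = [("1.0," * 4000,) for _ in range(1000)]

# One executemany and a single commit -- the pattern one big SQL INSERT uses,
# versus a session.add() / session.commit() round trip per object.
t0 = time.time()
conn.executemany("INSERT INTO spectrum (vals) VALUES (?)", rows)
conn.commit()
print("bulk insert of %d rows: %.3fs" % (len(rows), time.time() - t0))

count = conn.execute("SELECT count(*) FROM spectrum").fetchone()[0]
```

In SQLAlchemy, executing a Core insert() with a list of parameter dicts should get the same executemany behavior, though that trades away the ORM's per-object bookkeeping.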

On Jul 19, 1:53 pm, Adrian Price-Whelan adrian@gmail.com wrote:
 Here is some more detailed information trying the query multiple ways:

 Piping the command into psql and writing to a tmp file takes 12 seconds (tmp 
 file is 241MB):

 time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, 
 spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, 
 spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
 spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
 spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS 
 spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = 
 spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp

 real    0m12.052s
 user    0m2.501s
 sys     0m0.274s

 engine.execute on the same query took ~6 seconds:
 spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS 
 spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS 
 spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
 spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
 spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS 
 spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = 
 spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
 'spPlate-3586-55181.fits';")

 spectra = 
 session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename 
 == "spPlate-3665-55247.fits").all() clocked in at 489 seconds

 Thanks,
 Adrian

 On Jul 19, 2010, at 12:24 PM, David Gardner wrote:

  Try running that query directly against the database and see how long that 
  takes.
  Also try running EXPLAIN on that query to make sure it is using your indexes 
  properly.
  Since you are only using a single filter, make sure that 
  spectrum_header.filename has an index, and make sure your foreign key column 
  spectrum.spectrum_header_pk is indexed as well.

  On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
  does it take a few seconds to fully fetch all the results and it only 
  gets 1000 rows ?  or is that just to get the initial result?

  I'm not sure what you mean by this - the query does return 1000 rows.

  also if any of the individual columns are very large BLOBs or perhaps 
  very large PG arrays that would add to the overhead of a fetch.

  There definitely are columns of PG arrays with ~4000 elements each, so going
  back to my first email it seems like the culprit here could be the ARRAYs.

  Thanks for your help,

  Adrian

  On Jul 19, 10:10 am, Michael Bayer mike...@zzzcomputing.com wrote:

  On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:

  Hi,

  I was just wondering at first whether there was a known issue with
  ARRAY types, but here is the query:
  spectra =
  session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
  == spPlateFilename).all()

  It should return ~1000 objects equaling about 120MB of data. In
  Python, this query takes 10 minutes to complete,  but as a SQL query
  (copying and pasting the echo'd command) it takes a few seconds:
  SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
  spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
  spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
  spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
  spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
  spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
  spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
  WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'

  autoflush and autocommit are both set to False.

  It seems like a straightforward query so I'm confused as to what could
  be getting hung up.

  does it take a few seconds to fully fetch all the results and it only 
  gets 1000 rows ?  or is that just to get the initial result? these are 
  different things.   also if any of the individual columns are very large 
  BLOBs or perhaps very large PG arrays that would add to the overhead of a 
  fetch.   You can also try writing a DBAPI-only script that runs the 
  query, as well as running engine.execute(myquery.statement) and fetching 
  rows that way to see if some in-object process is the factor (which is 
  unlikely).

  Thanks for any insight,

  Adrian

  On Jul 16, 10:24 pm, Michael Bayer mike...@zzzcomputing.com wrote:

  You absolutely need to turn on echoing and locate the specific SQL 
  query which causes the issue.   Queries can take excessive time

Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query

2010-07-19 Thread Adrian Price-Whelan
First off, thanks for your quick replies!

I will look into this, but I can tell you that the arrays contain strictly 
numbers and the array columns are of type numeric[].

Thanks again,
Adrian
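
One hypothesis worth checking, given the numeric[] columns: numeric values typically come back as Python Decimal objects, built one element at a time, which is considerably costlier than floats. A rough, pure-Python illustration of that per-element cost (the sizes are illustrative):

```python
from decimal import Decimal
import time

raw = ["1.5"] * 4000    # one 4000-element array, as strings off the wire

# Cost of materializing 100 such arrays as Decimals vs. floats:
t0 = time.time()
as_decimal = [[Decimal(v) for v in raw] for _ in range(100)]
decimal_secs = time.time() - t0

t0 = time.time()
as_float = [[float(v) for v in raw] for _ in range(100)]
float_secs = time.time() - t0

print("Decimal: %.3fs  float: %.3fs" % (decimal_secs, float_secs))
```

If exact decimal precision is not needed, mapping the column as an ARRAY of a float type may avoid the Decimal conversion entirely.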

On Jul 19, 2010, at 3:47 PM, Michael Bayer wrote:

 
 On Jul 19, 2010, at 1:53 PM, Adrian Price-Whelan wrote:
 
 Here is some more detailed information trying the query multiple ways:
 
 Piping the command into psql and writing to a tmp file takes 12 seconds (tmp 
 file is 241MB):
 
  time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, 
  spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, 
  spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
  spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
  spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask 
  AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk 
  = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
  'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
 
 real 0m12.052s
 user 0m2.501s
 sys  0m0.274s
 
  engine.execute on the same query took ~6 seconds:
  spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS 
  spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS 
  spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, 
  spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS 
  spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask 
  AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk 
  = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 
  'spPlate-3586-55181.fits';")
 
  Call fetchall() on the result to get a better idea what's going on.  Here's 
  the source of the ARRAY type:
 
def result_processor(self, dialect, coltype):
item_proc = self.item_type.result_processor(dialect, coltype)
if item_proc:
def convert_item(item):
if isinstance(item, list):
return [convert_item(child) for child in item]
else:
return item_proc(item)
else:
def convert_item(item):
if isinstance(item, list):
return [convert_item(child) for child in item]
else:
return item
def process(value):
if value is None:
return value
return [convert_item(item) for item in value]
return process
 
 as you can see, if your ARRAY is of a Unicode type or similar, convert() must 
 be called on each item (only happens during fetch).  This is a likely cause 
 of the slowdown and you should consider what kind of converters you're using 
 on your ARRAY members.
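To get a feel for how much that per-item walk costs on its own, the processor above can be replicated standalone (pure Python; the sizes mirror the ~1000-row, 4000-element case):

```python
import time

# Standalone replica of the ARRAY result processor's no-item_proc branch:
# even with nothing to convert, every element of every array is visited.
def convert_item(item):
    if isinstance(item, list):
        return [convert_item(child) for child in item]
    return item

def process(value):
    if value is None:
        return value
    return [convert_item(item) for item in value]

rows = [[0.0] * 4000 for _ in range(1000)]   # ~1000 rows, 4000 elements each
t0 = time.time()
converted = [process(row) for row in rows]
print("pure-Python conversion pass: %.3fs" % (time.time() - t0))
```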
 
 
 




[sqlalchemy] Unknown Issue Causing Extremely Slow Query

2010-07-16 Thread Adrian Price-Whelan
Hello --

I'm working with a database populated with data originally stored in files that 
are ~150MB each. We are dealing with a lot of data that 
is being stored in the database using the 'ARRAY' datatype, specifically 
numeric[]. After loading some of the data into the database I tried performing 
a query to get back some data, and comparing it with code that reads directly 
from the file system - but the database query took ~50 times longer to 
complete. For instance, to retrieve 100 records that contain a few 4000 element 
arrays each using the code that reads the filesystem it took less than a 
second, but the query on the database took around 25 seconds to complete. Has 
anyone else had issues with array types slowing down queries or does this sound 
more like another issue?

Thanks!
Adrian
