[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Hi,

I was just wondering at first whether there was a known issue with ARRAY types, but here is the query:

    spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == spPlateFilename).all()

It should return ~1000 objects totaling about 120MB of data. In Python, this query takes 10 minutes to complete, but as a SQL query (copying and pasting the echoed command) it takes a few seconds:

    SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec,
           spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
           spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var,
           spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask
    FROM spectrum
    JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
    WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'

autoflush and autocommit are both set to False. It seems like a straightforward query, so I'm confused as to what could be getting hung up.

Thanks for any insight,
Adrian

On Jul 16, 10:24 pm, Michael Bayer mike...@zzzcomputing.com wrote:
> You absolutely need to turn on echoing and locate the specific SQL query which causes the issue. Queries can take excessive time for a very wide variety of reasons.

On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
> Hello -- I'm working with a database populated with data originally from a file structure of files that are ~150MB each. We are dealing with a lot of data that is being stored in the database using the ARRAY datatype, specifically numeric[]. After loading some of the data into the database I tried performing a query to get back some data, and compared it with code that reads directly from the file system - the database query took ~50 times longer to complete.
> For instance, to retrieve 100 records that contain a few 4000-element arrays each, the code that reads the filesystem took less than a second, but the query on the database took around 25 seconds to complete. Has anyone else had issues with array types slowing down queries, or does this sound more like another issue? Thanks! Adrian

--
You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
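Michael's advice to turn on echoing can be done with `echo=True` when creating the engine, or equivalently through the standard `logging` module, since SQLAlchemy's `echo` flag is documented as a shortcut for INFO-level logging on the `sqlalchemy.engine` logger. A minimal stdlib-only sketch that enables echoing for any engine in the process:

```python
import logging

# create_engine(url, echo=True) is a shortcut for exactly this:
# INFO-level logging on the "sqlalchemy.engine" logger prints every
# SQL statement the engine issues, which identifies the slow query.
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```

This works without rebuilding an already-created engine, which is convenient when debugging a long-running session.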
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
> Hi, I was just wondering at first whether there was a known issue with ARRAY types, but here is the query: spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == spPlateFilename).all() It should return ~1000 objects totaling about 120MB of data. In Python, this query takes 10 minutes to complete, but as a SQL query (copying and pasting the echoed command) it takes a few seconds. [...] It seems like a straightforward query, so I'm confused as to what could be getting hung up.

Does it take a few seconds to fully fetch all the results, and it only gets 1000 rows? Or is that just to get the initial result? These are different things. Also, if any of the individual columns are very large BLOBs, or perhaps very large PG arrays, that would add to the overhead of a fetch.

You can also try writing a DBAPI-only script that runs the query, as well as running engine.execute(myquery.statement) and fetching rows that way, to see if some in-object process is the factor (which is unlikely).
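The "DBAPI-only script" Michael suggests times the raw driver separately from the ORM. In the thread the driver is psycopg2; as a self-contained sketch of the shape of such a script, the stdlib `sqlite3` module stands in below (table layout and row counts are illustrative placeholders):

```python
import sqlite3
import time

# Stand-in for a real psycopg2 connection; the pattern is identical:
# time cursor.execute() (query start) separately from fetchall()
# (full client-side buffering of every row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, values_ TEXT)")
conn.executemany(
    "INSERT INTO spectrum (values_) VALUES (?)",
    [("0.5," * 4000,) for _ in range(1000)],  # ~1000 rows of array-like text
)

cur = conn.cursor()
t0 = time.perf_counter()
cur.execute("SELECT pk, values_ FROM spectrum")
t_execute = time.perf_counter() - t0

t0 = time.perf_counter()
rows = cur.fetchall()
t_fetch = time.perf_counter() - t0

print(f"execute: {t_execute:.4f}s  fetchall: {t_fetch:.4f}s  rows: {len(rows)}")
```

If the raw-driver fetch is already slow, the bottleneck is in the driver or the data volume, not in SQLAlchemy.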
[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result?

I'm not sure what you mean by this - the query does return 1000 rows.

> also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch.

There definitely are columns of PG arrays, ~4000 elements each, so back to my first email it seems like the culprit here could be the ARRAYs.

Thanks for your help,
Adrian

On Jul 19, 10:10 am, Michael Bayer mike...@zzzcomputing.com wrote:
> [...]
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 11:46 AM, Adrian Price-Whelan wrote:
>> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result?
>
> I'm not sure what you mean by this - the query does return 1000 rows.

If you run a query from a SQL console tool, and the query is to return hundreds of thousands or millions of results, you'll begin receiving rows immediately, and the SQL tool will display these rows within a fraction of a second. However, it could then take 20 minutes to scroll through the entire set of results. When using psycopg2, result rows are buffered fully: you won't see any result until all result rows, as well as all data in each column, are fully fetched into memory. This causes the appearance of a vast difference in speed between a SQL console tool and a Python script.

>> also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch.
>
> There definitely are columns of PG arrays, ~4000 elements each, so back to my first email it seems like the culprit here could be the ARRAYs.
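The full buffering Michael describes is psycopg2's default client-side cursor behavior; a *named* (server-side) cursor streams rows in batches instead. The batching pattern looks like this, sketched with stdlib `sqlite3` so it runs anywhere (with psycopg2 you would obtain the cursor via `connection.cursor("some_name")` to get true server-side streaming):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, ra REAL)")
conn.executemany("INSERT INTO spectrum (ra) VALUES (?)",
                 [(i * 0.1,) for i in range(1000)])

# fetchall(): every row is materialized in memory at once -- what a
# buffered client-side cursor does under the hood.
all_rows = conn.execute("SELECT pk, ra FROM spectrum").fetchall()

# fetchmany(): rows arrive in bounded batches -- the behavior a
# psycopg2 named (server-side) cursor gives you.
streamed = []
cur = conn.execute("SELECT pk, ra FROM spectrum")
while True:
    batch = cur.fetchmany(100)
    if not batch:
        break
    streamed.extend(batch)

assert streamed == all_rows
print(len(streamed), "rows fetched either way")
```

Both approaches return the same rows; only the peak memory profile and time-to-first-row differ.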
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Try running that query directly against the database and see how long that takes. Also try running EXPLAIN on that query to make sure it is using your indexes properly. Since you are only filtering on a single column, make sure that spectrum_header.filename has an index, and make sure your foreign key column spectrum.spectrum_header_pk is indexed as well.

On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
> [...]
--
David Gardner
Pipeline Tools Programmer, Jim Henson Creature Shop
dgard...@creatureshop.com
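David's check - confirm via EXPLAIN that the filter and join columns are indexed - is `EXPLAIN` plus `CREATE INDEX` on Postgres. The same verification, sketched self-contained against stdlib `sqlite3` (table and index names mirror the thread but are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum_header (pk INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, spectrum_header_pk INTEGER)")

# The two indexes David recommends: one on the filter column,
# one on the foreign-key join column.
conn.execute("CREATE INDEX ix_header_filename ON spectrum_header (filename)")
conn.execute("CREATE INDEX ix_spectrum_header_pk ON spectrum (spectrum_header_pk)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT s.pk FROM spectrum s "
    "JOIN spectrum_header h ON h.pk = s.spectrum_header_pk "
    "WHERE h.filename = 'spPlate-3586-55181.fits'"
).fetchall()
plan_text = " ".join(str(row) for row in plan)
print(plan_text)  # should mention index searches rather than full table scans
```

On Postgres the equivalent is `EXPLAIN ANALYZE SELECT ...` in psql; without the indexes you would see sequential scans in the plan.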
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Here is some more detailed information from trying the query multiple ways.

Piping the command into psql and writing to a tmp file takes 12 seconds (tmp file is 241MB):

    time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp

    real    0m12.052s
    user    0m2.501s
    sys     0m0.274s

engine.execute on the same query took ~6 seconds:

    spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';")

And the ORM query clocked in at 489 seconds:

    spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == "spPlate-3665-55247.fits").all()

Thanks,
Adrian

On Jul 19, 2010, at 12:24 PM, David Gardner wrote:
> [...]
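Part of the gap between engine.execute (~6 s) and session.query (489 s) is per-row Python work: the ORM constructs one mapped object per row and populates an attribute per column, where the raw result just hands back tuples. A rough pure-Python illustration of that per-row cost (the `Spectrum` stand-in class and sizes are illustrative, not a benchmark of SQLAlchemy itself):

```python
import time

class Spectrum:
    # Stand-in for the mapped class: one attribute per column.
    def __init__(self, pk, ra, dec, values):
        self.pk = pk
        self.ra = ra
        self.dec = dec
        self.values = values

rows = [(i, 1.0, 2.0, list(range(100))) for i in range(1000)]

t0 = time.perf_counter()
tuples = [row for row in rows]              # engine.execute-style: raw tuples
t_tuples = time.perf_counter() - t0

t0 = time.perf_counter()
objects = [Spectrum(*row) for row in rows]  # ORM-style: one object per row
t_objects = time.perf_counter() - t0

print(f"tuples: {t_tuples:.6f}s  objects: {t_objects:.6f}s")
```

Object construction alone does not explain a two-order-of-magnitude gap, but combined with per-element array conversion over 1000 rows it adds up.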
[sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Also, as a follow-up: inserting these ARRAYs into the database is very slow as well - slower than expected, that is. Granted, it is looping over 1000 objects and inserting a few 4000-element arrays for each object, but doing one big SQL query takes considerably less time than session.add(object) and session.commit(). I thought that might be another clue. Thanks!

On Jul 19, 1:53 pm, Adrian Price-Whelan adrian@gmail.com wrote:
> [...]
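The slow inserts have the same shape as the slow fetch: `session.add()` plus `session.commit()` per object means one round trip and one transaction per row, whereas a single executemany batches them all. Sketched with stdlib `sqlite3` (with SQLAlchemy the batched form is passing a list of dicts to `table.insert()` on a connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, fiber_number INTEGER)")

# Slow shape: one INSERT and one COMMIT per object, like calling
# session.add(obj); session.commit() inside the loop.
for i in range(100):
    conn.execute("INSERT INTO spectrum (fiber_number) VALUES (?)", (i,))
    conn.commit()

# Fast shape: one executemany and one commit for the whole batch.
conn.execute("DELETE FROM spectrum")
conn.executemany(
    "INSERT INTO spectrum (fiber_number) VALUES (?)",
    [(i,) for i in range(100)],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM spectrum").fetchone()[0]
print(count)  # 100
```

Committing once per loop iteration is usually the dominant cost; moving the commit outside the loop already recovers most of the time even without executemany.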
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 1:53 PM, Adrian Price-Whelan wrote:

> Here is some more detailed information trying the query multiple ways.
>
> Piping the command into psql and writing to a tmp file takes 12 seconds (tmp file is 241MB):
>
>     time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
>
>     real    0m12.052s
>     user    0m2.501s
>     sys     0m0.274s
>
> engine.execute on the same query took ~6 seconds:
>
>     spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';")

Call fetchall() on the result to get a better idea what's going on. Here's the source of the ARRAY type:

    def result_processor(self, dialect, coltype):
        item_proc = self.item_type.result_processor(dialect, coltype)
        if item_proc:
            def convert_item(item):
                if isinstance(item, list):
                    return [convert_item(child) for child in item]
                else:
                    return item_proc(item)
        else:
            def convert_item(item):
                if isinstance(item, list):
                    return [convert_item(child) for child in item]
                else:
                    return item

        def process(value):
            if value is None:
                return value
            return [convert_item(item) for item in value]

        return process

as you can see, if your ARRAY is of a Unicode type or similar, convert_item() must be called on each element (this happens only during the fetch). This is a likely cause of the slowdown, and you should consider what kind of converters you're using on your ARRAY members.

--
You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
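The recursive converter quoted above can be exercised outside SQLAlchemy. Here is a minimal standalone reconstruction of the same logic (not the library code verbatim; item_proc stands in for a per-item result processor, here str-to-Decimal):

```python
from decimal import Decimal

def make_array_processor(item_proc=None):
    # Mirrors the shape of ARRAY.result_processor: recurse through nested
    # lists, applying item_proc to every scalar element found.
    if item_proc:
        def convert_item(item):
            if isinstance(item, list):
                return [convert_item(child) for child in item]
            return item_proc(item)
    else:
        def convert_item(item):
            if isinstance(item, list):
                return [convert_item(child) for child in item]
            return item

    def process(value):
        if value is None:
            return value
        return [convert_item(item) for item in value]

    return process

process = make_array_processor(Decimal)
print(process(["1.5", ["2.25", "3.0"]]))
# [Decimal('1.5'), [Decimal('2.25'), Decimal('3.0')]]
```

Note that item_proc (and the isinstance test) runs once per scalar, so a result set with millions of array elements pays the conversion cost millions of times.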
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
First off, thanks for your quick replies! I will look into this, but I can tell you that the arrays are strictly numbers and the array columns are of type numeric[].

Thanks again,
Adrian

On Jul 19, 2010, at 3:47 PM, Michael Bayer wrote:
> [snip]
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 3:55 PM, Adrian Price-Whelan wrote:
> First off, thanks for your quick replies! I will look into this, but I can tell you that the arrays are strictly numbers and the array columns are of type numeric[].

so it's going to be doing that somewhat inefficient isinstance(list) thing you see in the result_processor source quoted earlier; this appears to be how it handles arrays with an arbitrary number of dimensions. This could be optimized if the ARRAY type accepted some clues as to how many dimensions are present. Seems a little suspicious that the slowdown would be 8000%, though.
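Bayer's dimension-hint idea can be sketched in plain Python: if the caller promises the array is one-dimensional, the per-element isinstance check disappears entirely. (The dims parameter here is hypothetical, not part of the 0.6 ARRAY type.)

```python
def make_processor(item_proc, dims=None):
    # dims=1: flat array, so skip the isinstance(list) test on every element
    if dims == 1:
        def process(value):
            if value is None:
                return value
            return [item_proc(item) for item in value]
        return process

    # Unknown depth: fall back to the recursive form SQLAlchemy uses.
    def convert_item(item):
        if isinstance(item, list):
            return [convert_item(child) for child in item]
        return item_proc(item)

    def process(value):
        if value is None:
            return value
        return [convert_item(item) for item in value]
    return process

flat = make_processor(float, dims=1)
print(flat(["1.0", "2.5"]))  # [1.0, 2.5]
```

The flat path saves one isinstance call and one Python frame per element; with ~18.5 million elements that adds up, though it would not remove the per-element conversion itself.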
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 4:08 PM, Michael Bayer wrote:
> so its going to be doing that somewhat inefficient isinstance(list) thing you see below, this appears to be how it handles arrays of arbitrary numbers of dimensions. This could be optimized if the ARRAY type accepted some clues as to how many dimensions are present.

There is a function in PostgreSQL called array_dims that returns the array dimensions - could this be of some use? It could query the database dynamically to get the number of dimensions.

> Seems a little suspicious that the slowdown would be 8000% though.

We have four columns in that table of type numeric[], so that query returns 1000 rows x 4 numeric[] columns with 4634 values in each array. (I work with Adrian.)

Cheers,
Demitri
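For reference, array_dims can be checked straight from psql (column names here match the schema used in this thread):

```sql
-- inspect the dimensions of a stored array
SELECT array_dims(spectrum.values) FROM spectrum LIMIT 1;
-- for a one-dimensional array of 4634 elements this returns the
-- bounds string [1:4634]
```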
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 4:38 PM, thatsanicehatyouh...@mac.com wrote:
> There is a function in postgresql called array_dims that returns the array dimensions - could this be of some use? It could query the database dynamically to get the number of values.

I'd just have you say ARRAY(dims=4).

> We have four columns in that table of type numeric[]. So that query returns 1000 rows x 4 numeric[] columns of 4634 values in each array. (I work with Adrian.)

so try this type:

    class ARRAY(postgresql.ARRAY):
        def bind_processor(self, dialect):
            return None

        def result_processor(self, dialect, coltype):
            return None

that will just give you the raw psycopg2 data.
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Hi Michael,

Assuming I understood you correctly, I tried the code below. The result was the same (the query took 486 seconds). Since I autoload everything, I first adjust the column types to the class you defined. Did I misunderstand something?

Thanks again for your help.

Cheers,
Demitri

---

    from sqlalchemy.dialects import postgresql
    from sqlalchemy.schema import Column
    from sqlalchemy.types import Numeric

    ...

    class ARRAY(postgresql.ARRAY):
        def bind_processor(self, dialect):
            return None

        def result_processor(self, dialect, coltype):
            return None

    for c in Spectrum.__table__.get_children():
        if type(c) is Column:
            if c.name in ["values", "inv_var", "and_mask", "or_mask"]:  # numeric[] columns
                c.type = ARRAY(Numeric)
                print c.name

    spPlateFilename = "spPlate-3586-55181.fits"
    spectra = session.query(Spectrum).join(SpectrumHeader).\
        filter(SpectrumHeader.filename == spPlateFilename).all()
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 5:46 PM, thatsanicehatyouh...@mac.com wrote:
> Assuming I understood you correctly, I tried the code below. The result was the same (the query took 486 seconds). Since I autoload everything, I first adjust the column types to the class you defined. Did I misunderstand something?

nope. you'd have to produce some profiling results for us or a script that illustrates the issue.
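The kind of profile being asked for can be produced with the standard library's cProfile; a generic sketch (run_query is a placeholder for the actual ORM call, not code from this thread):

```python
import cProfile
import io
import pstats

def run_query():
    # Placeholder for the real workload, e.g.:
    #   session.query(Spectrum).join(SpectrumHeader).filter(...).all()
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
run_query()
profiler.disable()

# Print the 20 most expensive calls by cumulative time.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(20)
summary = out.getvalue()
print("function calls" in summary)  # True
```

Sorting by "cumulative" shows where time accumulates through the call tree; re-sorting with sort_stats("tottime") shows time spent inside each function, which is the view that exposes per-element conversion hot spots.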
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 5:46 PM, thatsanicehatyouh...@mac.com wrote:
> Assuming I understood you correctly, I tried the code below. The result was the same (the query took 486 seconds). Since I autoload everything, I first adjust the column types to the class you defined. Did I misunderstand something?

here, try this type:

    class ARRAY(postgresql.ARRAY):
        def bind_processor(self, dialect):
            return None

        def result_processor(self, dialect, coltype):
            return None

        def adapt(self, impltype):
            return self
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On 07/19/2010 02:47 PM, Michael Bayer wrote:
> Call fetchall() on the result to get a better idea what's going on.
> [snip]

Assuming fetchall() brings the execution time back to ~486 seconds, it is likely that your bottleneck is psycopg2's str-to-Decimal conversion. A big part of this is that Python's Decimal class is written in pure Python. You can override psycopg2's type conversion [1], but the only easy speed increase I see is to leave the values as strings. Parsing them as floats may help as well.

-Conor

[1] http://initd.org/psycopg/docs/faq.html#problems-with-type-conversions
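Conor's point is easy to check in isolation: constructing a Decimal from a string (done in pure Python on Python 2.x, via a regular expression) costs far more than float(). A rough micro-benchmark sketch:

```python
import timeit
from decimal import Decimal

values = ["3.14159"] * 1000

# Time 50 passes of converting 1000 strings each way.
t_decimal = timeit.timeit(lambda: [Decimal(v) for v in values], number=50)
t_float = timeit.timeit(lambda: [float(v) for v in values], number=50)

print("Decimal: %.4fs, float: %.4fs" % (t_decimal, t_float))
# Decimal construction is typically several times slower than float(),
# dramatically so on Python 2.x where the decimal module was pure Python.
```

Absolute numbers depend on the interpreter (modern Pythons ship a C implementation of decimal), but on the 2.6 interpreter in this thread the gap is what turns an 18.5-million-element fetch into minutes.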
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
Hi,

Pasted below is a profile of the earlier code posted. I did update it with your new definition of ARRAY, Michael, but that only shaved off 18 seconds (down to 468s total) when run without the profiler. The large number of __new__ calls roughly tracks with the number of numeric values coming back from the database - 1000 rows x 4 columns x 4634 values per array, or 18,536,000 new objects. It looks like Conor's suggestion that the bottleneck is the str-to-Decimal conversion is correct. (This would explain why our initial imports were taking an extremely long time as well...) I intend to use lazy loading on these columns, but before that I'm certainly open to any suggestions for cutting the time down significantly! We'll also look into psycopg2's Decimal handling. If the type in the database is numeric[], how can I retrieve it as a string?

Thanks again for the help!

Cheers,
Demitri

Mon Jul 19 20:29:16 2010    prof_stats

194856576 function calls (194856471 primitive calls) in 538.608 CPU seconds

Ordered by: cumulative time
List reduced from 460 to 20 due to restriction <20>

(SQLAlchemy paths below are abbreviated from /Library/Python/2.6/site-packages/SQLAlchemy-0.6.3-py2.6.egg/sqlalchemy/, and decimal.py from /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/.)

      ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
           1     1.968    1.968   538.608  538.608  <string>:1(<module>)
           1     0.000    0.000   536.430  536.430  ./examples/db_test.py:56(main)
           1     0.036    0.036   536.414  536.414  .../orm/query.py:1447(all)
        1001     0.001    0.000   530.894    0.530  .../orm/query.py:1619(instances)
           1     0.000    0.000   529.213  529.213  .../engine/base.py:2379(fetchall)
           1     0.000    0.000   529.211  529.211  .../engine/base.py:2360(_fetchall_impl)
           1    31.366   31.366   529.211  529.211  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
    18538000   124.784    0.000   497.845    0.000  .../python2.6/decimal.py:511(__new__)
    18551033   290.353    0.000   290.353    0.000  {built-in method __new__ of type object at 0xf6360}
    18538021    45.568    0.000    45.568    0.000  {built-in method match}
    74152000    17.219    0.000    17.219    0.000  {built-in method group}
    18538630     9.908    0.000     9.908    0.000  {isinstance}
    18538000     6.356    0.000     6.356    0.000  {method 'lstrip' of 'str' objects}
           1     0.000    0.000     5.484    5.484  .../orm/query.py:1560(__iter__)
           1     0.000    0.000     5.484    5.484  .../orm/query.py:1567(_execute_and_instances)
           1     0.000    0.000     5.484    5.484  .../orm/session.py:703(execute)
           1     0.000    0.000     5.483    5.483  .../engine/base.py:1148(execute)
           1     0.000    0.000     5.483    5.483  .../engine/base.py:1212(_execute_clauseelement)
           1     0.000    0.000     5.482    5.482  .../engine/base.py:1255(__execute_context)
           1     0.000    0.000     5.482    5.482  .../engine/base.py:1351(_cursor_execute)

Time spent within each function:

Mon Jul 19 20:29:16 2010    prof_stats

194856576 function calls (194856471 primitive calls) in 538.608 CPU seconds

Ordered by: internal time
List reduced from 460 to 20 due to restriction <20>

      ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
    18551033   290.353    0.000   290.353    0.000  {built-in method __new__ of type object at 0xf6360}
    18538000   124.784    0.000   497.845    0.000  .../python2.6/decimal.py:511(__new__)
    18538021    45.568    0.000    45.568    0.000  {built-in method match}
           1    31.366   31.366   529.211  529.211  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
    74152000    17.219    0.000    17.219    0.000  {built-in method group}
    18538630     9.908    0.000     9.908    0.000  {isinstance}
    18538000     6.356    0.000     6.356    0.000  {method 'lstrip' of 'str' objects}
           1     5.482    5.482     5.482    5.482  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
    18538016     2.852    0.000     2.852    0.000  {method 'strip' of 'str' objects}
           1     1.968    1.968   538.608  538.608  <string>:1(<module>)
     9270145     1.487    0.000     1.487    0.000  {len}
        4000     0.634    0.000     0.634    0.000
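The profile numbers above are internally consistent, which a little arithmetic confirms (row and column counts taken from Demitri's message):

```python
# 1000 rows, 4 numeric[] columns, 4634 elements per array
rows, cols, per_array = 1000, 4, 4634
n_values = rows * cols * per_array
print(n_values)  # 18536000, matching the ~18.5M Decimal.__new__ calls

# Cumulative time in decimal.py __new__ divided by its call count:
per_call = 497.845 / 18538000
print("%.1f microseconds per Decimal" % (per_call * 1e6))  # ~26.9 us each
```

At roughly 27 microseconds per Decimal, the conversion alone accounts for essentially all of the ~500 seconds, so the fetch itself (31s inside psycopg2's fetchall) is a minor contributor.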
Re: [sqlalchemy] Re: Unknown Issue Causing Extremely Slow Query
On Jul 19, 2010, at 8:33 PM, thatsanicehatyouh...@mac.com wrote:
> Pasted below is a profile of the earlier code posted. I did update it with your new definition of ARRAY, Michael, but that only shaved off 18 seconds (down to 468s total) when run without the profiler. It looks like Conor's suggestion that the bottleneck is the str-to-Decimal conversion is correct. If the type in the database is numeric[], how can I retrieve it as a string?

psycopg2 returns the Decimal (it's a Python built-in) natively. You register a new type adapter as they mention here:

http://initd.org/psycopg/docs/faq.html#problems-with-type-conversions

however, I don't know whether their float() recipe is calling float() on a string or on a Decimal. I'm pretty sure that if you used the FLOAT type instead of NUMERIC (i.e. change your table in the DB, or use CAST) it would return floats directly, without the Decimal conversion.
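The psycopg2 FAQ recipe referenced above boils down to a small cast function. The function itself is plain Python and can be sketched (and exercised) without a database; the register_type wiring, shown in comments, follows the FAQ recipe:

```python
def cast_numeric_to_float(value, cursor):
    """Typecast callback: psycopg2 hands us the raw string (or None)."""
    if value is None:
        return None
    return float(value)

# To install it on a psycopg2 connection (per the FAQ recipe):
#   import psycopg2.extensions
#   DEC2FLOAT = psycopg2.extensions.new_type(
#       psycopg2.extensions.DECIMAL.values, "DEC2FLOAT", cast_numeric_to_float)
#   psycopg2.extensions.register_type(DEC2FLOAT)

print(cast_numeric_to_float("3.5", None))  # 3.5
```

Registering the caster replaces the str-to-Decimal conversion identified in the profile with a single C-level float() call per element; the array-parsing regex work remains, but the dominant cost goes away.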