On Thu, Feb 11, 2016 at 4:45 PM, Frank Millman <fr...@chagford.com> wrote:
> I have come up with a plan that seems to provide a solution.
>
> Instead of 'putting' one row at a time, let the database handler build up a
> block of rows, and then 'put' the block.
>
> I tried a block of 10, and it ran a lot faster. I increased it to 50, and it
> ran faster again. I tried 100 and there was not much improvement, so 50
> seems like an optimum number. The speed is now only slightly slower than
> run_in_executor(), and it is more truly asynchronous.
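[For readers following along: Frank's batching scheme can be sketched roughly as below. The function names, the row data, and the sentinel convention are illustrative, not his actual code; only the idea — accumulate rows and 'put' them in blocks of 50 — comes from his post.]

```python
import asyncio

BLOCK_SIZE = 50  # Frank's empirically chosen block size


async def produce_rows(rows, queue, block_size=BLOCK_SIZE):
    """Accumulate rows into blocks and 'put' each block as one unit,
    instead of doing one queue.put() per row."""
    block = []
    for row in rows:
        block.append(row)
        if len(block) >= block_size:
            await queue.put(block)
            block = []
    if block:                  # flush the final partial block
        await queue.put(block)
    await queue.put(None)      # sentinel: no more rows


async def consume_rows(queue):
    """Drain blocks from the queue until the sentinel arrives."""
    results = []
    while True:
        block = await queue.get()
        if block is None:
            break
        results.extend(block)  # the caller iterates rows within each block
    return results


async def main():
    queue = asyncio.Queue(maxsize=10)
    # Stand-in for rows coming off a database cursor:
    rows = [(i, "name%d" % i) for i in range(123)]
    producer = asyncio.ensure_future(produce_rows(rows, queue))
    results = await consume_rows(queue)
    await producer
    return results


loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(len(results))  # 123
```

With 123 rows and a block size of 50, the consumer sees four queue items (50, 50, 23, and the sentinel) instead of 124 — which is where the speedup comes from: far fewer await round-trips through the event loop.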
Something worth checking would be real-world database performance metrics: what's the time-to-first-row versus the time-to-subsequent-rows? When you submit a query, the server first has to parse it and check it for errors, then do all its optimization work and figure out an access plan. Only then does it go and fetch rows. If your query is a simple "select * from tablename" on a huge table, then it's entirely possible that you save a lot of time by fetching subsequent rows asynchronously; but if there's an ORDER BY that can't be resolved from an index (maybe involving a table join or a non-optimizable function call), the database might have to read everything from disk before it can return a single row, in which case the time from first row to last row is a tiny fraction of the time to first row. What are your actual real-world queries like?

Most likely, the database is reading rows in pages, and you'll have no way of predicting how many usable result rows are on any given page.

I wonder, is there any way you can actually operate this the other way around? Presumably the database is connected to you via a socket of some sort (TCP if you're going over a network, maybe a Unix socket for local connections), which you could select() on the same as any other; if you can react to the arrival of more rows, that might be the easiest solution. That would probably require a dedicated async database library, though.

ChrisA
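[The measurement Chris suggests is easy to sketch. This uses the stdlib's sqlite3 purely for illustration — the table, the row count, and the ORDER BY on an unindexed column are all made up — but the same fetchone()/fetchall() split works with any DB-API driver, and the ORDER BY here demonstrates his point: the sort must complete before the first row can be returned.]

```python
import sqlite3
import time

# Build a throwaway in-memory table to query against.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, name text)")
conn.executemany("insert into t values (?, ?)",
                 [(i, "name%d" % i) for i in range(10000)])

start = time.perf_counter()
# ORDER BY on an unindexed column forces a full sort before the first row.
cur = conn.execute("select * from t order by name")
first_row = cur.fetchone()
t_first = time.perf_counter() - start   # time to first row

rest = cur.fetchall()
t_last = time.perf_counter() - start    # time to last row

print("rows fetched:", 1 + len(rest))
print("time to first row: %.6fs, first-to-last: %.6fs"
      % (t_first, t_last - t_first))
```

If first-to-last is a tiny fraction of time-to-first, asynchronous row fetching buys you little for that query; if it dominates, batched or async fetching is where the win is.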
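[Chris's "select() on the socket" idea can be illustrated with asyncio's readiness callbacks. In this minimal Unix-only sketch, a socketpair stands in for the TCP or Unix-domain connection to the database server, and whitespace-separated tokens stand in for rows — a real driver would expose the connection's file descriptor and a proper wire protocol (psycopg2's async mode, as wrapped by aiopg, works along these lines).]

```python
import asyncio
import socket

loop = asyncio.new_event_loop()

# socketpair() simulates the client<->server connection to the database.
server_side, client_side = socket.socketpair()
client_side.setblocking(False)

rows_received = []
done = loop.create_future()

def on_readable():
    # Called by the event loop whenever the "database" socket has data:
    # we react to the arrival of more rows instead of polling for them.
    data = client_side.recv(4096)
    if data:
        rows_received.extend(data.decode().split())
    else:  # empty read = server closed the connection: result set complete
        loop.remove_reader(client_side.fileno())
        client_side.close()
        done.set_result(rows_received)

loop.add_reader(client_side.fileno(), on_readable)

# Pretend the server sends three rows, then closes the connection.
server_side.sendall(b"row1 row2 row3")
server_side.close()

rows = loop.run_until_complete(done)
loop.close()
print(rows)
```

The event loop is doing the select() for you here: on_readable fires only when bytes have actually arrived, so other coroutines keep running in between — which is exactly why a dedicated async database library built on this mechanism can be "more truly asynchronous" than shuttling blocking fetches through an executor.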