On 10/12/2010 11:35 AM, Jon Clements wrote:
On 12 Oct, 18:53, Jon Clements<jon...@googlemail.com>  wrote:
On 12 Oct, 18:32, Roy Smith<r...@panix.com>  wrote:



On Oct 12, 1:20 pm, Jon Clements<jon...@googlemail.com>  wrote:

On 12 Oct, 16:10, Roy Smith<r...@panix.com>  wrote:

PEP 249 says about executemany():

         Prepare a database operation (query or command) and then
         execute it against all parameter sequences or mappings
         found in the sequence seq_of_parameters.

are there any plans to update the api to allow an iterable instead of
a sequence?

I'm not understanding (probably me). Example?

I have a dictionary, d, which has a million items in it, and want to
do something like:

     executemany("update foo set bar = %s where id = %s",
d.iteritems())

If executemany accepted an iterable, that would work.  But, it only
accepts a sequence, so I need to do:

     executemany("update foo set bar = %s where id = %s", d.items())

which generates a million-item temporary list.  Or am I mis-
understanding the PEP?

Interesting, but here's my guess...

Replace d.items() with itertools.repeat( ('a', 'b') )

So, if you have a sequence, which has a length and known size, at
least you can have an attempt at the DB operations: whether the
transaction fails or not is another thing...In short, a sequence is
finite, while an iterable may be infinite.

That's just my guess and makes sense to me at least!

Jon.

Actually, thinking about it some more, I would take the following
approach:
(this is only loosely do with the Python DB API mind you...)

1) Start a transaction
2) Create a temporary table
3) Bulk insert your million update records to the temp table (from my
understanding of the PEP, executemany(), is allowed to repeatedly call
execute() unless it can do something "cleverer")
4) Execute an update with a from statement joining your main table and
temp table (pretty sure that's ANSI standard, and DB's should support
it -- embedded one's may not though, but if you're dealing with 1mil
records, I'm taking a guess you're not dealing with embedded)
5) End the transaction

Far more efficient as a repeated execute of 'update' will only just
hit the DB once, while an update statement with a from should allow
the DB a chance to optimise it.

Hope that makes sense, lemme know.

    Right.  In general, if you find yourself making millions of
SQL database requests in a loop, you're doing it wrong.

    Big database loads are usually done by creating a text file
with the desired data, then using a LOAD DATA INFILE command.
This (in MySQL) is tens to hundreds of times faster than
doing individual INSERT or UPDATE commands.

                                John Nagle
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to