[sqlalchemy] Re: executemany + postgresql

2009-11-07 Thread Jon Nelson

On Fri, Nov 6, 2009 at 9:57 AM, Michael Bayer mike...@zzzcomputing.com wrote:
 Before I even posted I resorted to strace. strace immediately
 confirmed my suspicion - when using psycopg2 I don't see one big fat
 INSERT with lots of binds, I see one INSERT per bind, and it's this
 that is ultimately killing the performance. You can easily observe
 this via strace: as I'm sure you know, the communication between the
 test program and postgresql takes place across a socket (unix domain
 or tcp/ip). For every single set of bind params, the result is
 essentially one sendto (INSERT INTO ) and rt_sigprocmask, a poll,
 and then a recvfrom and rt_sigprocmask pair.  Profiling at the C level
 shows that sendto accounts for *35%* of the total runtime and recvfrom
 a healthy 15%. It's this enormous overhead for every single bind param
 that's killing the performance.

 have you asked about this on the psycopg2 mailing list?  it's at 
 http://mail.python.org/mailman/listinfo/python-list
  .   Let me know if you do, because I'll get out the popcorn... :)

That's the python list.
Anyway, I did some more testing. executemany performance is no
better than looping over execute, because that appears to be all
executemany does anyway.

However, I manually built a big fat set of bind params (bypassing
SQLAlchemy entirely) and got a SUBSTANTIAL performance improvement.
Postgresql as of 8.2 supports /sets/ of bind params, it'd be nice if
pg8000 or psycopg2 (or both) supported that. Building 25000 bind
params by hand is not fun, but it got me to just shy of 50K
inserts/second.
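That hand-built expansion can be sketched as follows; the table and column names are made up (the thread doesn't give the real schema), and the resulting SQL string plus flattened parameter list is what gets handed to a DBAPI cursor.execute():

```python
# Sketch: expand one logical INSERT into a single statement with one
# VALUES group per row, in psycopg2's %s placeholder style. Table and
# column names here are illustrative, not from the original test.
def multirow_insert(table, columns, rows):
    group = "(" + ",".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO %s (%s) VALUES %s" % (
        table, ", ".join(columns), ",".join([group] * len(rows)))
    # Flatten the row tuples into one parameter sequence for execute().
    params = [v for row in rows for v in row]
    return sql, params

sql, params = multirow_insert("t", ["a", "b"], [(1, "x"), (2, "y")])
# sql    -> "INSERT INTO t (a, b) VALUES (%s,%s),(%s,%s)"
# params -> [1, "x", 2, "y"]
```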

 We also support the pg8000 DBAPI in 0.6.  I doubt it's doing anything
 differently here but feel free to connect with postgresql+pg8000://
 and see what you get.

I tried pg8000 but I got an error:

...

return self.dbapi.connect(*cargs, **cparams)
sqlalchemy.exc.DBAPIError: (TypeError) connect() takes at least 1
non-keyword argument (0 given) None None



-- 
Jon

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: executemany + postgresql

2009-11-07 Thread Michael Bayer

On Nov 7, 2009, at 12:53 PM, Jon Nelson wrote:


 have you asked about this on the psycopg2 mailing list?  it's at 
 http://mail.python.org/mailman/listinfo/python-list
  .   Let me know if you do, because I'll get out the popcorn... :)

 That's the python list.

oops:

http://lists.initd.org/mailman/listinfo/psycopg



 I tried pg8000 but I got an error:

 ...

return self.dbapi.connect(*cargs, **cparams)
 sqlalchemy.exc.DBAPIError: (TypeError) connect() takes at least 1
 non-keyword argument (0 given) None None

i can't reproduce that.   this is with the latest trunk:

from sqlalchemy import *

e = create_engine('postgresql+pg8000://scott:ti...@localhost/test')

print e.execute("select 1").fetchall()

produces:

[(1,)]






[sqlalchemy] Re: executemany + postgresql

2009-11-07 Thread Jon Nelson

On Sat, Nov 7, 2009 at 11:58 AM, Michael Bayer mike...@zzzcomputing.com wrote:

 On Nov 7, 2009, at 12:53 PM, Jon Nelson wrote:

 have you asked about this on the psycopg2 mailing list?  it's at
 http://mail.python.org/mailman/listinfo/python-list

  .   Let me know if you do, because I'll get out the popcorn... :)

 That's the python list.

 oops:
 http://lists.initd.org/mailman/listinfo/psycopg


 I tried pg8000 but I got an error:

 ...

    return self.dbapi.connect(*cargs, **cparams)
 sqlalchemy.exc.DBAPIError: (TypeError) connect() takes at least 1
 non-keyword argument (0 given) None None

 i can't reproduce that.   this is with the latest trunk:
 from sqlalchemy import *
 e = create_engine('postgresql+pg8000://scott:ti...@localhost/test')
 print e.execute("select 1").fetchall()
 produces:
 [(1,)]

Apparently, pg8000 requires host, user and pass (or at least one of those).

Of course, then when I am connected, I get a traceback:

...
metadata.drop_all()
  File /usr/lib64/python2.6/site-packages/sqlalchemy/schema.py, line
1871, in drop_all
bind.drop(self, checkfirst=checkfirst, tables=tables)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 1336, in drop
self._run_visitor(ddl.SchemaDropper, entity,
connection=connection, **kwargs)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 1360, in _run_visitor
visitorcallable(self.dialect, conn, **kwargs).traverse(element)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/sql/visitors.py,
line 86, in traverse
return traverse(obj, self.__traverse_options__, self._visitor_dict)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/sql/visitors.py,
line 197, in traverse
return traverse_using(iterate(obj, opts), obj, visitors)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/sql/visitors.py,
line 191, in traverse_using
meth(target)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/ddl.py,
line 89, in visit_metadata
collection = [t for t in reversed(sql_util.sort_tables(tables)) if
self._can_drop(t)]
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/ddl.py,
line 104, in _can_drop
return not self.checkfirst or
self.dialect.has_table(self.connection, table.name,
schema=table.schema)
  File 
/usr/lib64/python2.6/site-packages/sqlalchemy/dialects/postgresql/base.py,
line 611, in has_table
type_=sqltypes.Unicode)]
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 991, in execute
return Connection.executors[c](self, object, multiparams, params)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 1053, in _execute_clauseelement
return self.__execute_context(context)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 1076, in __execute_context
self._cursor_execute(context.cursor, context.statement,
context.parameters[0], context=context)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py,
line 1136, in _cursor_execute
self.dialect.do_execute(cursor, statement, parameters, context=context)
  File /usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py,
line 207, in do_execute
cursor.execute(statement, parameters)
  File pg8000/dbapi.py, line 243, in _fn
return fn(self, *args, **kwargs)
  File pg8000/dbapi.py, line 312, in execute
self._execute(operation, args)
  File pg8000/dbapi.py, line 317, in _execute
self.cursor.execute(new_query, *new_args)
  File pg8000/interface.py, line 303, in execute
self._stmt = PreparedStatement(self.connection, query,
statement_name="", *[{"type": type(x), "value": x} for x in args])
  File pg8000/interface.py, line 108, in __init__
self._parse_row_desc = self.c.parse(self._statement_name, statement, types)
  File pg8000/protocol.py, line 918, in _fn
return fn(self, *args, **kwargs)
  File pg8000/protocol.py, line 1069, in parse
self._send(Parse(statement, qs, param_types))
  File pg8000/protocol.py, line 975, in _send
data = msg.serialize()
  File pg8000/protocol.py, line 121, in serialize
val = struct.pack("!i", len(val) + 4) + val
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8d in position
3: ordinal not in range(128)



-- 
Jon




[sqlalchemy] Re: executemany + postgresql

2009-11-07 Thread Michael Bayer


On Nov 7, 2009, at 1:30 PM, Jon Nelson wrote:

  File pg8000/protocol.py, line 121, in serialize
val = struct.pack("!i", len(val) + 4) + val
 UnicodeDecodeError: 'ascii' codec can't decode byte 0x8d in position
 3: ordinal not in range(128)

make sure you're on the latest tip of pg8000, which these days seems  
to be at http://github.com/mfenniak/pg8000/tree/trunk .  It also  
adheres to the client encoding of your PG database, which you should  
make sure is set to utf-8.

But it's not going to render an INSERT...VALUES with multiple  
parameters in one big string, so if that's your goal you need to  
generate that string yourself.  I'm surprised that sqlite, per your  
observation, parses an INSERT statement and re-renders it with  
multiple VALUES clauses? Very surprising behavior.







[sqlalchemy] Re: executemany + postgresql

2009-11-07 Thread Jon Nelson

On Sat, Nov 7, 2009 at 3:02 PM, Michael Bayer mike...@zzzcomputing.com wrote:


 On Nov 7, 2009, at 1:30 PM, Jon Nelson wrote:

  File pg8000/protocol.py, line 121, in serialize
    val = struct.pack("!i", len(val) + 4) + val
 UnicodeDecodeError: 'ascii' codec can't decode byte 0x8d in position
 3: ordinal not in range(128)

 make sure you're on the latest tip of pg8000, which these days seems
 to be at http://github.com/mfenniak/pg8000/tree/trunk .  It also
 adheres to the client encoding of your PG database, which you should
 make sure is set to utf-8.

Ah. I was running the latest /released/ version - I generally avoid
running 'tip/HEAD/whatever' except during testing. Since I don't
expect pg8000 to have any substantially different behavior, it's
probably not even worth the effort.

<snip/>

 I'm surprised that sqlite, per your
 observation, parses an INSERT statement and re-renders it with
 multiple VALUES clauses? Very surprising behavior.

I'm not sure I said that - I certainly didn't intend that.

Ultimately, the IPC costs associated with each set of bind params (one
per row) just murder psycopg2 when compared to sqlite. There isn't
any sqlite RPC per se, since it's always local.  I can only assume
that sqlite defers locking the database until the start of a
transaction, and since sqlite isn't multi-writer aware the overhead of
doing so is minimal.

I wasn't comparing sqlite and postgresql per se - there isn't much of
a comparison in my mind once you start needing all of the features,
stability, and power that postgresql brings. However, I was
disappointed to see that psycopg2 is not making use of the (postgresql
8.2 and newer) multi-bind param INSERT stuff, as this ultimately
reduces the IPC overhead to a very small amount.

The cost of a single call to postgresql might be small, but when you
multiply it by hundreds of thousands or millions it suddenly becomes a
deciding factor in some situations.
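A quick back-of-the-envelope using the two rates measured earlier in the thread (~9000 inserts/s with one round trip per row, ~50000 inserts/s batched) shows how that per-call cost adds up:

```python
# Per-row cost implied by the two throughputs measured in this thread.
per_row_round_trip = 1.0 / 9000    # seconds/row, one round trip per row
per_row_batched = 1.0 / 50000      # seconds/row, multi-row VALUES
overhead = per_row_round_trip - per_row_batched  # ~91 microseconds/row

# Over a million rows, that overhead alone is roughly a minute and a half.
total_overhead_1m = overhead * 1_000_000
```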

-- 
Jon




[sqlalchemy] Re: executemany + postgresql

2009-11-06 Thread Adrian von Bidder
Heyho!

On Friday 06 November 2009 02.46:11 Jon Nelson wrote:
 ... was performing an individual
 INSERT for every single row.

I don't know SQLAlchemy well enough, but for big bulk imports on the SQL side, 
shouldn't COPY be used?  As far as I know that's pg-specific / not standard 
SQL.
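For what it's worth, psycopg2 does expose COPY as cursor.copy_from(file_like, table). A sketch, assuming a hypothetical table t (a, b); the connection itself is left out since it needs a live server:

```python
# Serialize rows into the tab-separated text format COPY expects,
# then hand the buffer to psycopg2's cursor.copy_from().
import io

def rows_to_copy_buffer(rows):
    # COPY text format: tab-separated columns, one row per line.
    # Assumes values contain no tabs, newlines, or backslashes.
    lines = ("\t".join(str(v) for v in row) for row in rows)
    return io.StringIO("\n".join(lines) + "\n")

buf = rows_to_copy_buffer([(1, "x"), (2, "y")])
# With a live psycopg2 connection `conn` (not opened in this sketch):
#   cur = conn.cursor()
#   cur.copy_from(buf, "t", columns=("a", "b"))
#   conn.commit()
```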


cheers
-- vbi

-- 
Lo-lan-do モインさん?
nobse Lo-lan-do: Gesundheit.
-- #debian-devel




[sqlalchemy] Re: executemany + postgresql

2009-11-05 Thread Jon

On Nov 5, 8:40 pm, Michael Bayer mike...@zzzcomputing.com wrote:
 On Nov 5, 2009, at 8:46 PM, Jon Nelson wrote:





  I recently ran into an issue today where, batching inserts inside a
  transaction, I was able to achieve no more than about 9000
  inserts/second to a postgresql database (same machine running the
  test).

  With everything exactly the same, I was able to achieve over 50,000
  inserts/s to sqlite.

  Now, I won't argue the relative merits of each database, but this is a
  big problem for postgresql. I believe I have determined that the
  psycopg2 module is to blame, and that a substantial portion of the
  time was being spent in IPC/RPC. Basically, every single insert in
  this test is identical except for the values (same table and columns),
  but psycopg2 (or possibly SQLAlchemy) was performing an individual
  INSERT for every single row. I was *not* using the ORM.

  The code was something like this:

  row_values = build_a_bunch_of_dictionaries()
  ins = table.insert()
  t = conn.begin()
  conn.execute(ins, row_values)
  t.commit()

  where row_values is (of course) a list of dictionaries.

  What can be done here to improve the speed of bulk inserts? For
  postgresql to get walloped by a factor of 5 in this area is a big
  bummer.
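The pattern above can be made self-contained with the stdlib sqlite3 DBAPI (the executemany() call shape is the same with psycopg2, but there each parameter set costs a server round trip); the table and rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")

row_values = [(i, "name-%d" % i) for i in range(1000)]

# One executemany() call; under the hood most DBAPIs simply loop,
# issuing one INSERT per parameter set.
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", row_values)
conn.commit()

count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
# count -> 1000
```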

 it depends on the source of the speed problem.   if your table has
 types which do utf-8 encoding on each value, for example, that takes
 up a lot of time.  the sqlite backend doesn't have this requirement
 but the PG one in 0.5 currently does.

 we've done some work on this in 0.6 to reduce this - we now use
 psycopg2's UNICODE extension, so that we expect result rows to come
 back as unicode objects already.   In response to this question I just
 made the same change for bind parameters so that they won't be encoded
 into utf-8 on the way in, so feel free to try r6484 of trunk.

I gave that a try and did receive a mild speed boost - from ~9000
inserts/s to 9500 +/- 200.
However, 9500 is still substantially lower than 50,000.  In this case
(pathological), *all* of the values are strings, and in fact the table
doesn't even have a primary key.

 Also psycopg2 is a very fast, native DBAPI so I doubt there's any
 bottleneck there.

Granted, I'm using SA on /top/ of sqlite3 and psycopg2 (2.0.12), but
when the only thing that changes is the dburi...

Before I even posted I resorted to strace. strace immediately
confirmed my suspicion - when using psycopg2 I don't see one big fat
INSERT with lots of binds, I see one INSERT per bind, and it's this
that is ultimately killing the performance. You can easily observe
this via strace: as I'm sure you know, the communication between the
test program and postgresql takes place across a socket (unix domain
or tcp/ip). For every single set of bind params, the result is
essentially one sendto (INSERT INTO ) and rt_sigprocmask, a poll,
and then a recvfrom and rt_sigprocmask pair.  Profiling at the C level
shows that sendto accounts for *35%* of the total runtime and recvfrom
a healthy 15%. It's this enormous overhead for every single bind param
that's killing the performance.

--
Jon
