[sqlalchemy] Re: setting table and mapper arguments with Declarative

2008-09-23 Thread GHZ

On Sep 18, 2:59 pm, Michael Bayer [EMAIL PROTECTED] wrote:
 Well, I would think __table_args__ is the only argument you'd really
 want to propagate in that way, and this is an inconvenience I've also
 had, so perhaps we'll do something about it... I would propose a
 default_table_args keyword arg to declarative_base().

Thanks,

I would appreciate a default_table_args parameter to
declarative_base().
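
In the meantime, a per-class workaround is to share one __table_args__ dict
across classes (a minimal sketch, assuming 0.5-era declarative; the table
and keyword arguments are invented for illustration):

    from sqlalchemy import Column, Integer
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    # shared dict standing in for the proposed default_table_args
    COMMON_TABLE_ARGS = {'mysql_engine': 'InnoDB'}

    class Widget(Base):
        __tablename__ = 'widget'
        __table_args__ = COMMON_TABLE_ARGS
        id = Column(Integer, primary_key=True)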




[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread az

this is a general programming approach, not SQL specific.

for 7 mil objects... you have to try to do some vertical (wrong term
probably) layer-splitting of the data. imagine the objects being
rectangles on a horizontal line, each containing the same layers. now
you walk the rectangles like "for each in X: walk Y"; you need to try
to walk like "for layer in Y: walk all X".
i.e. if you have Object A that maps to tables tA and tB, and you do a
million objects, that would be a million rounds of (insert into tA,
insert into tB and maybe an update of a foreign key). you have to try
it the other way around: one lump insert-a-million into tA, another
lump insert-a-million into tB, and so on.
if it's still slow - or it's hard to knit the links together - you
could generate some DB-specific dump/replication format and import
from there.
i guess you can go for the latter approach even now, esp. if your
data does not change (or you pick the 99% constant part and import
that one, then add the rest slowly and dynamically).
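
for illustration, here is a lump-insert sketch using the SQLAlchemy Core
layer (engine URL, table and column names are all made up; passing a list
of dicts makes the DBAPI use an executemany-style insert):

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    engine = create_engine('sqlite://')  # placeholder engine
    metadata = MetaData()
    tA = Table('ta', metadata,
               Column('id', Integer, primary_key=True),
               Column('title', String(255)))
    metadata.create_all(engine)

    # one lump insert per "layer" instead of a million single-row round trips
    rows = [{'id': i, 'title': 'object %d' % i} for i in xrange(100000)]
    engine.execute(tA.insert(), rows)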

ciao
svilen
www.svilendobrev.com


On Tuesday 23 September 2008 06:20:06 CodeIsMightier wrote:
 Hi:


 I am working on an open source project to write a search engine /
 datamining framework of sorts for Wikipedia, and one of the first
 things I need to do is to parse the Wikipedia dumps into an SQL
 database. I am using sqlalchemy to do this, but it is very slow
 (at the current rate, 130 days!!!). I am sure that I am doing
 something wrong since I am new at this, and am wondering whether
 any sqlalchemy veterans can offer their insights.

 The code can be found here:
 http://wikiminer.googlecode.com/svn/trunk/wikipedia_miner.py

 The critical part of the code is this:

     for link_label, link_dest_title, dest_frag in self.parse_links(self.text):
         print 'LINK from:', repr(self.title), 'to', \
             repr(link_dest_title + '#' + dest_frag), 'label', repr(link_label)
         try:
             link_dest = session.query(Article).filter_by(title=link_dest_title).one()
         except sqlalchemy.orm.exc.NoResultFound:
             link_dest = None
         print link_dest
         session.add(Link(self, link_label, link_dest, dest_frag))

 Basically what this does is parse the links in a page, look each one
 up in the DB to resolve the reference, and then insert a Link into
 the DB. The problem is that the articles table has over 7 million
 rows and there are maybe 50 million links.

 I have tried using both SQLite and Postgres as the database.
 Postgres EXPLAIN ANALYZE claims that the above statements should
 take only around 25 ms! I think I am doing something wrong with
 sqlalchemy; maybe I am creating too many objects? Any help would be
 much appreciated.

 





[sqlalchemy] Problems with join query, can't figure out what's wrong. Noob alert!

2008-09-23 Thread olavgg

I've recently started using SQLAlchemy and am a newbie, though I have
good SQL knowledge. My first project with SQLAlchemy is a big one
where I integrate SQLAlchemy into a Plone application. So far it has
been working great for queries against a single table; queries with
joins across two or more tables, however, are proving much more
difficult.

I've tried to join two tables with two different queries, without
success.

The query:
objects = db.session.query(Sak).join(Prosjekt,
Sak.prosjektid).filter(Prosjekt.kundeid==1532).all()
returns this error:
InvalidRequestError: Mapper 'Mapper|Sak|sak' has no property '<class
'Products.TikkTakk.model.Prosjekt.Prosjekt'>'

The query:
objects = db.session.query(Sak).from_statement("SELECT s.saksnr,
s.tittel FROM sak s INNER JOIN prosjekt p ON s.prosjektid =
p.prosjektid WHERE p.kundeid = 1532").all()
returns this error:
NoSuchColumnError: Could not locate column in row for column
'sak.prosjektid'

I have some trouble figuring out what I'm doing wrong. The
NoSuchColumnError is especially frustrating, as the column prosjektid
does exist in the table sak; it's not a typo either ;)

The mapping looks like this:
mappers['prosjekt'] = mapper(Prosjekt, tables['prosjekt'],
    properties = {
        'sak': relation(Sak)
    })
mappers['sak'] = mapper(Sak, tables['sak'],
    properties = {
        'prosjekt': relation(Prosjekt),
    })

And the models like this:
class Prosjekt(Entity):
    """Prosjekt entity map"""

    prosjektid = Field(mysql.MSInteger, primary_key=True)
    p_prosjektid = ManyToOne('Prosjekt')
    sak = OneToMany('Sak')

    using_options(shortnames=True)

    def __init__(self, data):
        """Objects are initialized using form data"""

    def __repr__(self):
        return 'Prosjekt (%d)' % (self.prosjektid)

class Sak(Entity):
    """Sak entity map"""

    prosjektid = ManyToOne('Prosjekt', primary_key=True)
    saksnr = Field(mysql.MSInteger, primary_key=True)

    using_options(shortnames=True)

    def __init__(self, data):
        """Objects are initialized using form data"""

    def __repr__(self):
        return 'Sak (%d, %d)' % (
            self.prosjektid,
            self.saksnr,
        )




[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread Shawn Church
Just a couple thoughts that might help you out:


1) I would profile the code.  It seems to me that running a regular
expression on an entire Wikipedia article would be a VERY expensive
operation.

2) Did the first pass succeed and how long did it take?

3) Taking a quick look at
http://wikiminer.googlecode.com/svn/trunk/wikipedia_miner.py, it
seems to me that the second pass through the data would create a 2nd
set of Article objects that are never saved to the database.
Therefore, the 'self' reference in:

session.add(Link(self, link_label, link_dest, dest_frag))

would refer to an object that is never saved.  I guess this would not
matter since the id field is correct (you set it explicitly), but it
seems to me that it might be better (faster) if you just read through
the articles table for pass two instead of re-parsing the XML,
something similar to:

# delete previous data
redirects_table.drop(bind=engine)
redirects_table.create(bind=engine)
links_table.drop(bind=engine)
links_table.create(bind=engine)

for article in session.query(Article):
    article.parse_text(session)


It's pretty late so I may have missed something.  Hope the above helps.

Shawn Church

I/S Consultant
shawn at SChurchComputers.com




[sqlalchemy] Re: cascades over many-to-many

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 7:58 AM, [EMAIL PROTECTED] wrote:


 hi
 i can't figure out how to configure cascades over many-to-many relations.
 in my case these are implicit m2m (via secondary-table), being treated
 as either versioned one2many, or versioned many2one.

 let's say there is versioned A having many versioned Bs
 (' and " denote version numbers):
 A1' has B1 which has versions:
 A1'.bs = [B1', B2']
 (another timestamp)
 A1'.bs = [B1", B2']
 A2 has versions which keep same B2, one also has B3:
 A2'.bs = [B2']
 (another timestamp)
 A2".bs = [B2', B3']

 deleting A1' should delete B1' and B1"
 deleting A2' should not delete B2'; only deleting all A2' and A2" and
 A1' should delete B2'.
 is this "all,delete-orphan" given to the A.relation(B)?
 or only "delete-orphan"?
 how about backrefs (if any)?


delete-orphan pretty much needs "delete" to work correctly.  but for the
many-to-many use case, I'm not sure if SQLA does the right thing and
cross-references every relationship - these cascades were designed
more for the o2m direction (I think Hibernate supports them only for
o2m relations).   I'd work up some simple test cases to see what it
has to say in this regard (it would have to actively load in other
objects to see if an object is truly an orphan, for example; not
sure if it's going to try that).
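
A minimal test-case skeleton along those lines (a sketch only: schema and
names are invented, 0.5-style API, and the delete-orphan setting on a
secondary relation may itself be rejected - probing that is the point):

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, ForeignKey
    from sqlalchemy.orm import mapper, relation, sessionmaker

    engine = create_engine('sqlite://')
    metadata = MetaData()
    a_table = Table('a', metadata, Column('id', Integer, primary_key=True))
    b_table = Table('b', metadata, Column('id', Integer, primary_key=True))
    a_b = Table('a_b', metadata,
                Column('a_id', Integer, ForeignKey('a.id')),
                Column('b_id', Integer, ForeignKey('b.id')))
    metadata.create_all(engine)

    class A(object): pass
    class B(object): pass

    mapper(A, a_table, properties={
        'bs': relation(B, secondary=a_b, cascade='all, delete-orphan')})
    mapper(B, b_table)

    session = sessionmaker(bind=engine)()
    a1, a2, shared = A(), A(), B()
    a1.bs = [shared]
    a2.bs = [shared]
    session.add_all([a1, a2])
    session.flush()

    session.delete(a1)   # does the shared B survive while a2 still holds it?
    session.flush()
    print session.query(B).count()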





[sqlalchemy] Re: cascades over many-to-many

2008-09-23 Thread az

On Tuesday 23 September 2008 16:06:07 Michael Bayer wrote:
 On Sep 23, 2008, at 7:58 AM, [EMAIL PROTECTED] wrote:
  hi
  i can't figure out how to configure cascades over many-to-many relations.
  in my case these are implicit m2m (via secondary-table), being
  treated as either versioned one2many, or versioned many2one.

  let's say there is versioned A having many versioned Bs
  (' and " denote version numbers):
  A1' has B1 which has versions:
  A1'.bs = [B1', B2']
  (another timestamp)
  A1'.bs = [B1", B2']
  A2 has versions which keep same B2, one also has B3:
  A2'.bs = [B2']
  (another timestamp)
  A2".bs = [B2', B3']

  deleting A1' should delete B1' and B1"
  deleting A2' should not delete B2'; only deleting all A2' and A2"
  and A1' should delete B2'.
  is this "all,delete-orphan" given to the A.relation(B)?
  or only "delete-orphan"?
  how about backrefs (if any)?

 delete-orphan pretty much needs "delete" to work correctly.  but for
 the many-to-many use case, I'm not sure if SQLA does the right
 thing and cross-references every relationship - these cascades
 were designed more for the o2m direction (I think Hibernate
 supports them only for o2m relations).   I'd work up some simple
 test cases to see what it has to say in this regard (it would have
 to actively load in other objects to see if an object is truly an
 orphan, for example; not sure if it's going to try that).
so far i've used "all" and it seems to work for one single owner, but
i haven't really tried the multiple case. the whole thing seems to me
like a split reference-counting mechanism, one side being weakref,
the other not.




[sqlalchemy] Re: ordering_list performance

2008-09-23 Thread jean-philippe dutreve

My use case is a bit different: new entries can be placed anywhere
in the existing SA list, not only at the end (actually it depends on
the entry date).

On 22 sep, 21:20, jason kirtland [EMAIL PROTECTED] wrote:
 Ah, looking more closely I see you're replacing self.entries with a
 list, not insorting into an SA list collection - that's totally OK.  It
 might squeeze a little more speed out to do:

 updated_entries = list(self.entries) + new_entries
 base = len(self.entries)
 for idx, entry in enumerate(new_entries):
 entry.position = base + idx
 self.entries = updated_entries

 orderinglist's extend method could be made to do something much like the
 above quite efficiently.

 jason kirtland wrote:
  A warning: that depends on a bug in the C version of bisect.  When given
  a list subclass, it mistakenly ignores the subclass method
  implementations.  The below will break, if and when that's fixed to
  match the pure Python implementation in the standard lib.

  Calling list.extend(account_entries, new_entries) is probably a safe
  alternative.

  *http://bugs.python.org/issue3935

  jean-philippe dutreve wrote:
  What I've done is something like this:

  account_entries = self.entries[:]
  for entry in new_entries:
  insort_right(account_entries, entry)
  for i, entry in enumerate(account_entries):
  entry.position = i
  self.entries = account_entries

  Don't know if it's the right way to do it but it's much faster.

  On 22 sep, 18:41, jason kirtland [EMAIL PROTECTED] wrote:
  I'm sure there is potential for improvement on the current orderinglist
  code - please feel free to send a patch with optimizations you've found
  to the SA trac.

  The orderinglist hasn't changed much since 0.3, but with 0.5 there may
  be entirely new implementations possible.  For example, I could imagine
  one that defers calculation and manipulation of the positioning
  information until a before_flush hook.  That may perform better, with
  the trade-off that the position attribute can't be trusted to be in
  sync with the list order.

  jean-philippe dutreve wrote:
  Below is the profiling of code that added 1200 items into an
  ordering_list relation. I had to bypass the ordering_list stuff for
  bulk additions in order to get better performance (down to 2
  seconds).
  Hope this post helps to improve this part (using 0.5.0rc1, Python 2.5,
  linux i686, 1.5 GB RAM).
  SA is rocking!
  jean-philippe
  Time elapsed:  48.4475638866 s
  8875046 function calls (8869157 primitive calls) in 48.443 CPU seconds

  Ordered by: internal time, call count
  List reduced from 390 to 10 due to restriction <10>

  ncalls             tottime  percall  cumtime  percall  filename:lineno(function)
  1292937/1292250      7.879    0.000   12.134    0.000  attributes.py:132(__get__)
  1241013              7.662    0.000   39.836    0.000  orderinglist.py:221(_order_entity)
  1241013              5.870    0.000   16.916    0.000  orderinglist.py:202(_get_order_value)
  440809               4.522    0.000    9.527    0.000  attributes.py:394(set)
  1236                 4.198    0.003   44.025    0.036  orderinglist.py:208(reorder)
  1299736/1299048      3.752    0.000    4.373    0.000  attributes.py:310(get)
  448225               3.337    0.000    5.157    0.000  identity.py:208(modified_event)
  437061               2.704    0.000   14.331    0.000  orderinglist.py:205(_set_order_value)
  440809               2.225    0.000   11.752    0.000  attributes.py:126(__set__)
  448225               1.775    0.000    1.812    0.000  attributes.py:958(modified_event)

  Function                                was called by...
  attributes.py:132(__get__)              <- domain.py:200(addEntry) (1236)  46.741
                                             domain.py:248(__init__) (1236)  47.832
                                             domain.py:272(get) (49452)  0.609
                                             orderinglist.py:202(_get_order_value) (1241013)  16.916
  orderinglist.py:221(_order_entity)      <- orderinglist.py:208(reorder) (1240326)  44.025
                                             orderinglist.py:232(append) (687)  0.013
  orderinglist.py:202(_get_order_value)   <- orderinglist.py:221(_order_entity) (1241013)  39.836
  attributes.py:394(set)                  <- attributes.py:126(__set__) (440809)  11.752
  orderinglist.py:208(reorder)            <- orderinglist.py:266(__setslice__) (1236)  44.061
  attributes.py:310(get)                  <- attributes.py:132(__get__) (1292937)  12.134
                                             attributes.py:347(get_committed_value) (1)  0.000
                                             attributes.py:500(set) (3708)  0.367
                                             attributes.py:837(value_as_iterable) (3090)  0.108
  identity.py:208(modified_event)         <- attributes.py:394(set) (440809)  9.527
 

[sqlalchemy] Cyclic references and Nullable

2008-09-23 Thread mraer

Suppose I have two classes of objects which hold references to each
other:

class A:
    b

class B:
    a

Both references are mandatory, so nullable=False.
I use post_update=True in the relation() function and use_alter=True
in the ForeignKey constructor.

After that I try to add two objects:

session = Session()
a = A()
b = B()
a.b = b
b.a = a
session.commit()

and I catch an exception: with post_update=True a NULL is inserted
into the DB, but without post_update=True I can't add two circularly
dependent objects.

How can I add such a pair of objects to the DB?





[sqlalchemy] Re: Relations - How?

2008-09-23 Thread g00fy

Thanks a lot, Alex! I already love Pylons and SQLAlchemy!

On 23 Sep, 12:16, Alex K [EMAIL PROTECTED] wrote:
 Hello, here is the answer:

 # -*- coding: utf-8 -*-

 from sqlalchemy import create_engine
 from sqlalchemy import Table, Column, Integer, String, Unicode,
 MetaData, ForeignKey
 from sqlalchemy.orm import sessionmaker, mapper,
 dynamic_loader,backref, relation, composite, comparable_property,
 contains_eager,aliased
 from sqlalchemy.sql import text, and_
 from orm.object.factory import make_papped_class
 from sqlalchemy.orm import PropComparator

 from common import Application

 app = Application()
 session = app.session
 metadata = app.metadata

 
 """
 Book :
 isbn - integer
 translations - many translations
 prices - many prices

 Translation :
 book - FK to book
 language - FK to Language (oneTOone)
 title - string

 Language :
 code - string
 name - string

 Currency :
 code - string
 name - string

 Price :
 currency - FK
 book - FK
 brutto - int
 netto - int
 """

 book_table = Table('book', metadata,
                     Column('isbn', Integer, primary_key=True)
                     )

 language_table = Table('language', metadata,
                     Column('code', String(5), primary_key=True),
                     Column('name', Unicode(20)),
                     )

 currency_table = Table('currency', metadata,
                     Column('code', String(5), primary_key=True),
                     Column('name', Unicode(20)),
                     )

 translation_table = Table('translation', metadata,
                     Column('book_id', Integer,
 ForeignKey(book_table.c.isbn)),
                     Column('language_id', String(5),
 ForeignKey(language_table.c.code)),
                     Column('title', Unicode(512)),
                     )

 price_table = Table('price', metadata,
                     Column('currency_id', String(5),
 ForeignKey(currency_table.c.code)),
                     Column('book_id', Integer,
 ForeignKey(book_table.c.isbn)),
                     Column('brutto', Integer),
                     Column('netto', Integer)
                     )

 metadata.create_all()

 #create objects
 class Book(object):
     def __init__(self,isbn):
         self.isbn = isbn

 class Language(object):
     def __init__(self,code,name):
         self.code = code
         self.name = name

 class Currency(object):
     def __init__(self,code,name):
         self.code = code
         self.name = name

 class Translation(object):
     def __init__(self,book_id,language_id,title):
         self.book_id = book_id
         self.language_id = language_id
         self.title = title

 class Price(object):
     def __init__(self,currency_id,book_id,brutto,netto):
         self.currency_id = currency_id
         self.book_id = book_id
         self.brutto = brutto
         self.netto = netto

 mapper(Book,book_table)
 mapper(Language,language_table)
 mapper(Currency,currency_table)

 mapper(Translation,translation_table,properties = {
     'book': relation(Book,
                      lazy = False, #1 note lazy = False here: it means
                                    # eager loading - the related objects
                                    # are loaded up front (more details
                                    # in the docs)
                      backref = backref('translations',lazy = False)),
     'language': relation(Language,
                          uselist = False,#2 note uselist, it means
                                          #we use one-to-one instead of
                                          #one-to-many
                          lazy = False),},

 primary_key = [translation_table.c.book_id,
                translation_table.c.language_id]
 #explicit primary key is needed when SQLA can not assemble the one for
 you automatically
 );

 mapper(Price,price_table,properties = {
     'currency': relation(Currency,lazy = False),
     'book': relation(Book,lazy = False, backref =
 backref('prices',lazy = False))

 },primary_key = [price_table.c.book_id,price_table.c.currency_id]);

 session = app.session

 if False: #change this to True to issue add statement
     session.add(Language('en',u'English'))
     session.add(Language('ru',u'Русский'))

     session.add(Currency('usd',u'Dollar'))
     session.add(Currency('rub',u'Рубль'))

     session.add(Book(1))
     session.add(Book(2))

     session.flush()

     session.add(Translation(book_id = 1, language_id = 'en', title =
 u'The book'))
     session.add(Translation(book_id = 1, language_id = 'ru', title =
 u'Книжка'))

     session.add(Translation(book_id = 2, language_id = 'en', title =
 u'Book'))
     session.add(Translation(book_id = 2, language_id = 'ru', title =
 u'Книжка2'))

     session.add(Price(book_id = 1, currency_id = 'usd', brutto = 12,
 netto = 20))
     session.add(Price(book_id = 1, currency_id = 'rub', brutto = 250,
 netto = 500))

     session.add(Price(book_id = 2, currency_id = 'usd', brutto = 10,
 netto = 18))
     session.add(Price(book_id = 2, currency_id = 'rub', brutto = 200,
 netto = 440))

 
 would like now to get books that:
 isbn > 1
 translation.title in

[sqlalchemy] Re: Cyclic references and Nullable

2008-09-23 Thread Alex K

And how can you do this via explicit SQL?

On 23 Sep, 18:32, mraer [EMAIL PROTECTED] wrote:
 Suppose I have two classes of objects which hold references to each
 other:

 class A:
     b

 class B:
     a

 Both references are mandatory, so nullable=False.
 I use post_update=True in the relation() function and use_alter=True
 in the ForeignKey constructor.

 After that I try to add two objects:

 session = Session()
 a = A()
 b = B()
 a.b = b
 b.a = a
 session.commit()

 and I catch an exception: with post_update=True a NULL is inserted
 into the DB, but without post_update=True I can't add two circularly
 dependent objects.

 How can I add such a pair of objects to the DB?



[sqlalchemy] Re: Cyclic references and Nullable

2008-09-23 Thread mraer

I think it depends on the specific DB. In some DBs, I think, you can
defer constraint checking until the transaction commits.

On Sep 23, 6:58 pm, Alex K [EMAIL PROTECTED] wrote:
 And how can you do this via explicit SQL?




[sqlalchemy] multiprocessing best practices

2008-09-23 Thread mg

Hello There,
I am developing an application that uses sqlalchemy and the
pyprocessing package. My question is this: what is the best practice
for using sessions in this type of app? Each subprocess needs to
access my db to get work, so currently I am starting a scoped session
in the run method of the class and using that session for all db work
within that subprocess. Is that the optimal way, or is there a better
way of working?

thanks,
Matt



[sqlalchemy] Re: multiprocessing best practices

2008-09-23 Thread Laurence Rowe

On Sep 23, 4:20 pm, mg [EMAIL PROTECTED] wrote:
 Hello There,
 I am developing an application that uses sqlalchemy and the py
 processing packages. My question is this, what is the best practice
 for using sessions in this type of app. Each subprocess needs to
 access my db to get work, so currently I am starting a scoped session
 in the run method of the class and using that session for all db work
 within that subprocess. Is that the optimal way, or is there a better
 way of working?

Use one scoped session + engine per process. Engines cannot be shared
across processes. The scoped session machinery will give you
thread-local connections.

If you are not using threads you do not strictly need scoped sessions,
but using them now gives you the flexibility to use them should you
choose to at some point in the future.

Laurence
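
A sketch of the per-process pattern (engine URL and the work loop are
placeholders; 'processing' is the pyprocessing package, which later became
the stdlib multiprocessing module):

    from processing import Process  # 'from multiprocessing import Process' on 2.6+
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    def run():
        # build the engine and session factory inside the child process,
        # so no DBAPI connections are inherited across the fork
        engine = create_engine('postgres://user:pass@localhost/work')  # placeholder
        Session = scoped_session(sessionmaker(bind=engine))
        session = Session()
        # ... fetch work, process it, session.commit() ...
        Session.remove()

    for i in range(4):
        Process(target=run).start()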





[sqlalchemy] Binding different engines to a session object

2008-09-23 Thread Andrew


Hi:

I have been having problems attempting to rebind a given Session
object/class to a different engine instance. It seems that it can only
be bound once; subsequent SQL statements post to the original binding.
I am using sqlalchemy 0.4.6 with elixir. Here is an example:


__session__  = scoped_session(sessionmaker(...))

Session.configure(bind=engine1)

# do sql ops here on engine1 via elixir

__session__.configure(bind=engine2)

# do sql ops here against engine2 via elixir
# sql ops here get written to database that engine1 is pointing to.


The two engines I am referring to have identical schemas in different
databases. In fact there may be multiple databases (with the same
schema) in which I need to set the database context before performing
SQL operations. I thought __session__.configure(bind=...) would do
the trick, but it doesn't seem to be working. Can anyone provide some
insight into how I can use sqlalchemy to write to multiple databases
with the same schema? As shown, this app is multithreaded (zope).  I
have received some advice to create a new session instance per schema
(eg: sess=Session()), however elixir only refers to a single session
object/class (__session__), which makes things difficult. Do I need to
dump elixir and just use sqlalchemy to get this functionality?

Thanks!






[sqlalchemy] Re: BINARY Columns in MS SQL

2008-09-23 Thread Rick Morrison
So I'm working a bit on this.


 If base64 encoding within the bind_processor() can fix
 MS-SQL for now, I'd say that would be the approach for the time
 being.

turns out base64 encoding is problematic: it requires a corresponding decode
for the data retrieval, which makes it effectively useless for legacy
databases with extant values stored.

I think an out-only encoding as follows would be all that's needed for now:


class MSBinary(sqltypes.Binary):
def bind_processor(self, dialect):
def process(value):
return '0x' + value.encode('hex')
return process

...the issue is that the emitted value is getting quoted somewhere after the
type conversion:

 INSERT INTO binary_table VALUES(..., '0x6df02da', ...)

but MSSQL's parser can recognize the '0x' prefix and wants the value
unquoted:

 INSERT INTO binary_table VALUES(..., 0x6df02da, ...)

So how do I get the Dialect to drop the quoting behavior for the
return from bind_processor()?



 I guess you've never gotten testtypes.py BinaryTest to run with MS-SQL?

Nope. I run so far only with pymssql, which has never had a chance of
passing with binary data.

 I would definitely want some test coverage in test/dialect/mssql.py for
this, I'm pretty shocked nobody has had this problem before.

If this works, the regular BinaryTest unit test should pass. There's a few
other basic tests in typetest.py that still fail for MSSQL. Someday in a
dream world where I have time to look at them and fix them, I will.




[sqlalchemy] Re: Binding different engines to a session object

2008-09-23 Thread Andrew

I believe I have fixed my problem. Instead of binding/rebinding a
given engine instance to the Session, I bind to the metadata object.
This solves the problem. I found numerous references to being able
to rebind metadata, but not sessions.



On Sep 23, 9:03 am, Andrew [EMAIL PROTECTED] wrote:
 Hi:

 I have been having problems in attempting to rebind a given Session
 object/class to a different engine instance. It seems that it can only
 be bound once. Subsequent sql statements post to the original binding.
 I am using sqlalchemy 0.4.6 with elixir. Here is an example:

 __session__  = scoped_session(sessionmaker(...))

 Session.configure(bind=engine1)

 # do sql ops here on engine1 via elixir

 __session__.configure(bind=engine2)

 # do sql ops here against engine2 via elixir
 # sql ops here get written to database that engine1 is pointing to.

 The 2 engines I am referring to have identical schemas in different
 databases. In fact there may be multiple databases (with the same
 schema) in which I need to set the database context before performing
 sql operations. I thought the __session__.configure(bind=) would do
 the trick but it doesn't seem to be working. Can anyone provide some
 insight into how I can use sqlalchemy to write to multiple databases
 with the same schema? As shown this app is multithreaded (zope).  I
 have received some advice to create a new session instance per schema
 (eg: sess=Session()) however elixir only refers to a single session
 object/class (__session__)  which makes things difficult. Do I need to
 dump elixir and just use sqlalchemy to get this functionality?

 Thanks!



[sqlalchemy] Re: cascades over many-to-many

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 9:06 AM, [EMAIL PROTECTED] wrote:

 so far i've used all and it seems to work for one single owner, but
 i havent really tried the multiple case. the whole thing seems to me
 like a splitted reference-counting mechanism, one side being weakref,
 other not.


it's not going to work for multiple owners since cascades don't take
backrefs into account.  m2m cascade is not a supported use case here, as
it has more ambiguous behavior and complexity, and is something you'd
have to roll yourself.




[sqlalchemy] Re: Cyclic references and Nullable

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 10:32 AM, mraer wrote:


 Suppose I have two classes of objects which hold references to each
 other:

 class A:
     b

 class B:
     a

 Both references are mandatory, so nullable=False.
 I use post_update=True in the relation() function and use_alter=True
 in the ForeignKey constructor.

 After that I try to add two objects:

 session = Session()
 a = A()
 b = B()
 a.b = b
 b.a = a
 session.commit()

 and I catch an exception: with post_update=True a NULL is inserted
 into the DB, but without post_update=True I can't add two circularly
 dependent objects.

 How can I add such a pair of objects to the DB?

this is only possible if you can configure your database to not
enforce foreign key constraints until transaction commit time.  It's
something I have never done myself, but I understand it is possible
with some databases.
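
For example, PostgreSQL supports this via deferrable constraints; a sketch
issuing the DDL directly (table and constraint names are invented):

    # assumes tables a(id, b_id NOT NULL) and b(id, a_id NOT NULL) already exist
    engine.execute("""
        ALTER TABLE a ADD CONSTRAINT a_b_fk
        FOREIGN KEY (b_id) REFERENCES b (id)
        DEFERRABLE INITIALLY DEFERRED
    """)
    engine.execute("""
        ALTER TABLE b ADD CONSTRAINT b_a_fk
        FOREIGN KEY (a_id) REFERENCES a (id)
        DEFERRABLE INITIALLY DEFERRED
    """)
    # INITIALLY DEFERRED constraints are checked at COMMIT rather than per
    # statement, so the circular INSERTs can proceed within one transaction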





[sqlalchemy] Re: Binding different engines to a session object

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 11:03 AM, Andrew wrote:



 Hi:

 I have been having problems attempting to rebind a given Session
 object/class to a different engine instance. It seems that it can only
 be bound once; subsequent SQL statements post to the original binding.
 I am using sqlalchemy 0.4.6 with elixir. Here is an example:


 __session__  = scoped_session(sessionmaker(...))

 Session.configure(bind=engine1)

 # do sql ops here on engine1 via elixir

 __session__.configure(bind=engine2)

this is the wrong approach.   configure() binds the overall
scoped_session (which is not itself a session, but rather a container
of many sessions as well as a factory for new sessions) to a new
engine, which only takes effect upon the next creation of a Session.
configure() is only used by an application which would like to
establish a scoped_session but does not yet have an available engine.

To bind a single session to an engine, simply assign to the bind
attribute:

mysession.bind = someengine

when using a scoped_session(), the current session is available by  
calling it:

mysession = __session__()

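
Putting that together for the multiple-database case (a sketch; engine1 and
engine2 stand in for the two configured engines):

    from sqlalchemy.orm import sessionmaker

    Session = sessionmaker()
    sess1 = Session(bind=engine1)   # operations on sess1 hit database 1
    sess2 = Session(bind=engine2)   # operations on sess2 hit database 2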







[sqlalchemy] Re: BINARY Columns in MS SQL

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 4:24 PM, Rick Morrison wrote:

 class MSBinary(sqltypes.Binary):
 def bind_processor(self, dialect):
 def process(value):
 return '0x' + value.encode('hex')
 return process

 ...the issue is that the emitted value is getting quoted somewhere  
 after the type conversion:

  INSERT INTO binary_table VALUES(..., '0x6df02da', ...)

 but MSSQL's parser can recognize the '0x' prefix and wants the value  
 unquoted:

  INSERT INTO binary_table VALUES(..., 0x6df02da, ...)

 So how do I get the Dialect to drop the quoting behavior for the  
 return from bind_processor() ?

SQLA doesn't quote bind values.  It passes bind parameters, so this is
likely pyodbc doing the quoting.

The first step here is to get a full binary round trip working with  
only pyodbc, no SQLA in use.  The dialect can then be adjusted to do  
whatever is needed in that regard.
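
A round-trip sketch of that first step (the connection string and table are
placeholders; Python 2 era API):

    import pyodbc

    conn = pyodbc.connect('DSN=test;UID=user;PWD=pass')  # placeholder DSN
    cur = conn.cursor()
    cur.execute("CREATE TABLE binary_test (data VARBINARY(100))")

    payload = '\x00\x01\xfe\xff'
    # the DB-API Binary() wrapper makes the driver send a binary parameter
    cur.execute("INSERT INTO binary_test (data) VALUES (?)", pyodbc.Binary(payload))
    row = cur.execute("SELECT data FROM binary_test").fetchone()
    assert str(row[0]) == payload  # the value comes back as a buffer in Python 2
    conn.commit()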






[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread Michael Bayer


On Sep 22, 2008, at 11:20 PM, CodeIsMightier wrote:


     for link_label, link_dest_title, dest_frag in self.parse_links(self.text):
         print 'LINK from:', repr(self.title), 'to', \
             repr(link_dest_title + '#' + dest_frag), 'label', repr(link_label)
         try:
             link_dest = session.query(Article).filter_by(title=link_dest_title).one()
         except sqlalchemy.orm.exc.NoResultFound:
             link_dest = None
         print link_dest
         session.add(Link(self, link_label, link_dest, dest_frag))

 Basically what this does is parse the links in a page, look each one
 up in the DB to resolve the reference, and then insert a Link into
 the DB. The problem is that the articles table has over 7 million
 rows and there are maybe 50 million links.


commenting only on this snippet:

1. throwing/catching exceptions in Python is time consuming, so I
would not rely upon one() to achieve this; instead say all() and look
to see if rows are present.
2. if the point of session.add(Link(..)) is that a new Link object
is added, you *definitely* need to flush every 100 objects or so.  The
Session, when flush() proceeds, has to sort its entire contents of
dirty objects, so this takes progressively longer as the set of
unflushed objects grows - the list should be kept small.
3. the ORM's creation of objects is time consuming, so I would look
at selecting individual columns instead of objects, i.e.
sess.query(Article.id).filter(...).all().
4. the ORM overall is designed to simplify a complex set of operations
over a complex object graph, and is not optimized for raw speed as
much as plain SQL expressions.  Since this is a very simplistic
operation, you'll get much better performance using plain SQL
expressions (see the tutorial at
http://www.sqlalchemy.org/docs/05/sqlexpression.html).  I wouldn't
use the ORM at all for this particular operation.
5. The whole pattern of "fetch a row, see it doesn't exist, then
insert" is itself slow.  If you're using MySQL, consider the REPLACE
construct, which would remove the need to check first for a row.
6. depending on the database in use, you probably want to commit your
transaction every X number of rows - an exceedingly large number of
changes in one transaction can also put stress on the operation.
7. If the full database of articles is 7 million rows, you might even
try loading the full list of article IDs and titles into an in-memory
dictionary (only the integer id and string title, *not* a full ORM
object) and operating that way.  It would use a few gigs of RAM but
would only take a few minutes to try out to see if it's feasible.
Things like memcached and DBM might be of use here too.
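
A sketch combining points 3, 4 and 7 (the links_table column names and the
batch size are assumptions, not part of the posted script):

    # point 7: one in-memory pass building title -> id (plain tuples, no ORM objects)
    title_to_id = dict(session.query(Article.title, Article.id))

    # points 3/4: plain SQL expression inserts, batched for executemany
    ins = links_table.insert()
    batch = []
    for link_label, link_dest_title, dest_frag in self.parse_links(self.text):
        batch.append({'from_id': self.id,                       # assumed column names
                      'to_id': title_to_id.get(link_dest_title),
                      'label': link_label,
                      'fragment': dest_frag})
        if len(batch) >= 1000:
            engine.execute(ins, batch)
            batch = []
    if batch:
        engine.execute(ins, batch)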







[sqlalchemy] Re: Relations - How?

2008-09-23 Thread g00fy

One more thing,

if I have just books and translations, and 1 book has 3 translations,
this SQL will give me 3 rows (due to the join):

isbn  translation_code  translation_text
1     en                The Book
1     ru                Книжка
1     pl                Ksiazka

so if I have 10 books I will get 30 rows.

This is NxM. What if I just want to join by:

LEFT OUTER JOIN translation AS translation_1
ON (book.isbn = translation_1.book_id AND
translation_1.code = 'en')

so that I only get the required information? Limiting the results will
also be a problem, because a simple LIMIT will not be enough.

(consider having 30 such relations - it would generate a huge amount
of data, M*n*x*...)
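
One way to express that filtered join in SQLAlchemy so the collection holds
only the matching rows (a sketch against the Book/Translation mappings from
the earlier post, 0.5-style API; it assumes the relations are left lazy
rather than mapped with lazy=False):

    from sqlalchemy.sql import and_
    from sqlalchemy.orm import contains_eager

    books = (session.query(Book)
        .outerjoin((Translation,
                    and_(Book.isbn == Translation.book_id,
                         Translation.language_id == 'en')))
        .options(contains_eager('translations'))
        .all())
    # each book.translations now contains only the 'en' row from this query;
    # a plain LIMIT still counts joined rows, so paginate on a subquery of
    # book.isbn first if you need exact book counts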


On 23 Sep, 16:37, g00fy [EMAIL PROTECTED] wrote:
 Thanks a lot, Alex! I already love Pylons and SQLAlchemy!

 On 23 Sep, 12:16, Alex K [EMAIL PROTECTED] wrote:

  [Alex K's code from the earlier post snipped]

[sqlalchemy] Performance problem related to saving newly created objects.

2008-09-23 Thread Randallabra

Here is the scenario. I'm using SA as the data abstraction/access
layer between a desktop application and postgresql database. The user
interacts with the application primarily by browsing large numbers of
records (on the order of tens of thousands of records at once).
Occasionally the users needs to create new records. After the user has
interacted with the application for a while, there may be over 100,000
objects that have been fetched into memory. These objects appear in
tables that the user sorts and searches, drop down lists for changing
foreign key relations and so on.

The problem appears when it comes time to create and save a new
record. When saving a new record, the commit often takes more than 20
seconds. This is an awfully long time for the user to wait while
looking at a spinning beach ball. In some extreme cases, saving a new
record can take minutes.

Now, my question is: am I using SA the way it was intended to be used?
For example, I create a single session when the application starts up
and use that single session throughout for all interactions with the
database. The reason is that the objects used to populate drop-down
lists and look-up tables in the user interface need to persist; when
editing existing records or adding new ones, I don't want to have to
merge all of those objects into a new session (there are a LOT of user
interface elements that contain references to instances of SA objects)
all the time just to make a simple edit to a record.

I posted a TRAC ticket which was immediately closed as invalid
because, I was told, I should be flush()'ing the session every 1,000
objects or so. I turned on autoflush in the sessionmaker() as a test
and have not seen any improvement in performance. Furthermore, the
behavior of autoflush is incompatible with the way users interact with
the application: when autoflush is on, the session attempts to flush
incomplete objects to the database, which ends up violating non-null
rules, foreign key constraints and whatnot.

Is there a particular idiom that I'm missing when it comes to using SA
as the DAL for a desktop application?





[sqlalchemy] Re: Performance problem related to saving newly created objects.

2008-09-23 Thread Michael Bayer

I misread your ticket and the resolution has been corrected.  The
commit() operation expires all objects present in the session, as
described in
http://www.sqlalchemy.org/docs/05/session.html#unitofwork_using_committing
.   Turn off expire_on_commit to disable the expiration operation,
which normally would loop through all 100K objects and set an
expiration flag on all of their attributes.  Keep in mind that if any
concurrent transactions take place after the end of your transaction,
with expire_on_commit=False you may be operating on stale data.
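
For example (a sketch; the engine URL is a placeholder):

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine('postgres://user:pass@localhost/app')  # placeholder
    Session = sessionmaker(bind=engine, expire_on_commit=False)
    session = Session()
    # commit() now leaves the 100K+ loaded objects unexpired, at the cost
    # of possibly reading stale attribute values after concurrent changes
    session.commit()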



[sqlalchemy] Re: BINARY Columns in MS SQL

2008-09-23 Thread Michael Bayer


On Sep 23, 2008, at 7:17 PM, Rick Morrison wrote:

  The first step here is to get a full binary round trip working with
  only pyodbc, no SQLA in use.

 Well, that's how I got as far as I did, but that was with straight  
 text, no bind params

  so this is likely pyodbc assigning quoting.

 OK then, dead end #2. I'll return to this when I've got a few more  
 minutes.


Given that this seems to be a bug in pyodbc, we're probably
going to have to go with the CONVERT() thing.   The path to that,
since I wanted this to work for Postgres GIS functions as well, is a
new method added to TypeEngine called something like
bind_converter(); MSBinary provides func.convert() from its
bind_converter() method, and compiler.visit_bindparam() calls upon
this method.  A little bit of plumbing would need to exist such that
MSBinary can determine exactly how the func.convert() arguments are
passed.   This is something that can be worked out as a patch to be
reviewed; we should add a trac ticket for it.




[sqlalchemy] Re: BINARY Columns in MS SQL

2008-09-23 Thread John Hampton

Michael Bayer wrote:
 SQLA doesn't quote bind values.  It passes bind parameters, so this is  
 likely pyodbc assigning quoting.
 
 The first step here is to get a full binary round trip working with  
 only pyodbc, no SQLA in use.  The dialect can then be adjusted to do  
 whatever is needed in that regard.

Along these lines, I have created a post [1] on the pyodbc Help forum
regarding the unit tests failing for BINARY and a few other types.

I haven't heard anything back yet.  I'll file a bug soon if nothing is
forthcoming.

-John

[1] http://sourceforge.net/forum/forum.php?thread_id=2262548&forum_id=550700
