Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Simon King
On Thu, Nov 23, 2017 at 6:55 PM, Антонио Антуан  wrote:
>
>
> чт, 23 нояб. 2017 г. в 20:27, Mike Bayer :
>>
>> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан 
>> wrote:
>> >
>> >> A Query can have lots of entities in it, and if you're doing sharding a
>> >> single result set can refer to any number of shard identifiers within
>> >> not just a single result set but within a single row; they might have
>> >> come from dozens of different databases at once
>> >
>> > In my case it is not possible: all entities in query can be gotten only
>> > from
>> > one particular shard. We have totally the same database structure for
>> > each
>> > shard. The difference is just data stored into database. No `shard_id`
>> > or
>> > any other key as part of primary key for any table.
>>
>>
>> so just to note, these aren't "shards", they're tenants.  you have a
>> multi-tenant application, which is normally a really easy thing.  but
>> you have a few side applications that want to "cheat" and use the
>> per-tenant object model across multiple tenants simultaneously in the
>> scope of a single Session.
>>
>> > If I want to make query
>> > for particular database I always want to retrieve data ONLY from that
>> > database. And even more than that: ONLY one database during one session
>> > transaction (or, in other words, one http-request to our app).
>>
>> if you have one "tenant id" per HTTP request, the standard HTTP
>> request pattern is one Session per request.There's no problem in
>> that case.  You mentioned you have some non-flask applications that
>> want to communicate with multiple tenants in one Session.
>
>
> Yes, you're right. We have some offline actions, when we want to ask each
> tenant about something specific.
> I see, that currently the most safe way is to call `commit`, `rollback`,
> `remove` or `expunge_all` on session instance: all this methods drops
> identity map. Please let me know if I'm wrong.

A couple of things here. First, you are using ScopedSession, which is
essentially a wrapper around an actual Session. The commit(),
rollback() and expunge_all() methods are proxies that pass directly
through to the underlying Session. I believe commit() and rollback()
*expire* instances (so attributes will be reloaded on the next
access), but don't actually remove them from the identity map (but I
could be wrong about this).

remove() is not a Session method though - it tells the ScopedSession
to discard the current Session. A new Session will be created the next
time you call one of the proxied methods.

The default behaviour for ScopedSession is to use thread-locals, so
each thread gets its own Session. However, you can provide your own
scoping function that does whatever you want:

http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-custom-created-scopes

It sounds like you could pass a "get_current_tenant" function to the
scoped session, along with a custom session_factory, to get the
behaviour you want. Something like this (untested, would definitely
require more care to make it thread-safe and so on):

class SessionManager(object):
def __init__(self, tenant_uris):
self.engines = {}
self.current_tenant = None
for name, dburi in tenant_uris.items():
self.engines[name] = sa.create_engine(name)
self.sessionmaker = saorm.sessionmaker()

def get_current_tenant(self):
return self.current_tenant

def set_current_tenant(self, name):
self.current_tenant = name

def create_session(self):
engine = self.engines[self.current_tenant]
return self.sessionmaker(bind=engine)

tenant_uris = {
'one': 'mysql://...',
'two': 'mysql://...',
}
manager = SessionManager(tenant_uris)

Session = saorm.scoped_session(manager.create_session,
scopefunc=manager.get_current_tenant)

Base.query = Session.query_property


As long as you call manager.set_current_tenant whenever you switch to
querying a different tenant, this ought to work. But note that all of
this confusion and complexity stems from using scoped sessions and
Base.query. If you used explicit sessions everywhere, you would
probably find your code less magical and easier to understand.

Hope that helps,

Simon

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
чт, 23 нояб. 2017 г. в 20:27, Mike Bayer :

> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан 
> wrote:
> >
> >> A Query can have lots of entities in it, and if you're doing sharding a
> >> single result set can refer to any number of shard identifiers within
> >> not just a single result set but within a single row; they might have
> >> come from dozens of different databases at once
> >
> > In my case it is not possible: all entities in query can be gotten only
> from
> > one particular shard. We have totally the same database structure for
> each
> > shard. The difference is just data stored into database. No `shard_id` or
> > any other key as part of primary key for any table.
>
>
> so just to note, these aren't "shards", they're tenants.  you have a
> multi-tenant application, which is normally a really easy thing.  but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
> > If I want to make query
> > for particular database I always want to retrieve data ONLY from that
> > database. And even more than that: ONLY one database during one session
> > transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request.There's no problem in
> that case.  You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.
>

Yes, you're right. We have some offline actions, when we want to ask each
tenant about something specific.
I see, that currently the most safe way is to call `commit`, `rollback`,
`remove` or `expunge_all` on session instance: all this methods drops
identity map. Please let me know if I'm wrong.

>
> >
> >> This could only be suited with a very open plugin point that is
> carefully
> >> architected, tested, and documented and I don't have the resources to
> >> envision this for a short-term use case.
> >
> > I've seen a lot of questions (for example in stackoverflow) "how to
> manage
> > several binds with sqlalchemy" and all answers are "Use separated
> sessions".
>
> it's probably the best answer feel free to show specifics and I can
> determine if their request would fit this hypothetical feature
> otherwise.
>
> > I really not understand, what is the problem to implement "multibound"
> > session.
>
> The Session has always supported multiple binds.  There are two
> levels supported.  One is per table/mapper:
>
> http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#simple-vertical-partitioning
>
> you could probably adapt your multiple tenants into individual
> mappings if there are a limited number, see the approach at
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.
>
> The other level is per primary key, that is, each primary key in the
> identity map has a different bind.  that's horizontal sharding.
>
> You're looking for a new level, which is, multiple binds for *one*
> primary key.   This is an intricate feature request that is feasible
> but not in the short term.
>
>
> > Don't have ideal vocabulary to explain how our team like it :)
> > Currently I see only one problem: loading instances. Of course, after
> fixing
> > other problems may appear...
> >
> > I can (and want) make it as part of SQLAlchemy library. Fully-tested
> part,
> > of course. If you say that it is bad idea, ok then.
>
> it's not a bad idea.  It's just difficult, and I can't do it right now.
>
>
> >I can make it as a
> > plugin, but there is a problem: functions in `loading` module are
> monolithic
> > and it needs some refactor for the plugin. May I suggest refactor as pull
> > request?
>
> you can do a pull request but note that the PR process for SQLAlchemy
> is not quick.   90% of code-related pull requests I get have no tests,
> no documentation, or anything.More elaborate feature requests
> typically involve that I end up doing the whole thing myself in any
> case, using the submitter's original code as just a sketch, which
> means that more involved PRs are usually just another form of feature
> request.These PRs are almost always for Core level features as the
> Core is easier for outside contributors to work on.   ORM-level
> contributions are extremely rare these days but of course I welcome
> contributors for the ORM.
>
> And if so: could it be merged not only for major release but for,
> > at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
>
> The 1.0 series is in "maintenance" mode and as soon as 1.2 is released
> (which is hopefully by end of year) it will go into "Security" mode.
> There are no more 1.0 releases scheduled.   It is not reasonable to be
> doing new SQLAlchemy-oriented development without first upgrading your
> application to the latest release which in this case is 1.1.15.
>
>
> > I really don't want (of course!) to copy entire `loading` module f

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Simon King
On Thu, Nov 23, 2017 at 5:27 PM, Mike Bayer  wrote:
> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан  wrote:
>>
>>> A Query can have lots of entities in it, and if you're doing sharding a
>>> single result set can refer to any number of shard identifiers within
>>> not just a single result set but within a single row; they might have
>>> come from dozens of different databases at once
>>
>> In my case it is not possible: all entities in query can be gotten only from
>> one particular shard. We have totally the same database structure for each
>> shard. The difference is just data stored into database. No `shard_id` or
>> any other key as part of primary key for any table.
>
>
> so just to note, these aren't "shards", they're tenants.  you have a
> multi-tenant application, which is normally a really easy thing.  but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
>> If I want to make query
>> for particular database I always want to retrieve data ONLY from that
>> database. And even more than that: ONLY one database during one session
>> transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request.There's no problem in
> that case.  You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.
>

OP, can you describe in more detail why these applications need to
talk to multiple tenant databases in a single session? Perhaps there
might be an alternative way to approach that.

Simon

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Mike Bayer
On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан  wrote:
>
>> A Query can have lots of entities in it, and if you're doing sharding a
>> single result set can refer to any number of shard identifiers within
>> not just a single result set but within a single row; they might have
>> come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only from
> one particular shard. We have totally the same database structure for each
> shard. The difference is just data stored into database. No `shard_id` or
> any other key as part of primary key for any table.


so just to note, these aren't "shards", they're tenants.  you have a
multi-tenant application, which is normally a really easy thing.  but
you have a few side applications that want to "cheat" and use the
per-tenant object model across multiple tenants simultaneously in the
scope of a single Session.

> If I want to make query
> for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).

if you have one "tenant id" per HTTP request, the standard HTTP
request pattern is one Session per request.There's no problem in
that case.  You mentioned you have some non-flask applications that
want to communicate with multiple tenants in one Session.


>
>> This could only be suited with a very open plugin point that is carefully
>> architected, tested, and documented and I don't have the resources to
>> envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated sessions".

it's probably the best answer feel free to show specifics and I can
determine if their request would fit this hypothetical feature
otherwise.

> I really not understand, what is the problem to implement "multibound"
> session.

The Session has always supported multiple binds.  There are two
levels supported.  One is per table/mapper:
http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#simple-vertical-partitioning

you could probably adapt your multiple tenants into individual
mappings if there are a limited number, see the approach at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.

The other level is per primary key, that is, each primary key in the
identity map has a different bind.  that's horizontal sharding.

You're looking for a new level, which is, multiple binds for *one*
primary key.   This is an intricate feature request that is feasible
but not in the short term.


> Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after fixing
> other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then.

it's not a bad idea.  It's just difficult, and I can't do it right now.


>I can make it as a
> plugin, but there is a problem: functions in `loading` module are monolithic
> and it needs some refactor for the plugin. May I suggest refactor as pull
> request?

you can do a pull request but note that the PR process for SQLAlchemy
is not quick.   90% of code-related pull requests I get have no tests,
no documentation, or anything.More elaborate feature requests
typically involve that I end up doing the whole thing myself in any
case, using the submitter's original code as just a sketch, which
means that more involved PRs are usually just another form of feature
request.These PRs are almost always for Core level features as the
Core is easier for outside contributors to work on.   ORM-level
contributions are extremely rare these days but of course I welcome
contributors for the ORM.

And if so: could it be merged not only for major release but for,
> at least, 1.0.* (yes, in our project we use 1.0.19 :) )?

The 1.0 series is in "maintenance" mode and as soon as 1.2 is released
(which is hopefully by end of year) it will go into "Security" mode.
There are no more 1.0 releases scheduled.   It is not reasonable to be
doing new SQLAlchemy-oriented development without first upgrading your
application to the latest release which in this case is 1.1.15.


> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
>> when you query two different databases, you are using
>> two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)

because you would have two identity maps

>
>> I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use several
> sessions, we need to use the same am

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
See, that `Query.get` using `get_from_identity` instead of `instances` with
key, constructed into `_get_impl`.
Could you point to another places where same problem (described into my
first message) can be appeared?

чт, 23 нояб. 2017 г. в 16:44, Антонио Антуан :

>
> > A Query can have lots of entities in it, and if you're doing sharding a
> > single result set can refer to any number of shard identifiers within
> > not just a single result set but within a single row; they might have
> > come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only
> from one particular shard. We have totally the same database structure for
> each shard. The difference is just data stored into database. No `shard_id`
> or any other key as part of primary key for any table. If I want to make
> query for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).
>
>
> > This could only be suited with a very open plugin point that is carefully
> > architected, tested, and documented and I don't have the resources to
> > envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated
> sessions". I really not understand, what is the problem to implement
> "multibound" session. I make it in my project and it's really beautiful,
> clear and... Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after
> fixing other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then. I can make it as a
> plugin, but there is a problem: functions in `loading` module are
> monolithic and it needs some refactor for the plugin. May I suggest
> refactor as pull request? And if so: could it be merged not only for major
> release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
> > when you query two different databases, you are using
> > two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)
>
> > I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use
> several sessions, we need to use the same amount of `Base` classes and,
> consequently, the same amount of models, don't we?
> Another problem for us is already existed code. We can use sessions
> registry, but it take a lot of month to override entire project. Another
> way:  append into `Query.__iter__` such a code:
> >> self.with_session(sessions.get(self._shard_id))
> >> return super(Query, self).__iter__()
> But it has no effect for UPDATE and INSERT queries. Also, I'm not sure
> that there is no problems in that way...
>
>
> I have one more thought. Don't you think that it is some kind of bug: I
> make query for one bind and got entity from another. Yes, that behavior is
> not foreseen by library. But from other point of view, library docs have
> examples how to use several binds within one session. So, problem may
> happens not only in my case.
>
> Anyway, can my suggestion (
> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) has any
> effect for classic one-bound `Session`? If it can't, so, what's the
> problem? :)
>
>
> Excuse me for wasting your time.
> And excuse me if my suggestions are idiotic :)
>
> Appreciate your help.
>
> чт, 23 нояб. 2017 г. в 0:20, Mike Bayer :
>
>> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан 
>> wrote:
>> > Glad to see that you remember my messages :)
>> >
>> > I've dived into `loading` module and I see that currently it is really
>> > complicated to store additional data into pkey for each instance.
>> >
>> > Unfortunately, suggested solutions not good for my project.
>> > Also, I think that `shard` meaning in my case is not the same as usual.
>> >
>> > I want to describe structure of out project, maybe it can help.
>> >
>> > Here is definition of our databases structure:
>> > http://joxi.ru/nAyJVvGiXMv0Dr.
>> > We got master db and several geo databases. Catalogs like `users`,
>> `groups`,
>> > `offers` and other are replicating to geo databases, so that data is
>> always
>> > the same.
>> > But also we have tables like `clicks` and `leads`. Each app instance
>> > contains the data about them in database, related to its geo:
>> > europe-instance into europe-db, usa-instance into usa-database and so
>> on.
>> > Periodically master-app pulls clicks and leads to master-database.
>> Synced
>> > o

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
> A Query can have lots of entities in it, and if you're doing sharding a
> single result set can refer to any number of shard identifiers within
> not just a single result set but within a single row; they might have
> come from dozens of different databases at once

In my case it is not possible: all entities in query can be gotten only
from one particular shard. We have totally the same database structure for
each shard. The difference is just data stored into database. No `shard_id`
or any other key as part of primary key for any table. If I want to make
query for particular database I always want to retrieve data ONLY from that
database. And even more than that: ONLY one database during one session
transaction (or, in other words, one http-request to our app).

> This could only be suited with a very open plugin point that is carefully
> architected, tested, and documented and I don't have the resources to
> envision this for a short-term use case.

I've seen a lot of questions (for example in stackoverflow) "how to manage
several binds with sqlalchemy" and all answers are "Use separated
sessions". I really not understand, what is the problem to implement
"multibound" session. I make it in my project and it's really beautiful,
clear and... Don't have ideal vocabulary to explain how our team like it :)
Currently I see only one problem: loading instances. Of course, after
fixing other problems may appear...

I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
of course. If you say that it is bad idea, ok then. I can make it as a
plugin, but there is a problem: functions in `loading` module are
monolithic and it needs some refactor for the plugin. May I suggest
refactor as pull request? And if so: could it be merged not only for major
release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
I really don't want (of course!) to copy entire `loading` module for
additional logic of `identitykey` construction. But currently I do not see
any other way to implement it for my project :(

> when you query two different databases, you are using
> two independent transactions in any case;

So, what is the difference, if there are two transactions in any case? :)

> I don't understand why you can't use independent sessions

The first problem is `query` property of `Base` instances. If we use
several sessions, we need to use the same amount of `Base` classes and,
consequently, the same amount of models, don't we?
Another problem for us is already existed code. We can use sessions
registry, but it take a lot of month to override entire project. Another
way:  append into `Query.__iter__` such a code:
>> self.with_session(sessions.get(self._shard_id))
>> return super(Query, self).__iter__()
But it has no effect for UPDATE and INSERT queries. Also, I'm not sure that
there is no problems in that way...


I have one more thought. Don't you think that it is some kind of bug: I
make query for one bind and got entity from another. Yes, that behavior is
not foreseen by library. But from other point of view, library docs have
examples how to use several binds within one session. So, problem may
happens not only in my case.

Anyway, can my suggestion (
https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) has any
effect for classic one-bound `Session`? If it can't, so, what's the
problem? :)


Excuse me for wasting your time.
And excuse me if my suggestions are idiotic :)

Appreciate your help.

чт, 23 нояб. 2017 г. в 0:20, Mike Bayer :

> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан 
> wrote:
> > Glad to see that you remember my messages :)
> >
> > I've dived into `loading` module and I see that currently it is really
> > complicated to store additional data into pkey for each instance.
> >
> > Unfortunately, suggested solutions not good for my project.
> > Also, I think that `shard` meaning in my case is not the same as usual.
> >
> > I want to describe structure of out project, maybe it can help.
> >
> > Here is definition of our databases structure:
> > http://joxi.ru/nAyJVvGiXMv0Dr.
> > We got master db and several geo databases. Catalogs like `users`,
> `groups`,
> > `offers` and other are replicating to geo databases, so that data is
> always
> > the same.
> > But also we have tables like `clicks` and `leads`. Each app instance
> > contains the data about them in database, related to its geo:
> > europe-instance into europe-db, usa-instance into usa-database and so on.
> > Periodically master-app pulls clicks and leads to master-database. Synced
> > objects always have different ids into master- and get-db, so it is ok.
> >
> > But one time project owner came and said: "I need SAAS".
> > We see, that in current structure it's very hard (and really ugly) to
> > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and
> other
> > models will be multiplied with tenants amount.
> >
> > I discovered that I can override `get_bind` with another logic and it was
> > 

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-22 Thread Mike Bayer
I've added an issue to track this feature targeted for the moment at
1.3 (note that's for many months from now), however 1.3 has a lot in
it right now:

https://bitbucket.org/zzzeek/sqlalchemy/issues/4137/sharding-token-extension-in-identity-key



On Wed, Nov 22, 2017 at 4:20 PM, Mike Bayer  wrote:
> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан  wrote:
>> Glad to see that you remember my messages :)
>>
>> I've dived into `loading` module and I see that currently it is really
>> complicated to store additional data into pkey for each instance.
>>
>> Unfortunately, suggested solutions not good for my project.
>> Also, I think that `shard` meaning in my case is not the same as usual.
>>
>> I want to describe structure of out project, maybe it can help.
>>
>> Here is definition of our databases structure:
>> http://joxi.ru/nAyJVvGiXMv0Dr.
>> We got master db and several geo databases. Catalogs like `users`, `groups`,
>> `offers` and other are replicating to geo databases, so that data is always
>> the same.
>> But also we have tables like `clicks` and `leads`. Each app instance
>> contains the data about them in database, related to its geo:
>> europe-instance into europe-db, usa-instance into usa-database and so on.
>> Periodically master-app pulls clicks and leads to master-database. Synced
>> objects always have different ids into master- and get-db, so it is ok.
>>
>> But one time project owner came and said: "I need SAAS".
>> We see, that in current structure it's very hard (and really ugly) to
>> implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other
>> models will be multiplied with tenants amount.
>>
>> I discovered that I can override `get_bind` with another logic and it was
>> great: we can remove several `Base` classes, several Sessions and several
>> `Orders`.
>>
>> Mechanism looks like this:
>> - we use one instance on each geo for all tenants.
>> - we create separated databases for each tenant: this will be multiplied
>> with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.
>> - we detect `tenant_id` using `request.host` (we use flask): each domain
>> binds with particular tenant;
>> - we store `tenant_id` into global storage.
>> - we use stored `tenant_id` into `Session.get_bind`:
>> https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
>> - into flask `teardown_request` we clear `tenant_id` storage and call
>> `Session.remove()`
>> - if we need to read from another get, just write `query =
>> query.set_shard_id(GEO)` `tenant_id`
>>
>> For celery we use this:
>> https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
>> Clear to.
>>
>> All this leads us only to pros, without cons: any developer has no need to
>> think about database chosing, just write code like there is only one
>> database. If you need to read from another geo-database, just call
>> `query.set_shard(GEO)`, tenant will be appended automatically to it.
>>
>> Problems begin when we tried to test non-flask and non-celery scripts, like
>> cron tasks: we may want to query several tenant-databases during one
>> SQLA-transaction, somethins like in my first example:
>> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
>> (`assert_got_correct_objects_with_remove` and
>> `assert_got_cached_objects_without_remove`). The result you know.
>>
>>
>> During writing this message, I found out, that we need only one additional
>> data for primary key: `connection.bind.url`. I see, that SQLA already have
>> it inside `_instance_processor`, so it exists inside `_instance`. I think,
>> that `identity_key` should be constructed (in my case) with this code:
>> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't
>> you think so?
>
> that's where something needs to happen but SQLAlchemy can't do this in
> such a way that is hardcoded to exactly your particular use case.   A
> Query can have lots of entities in it, and if you're doing sharding a
> single result set can refer to any number of shard identifiers within
> not just a single result set but within a single row; they might have
> come from dozens of different databases at once.  This could only be
> suited with a very open plugin point that is carefully architected,
> tested, and documented and I don't have the resources to envision this
> for a short-term use case.
>
>> Problems begin when we tried to test non-flask and non-celery scripts, like
>> cron tasks: we may want to query several tenant-databases during one
>> SQLA-transaction,
>
> I don't understand why you can't use independent sessions for this,
> because when you query two different databases, you are using two
> independent transactions in any case;  they are only coordinated if
> one is using two-phase transactions which I doubt is the case here
> (while SQLAlchemy put a lot of work into making that possible, I don't
> think anyone has ever used that feature).
>
> Instead of:
>
> Session.query(Foo).set_shard(id)
>
> you say:
>
> sessions.get(shard_id).query(Fo

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-22 Thread Mike Bayer
On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан  wrote:
> Glad to see that you remember my messages :)
>
> I've dived into `loading` module and I see that currently it is really
> complicated to store additional data into pkey for each instance.
>
> Unfortunately, suggested solutions not good for my project.
> Also, I think that `shard` meaning in my case is not the same as usual.
>
> I want to describe structure of out project, maybe it can help.
>
> Here is definition of our databases structure:
> http://joxi.ru/nAyJVvGiXMv0Dr.
> We got master db and several geo databases. Catalogs like `users`, `groups`,
> `offers` and other are replicating to geo databases, so that data is always
> the same.
> But also we have tables like `clicks` and `leads`. Each app instance
> contains the data about them in database, related to its geo:
> europe-instance into europe-db, usa-instance into usa-database and so on.
> Periodically master-app pulls clicks and leads to master-database. Synced
> objects always have different ids into master- and get-db, so it is ok.
>
> But one time project owner came and said: "I need SAAS".
> We see, that in current structure it's very hard (and really ugly) to
> implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other
> models will be multiplied with tenants amount.
>
> I discovered that I can override `get_bind` with another logic and it was
> great: we can remove several `Base` classes, several Sessions and several
> `Orders`.
>
> Mechanism looks like this:
> - we use one instance on each geo for all tenants.
> - we create separated databases for each tenant: this will be multiplied
> with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.
> - we detect `tenant_id` using `request.host` (we use flask): each domain
> binds with particular tenant;
> - we store `tenant_id` into global storage.
> - we use stored `tenant_id` into `Session.get_bind`:
> https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
> - into flask `teardown_request` we clear `tenant_id` storage and call
> `Session.remove()`
> - if we need to read from another get, just write `query =
> query.set_shard_id(GEO)` `tenant_id`
>
> For celery we use this:
> https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
> Clear to.
>
> All this leads us only to pros, without cons: any developer has no need to
> think about database chosing, just write code like there is only one
> database. If you need to read from another geo-database, just call
> `query.set_shard(GEO)`, tenant will be appended automatically to it.
>
> Problems begin when we tried to test non-flask and non-celery scripts, like
> cron tasks: we may want to query several tenant-databases during one
> SQLA-transaction, somethins like in my first example:
> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
> (`assert_got_correct_objects_with_remove` and
> `assert_got_cached_objects_without_remove`). The result you know.
>
>
> During writing this message, I found out, that we need only one additional
> data for primary key: `connection.bind.url`. I see, that SQLA already have
> it inside `_instance_processor`, so it exists inside `_instance`. I think,
> that `identity_key` should be constructed (in my case) with this code:
> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't
> you think so?

that's where something needs to happen but SQLAlchemy can't do this in
such a way that is hardcoded to exactly your particular use case.   A
Query can have lots of entities in it, and if you're doing sharding a
single result set can refer to any number of shard identifiers within
not just a single result set but within a single row; they might have
come from dozens of different databases at once.  This could only be
suited with a very open plugin point that is carefully architected,
tested, and documented and I don't have the resources to envision this
for a short-term use case.

> Problems begin when we tried to test non-flask and non-celery scripts, like
> cron tasks: we may want to query several tenant-databases during one
> SQLA-transaction,

I don't understand why you can't use independent sessions for this,
because when you query two different databases, you are using two
independent transactions in any case;  they are only coordinated if
one is using two-phase transactions which I doubt is the case here
(while SQLAlchemy put a lot of work into making that possible, I don't
think anyone has ever used that feature).

Instead of:

Session.query(Foo).set_shard(id)

you say:

sessions.get(shard_id).query(Foo)





>
>
>
> вт, 21 нояб. 2017 г. в 19:15, Mike Bayer :
>>
>> I've looked to see how hard it would be to allow "supplemental"
>> attributes to form part of the mapper's primary key tuple, and it
>> would be pretty hard.   The "easy" part is getting the mapper to set
>> itself up with some extra attributes that can deliver some kind of
>> supplemental value to the identity key.  the harder part is then in
>> load

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-22 Thread Антонио Антуан
Glad to see that you remember my messages :)

I've dived into `loading` module and I see that currently it is really
complicated to store additional data into pkey for each instance.

Unfortunately, suggested solutions not good for my project.
Also, I think that `shard` meaning in my case is not the same as usual.

I want to describe structure of out project, maybe it can help.

Here is definition of our databases structure:
http://joxi.ru/nAyJVvGiXMv0Dr. 
We got master db and several geo databases. Catalogs like `users`,
`groups`, `offers` and other are replicating to geo databases, so that data
is always the same.
But also we have tables like `clicks` and `leads`. Each app instance
contains the data about them in database, related to its geo:
europe-instance into europe-db, usa-instance into usa-database and so on.
Periodically master-app pulls clicks and leads to master-database. Synced
objects always have different ids into master- and get-db, so it is ok.

But one time project owner came and said: "I need SAAS".
We see, that in current structure it's very hard (and really ugly) to
implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other
models will be multiplied with tenants amount.

I discovered that I can override `get_bind` with another logic and it was
great: we can remove several `Base` classes, several Sessions and several
`Orders`.

Mechanism looks like this:
- we use one instance on each geo for all tenants.
- we create separated databases for each tenant: this will be multiplied
with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.

- we detect `tenant_id` using `request.host` (we use flask): each domain
binds with particular tenant;
- we store `tenant_id` into global storage.
- we use stored `tenant_id` into `Session.get_bind`:
https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
- into flask `teardown_request` we clear `tenant_id` storage and call
`Session.remove()`
- if we need to read from another get, just write `query =
query.set_shard_id(GEO)` `tenant_id`

For celery we use this:
https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
Clear to.

All this leads us only to pros, without cons: any developer has no need to
think about database chosing, just write code like there is only one
database. If you need to read from another geo-database, just call
`query.set_shard(GEO)`, tenant will be appended automatically to it.

Problems begin when we tried to test non-flask and non-celery scripts, like
cron tasks: we may want to query several tenant-databases during one
SQLA-transaction, somethins like in my first example:
https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
(`assert_got_correct_objects_with_remove`
and `assert_got_cached_objects_without_remove`). The result you know.


During writing this message, I found out, that we need only one additional
data for primary key: `connection.bind.url`. I see, that SQLA already have
it inside `_instance_processor`, so it exists inside `_instance`. I think,
that `identity_key` should be constructed (in my case) with this code:
https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't
you think so?



вт, 21 нояб. 2017 г. в 19:15, Mike Bayer :

> I've looked to see how hard it would be to allow "supplemental"
> attributes to form part of the mapper's primary key tuple, and it
> would be pretty hard.   The "easy" part is getting the mapper to set
> itself up with some extra attributes that can deliver some kind of
> supplemental value to the identity key.  the harder part is then in
> loading.py where we get new rows from the DB and need this
> value...which means some whole new kind of system would need to
> deliver this for any arbitrary part of the result set given a mapping
> and the selectable we're looking at (keep in mind a Query can have
> lots of the same mapping in a single row with aliases).   This would
> be very complicated to implement and test.   I am not seeing any quick
> way to suit this use case, which has also not ever been requested
> before.
>
>
>
> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer 
> wrote:
> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан 
> wrote:
> >> Hi guys.
> >>
> >> I got this code example:
> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
> >>
> >> Here I make custom `Session` and custom `Query`. As you see, `Session`
> has
> >> several binds.
> >>
> >> Also, you can see that there are two functions:
> >> `assert_got_correct_objects_with_remove` and
> >> `assert_got_cached_objects_without_remove`.
> >>
> >> The first checks that we got correct results if `Session.remove` called.
> >> The second checks, that we got incorrect results if `Session.remove` not
> >> called.
> >>
> >> I understand, that behavior is correct: we don't remove session - so,
> we got
> >> same result from "cache-like"
> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
> >>

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-21 Thread Mike Bayer
I've looked to see how hard it would be to allow "supplemental"
attributes to form part of the mapper's primary key tuple, and it
would be pretty hard.   The "easy" part is getting the mapper to set
itself up with some extra attributes that can deliver some kind of
supplemental value to the identity key.  the harder part is then in
loading.py where we get new rows from the DB and need this
value...which means some whole new kind of system would need to
deliver this for any arbitrary part of the result set given a mapping
and the selectable we're looking at (keep in mind a Query can have
lots of the same mapping in a single row with aliases).   This would
be very complicated to implement and test.   I am not seeing any quick
way to suit this use case, which has also not ever been requested
before.



On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer  wrote:
> On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан  wrote:
>> Hi guys.
>>
>> I got this code example:
>> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
>>
>> Here I make custom `Session` and custom `Query`. As you see, `Session` has
>> several binds.
>>
>> Also, you can see that there are two functions:
>> `assert_got_correct_objects_with_remove` and
>> `assert_got_cached_objects_without_remove`.
>>
>> The first checks that we got correct results if `Session.remove` called.
>> The second checks, that we got incorrect results if `Session.remove` not
>> called.
>>
>> I understand, that behavior is correct: we don't remove session - so, we got
>> same result from "cache-like"
>> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
>>
>> I want to avoid that mechanism and don't want to use `session_identity_map`
>> for different binds. In ideal, bind should be used as part of key for
>> `session_identity_map`, but I guess, that it is not possible.
>> Another way, acceptable for me: disable this mechanism. But I do not found
>> ways to achieve this.
>> And the third option: construct instances manually. Looks like I should copy
>> code from `loading` module and add that method to `CustomSession`:
>
>
> there's really no reason at all to use a "ShardedSession" if you have
> overlapping primary key spaces from each of your binds.   I'm not sure
> if I mentioned this at the beginning of the emails regarding this
> project but I hope that I mentioned just using separate Session
> objects is vastly simpler for non-intricate sharding cases, such as
> where you always know which shard you care about and you don't care
> about any of the others for a certain operation. The point of
> ShardedSession is so that objects pulled from multiple databases can
> be intermingled in the same query and in the same transaction - which
> by definition means they have unique primary keys.   If that's not
> what you're doing here I don't see what advantage all this complexity
> is getting you.
>
> If you're still convinced you need to be using a monolithic
> ShardedSession then there needs to be some kind of translation of data
> such that the mapper sees unique primary keys across the shards, or
> unique classes.
>
> I've tried to think of ways to do this without too much difficulty but
> none of them are really worth the complexity and hackiness it would
> require.   The absolutely quickest and most well-supported, no hacks
> required way would be to properly create a composite primary key on
> your classes, where the second column is your shard id:
>
> class A(Base):
> __tablename__ = 'a'
>
> id = Column(Integer, primary_key=True)
> shard_id = Column(Integer, primary_key=True)
>
> I tried to see if the "shard_id" column can be some kind of expression
> that is not a Column on the Table but the mapper() is not set up to
> support this unless you mapped the whole class to a select()
> construct, which would make for too-complicated SQL, and you'd still
> need to intercept this select() using events to put the right shard id
> in.  Another is to create a custom column that renders in a special
> way, but again you need to create events to intercept it in every case
> to put the right shard id in, and/or remove it from things like
> insert() statements.
>
> by far your two best solutions are: 1. use separate Session objects
> per shard  2. make sure your data actually has shard-specific primary
> keys
>
>
>
>
>
>
>
>>
>> def instances(self, cursor, __context=None):
>> context = __context
>> if context is None:
>> context = QueryContext(self)
>> return self._custom_instances(self, cursor, context)
>>
>>
>>
>> def custom_instances(query, cursor, context):
>>  """copied from `loading.instances` code with disabled
>> `session_identity_map`"""
>>
>>
>>
>> The third way is the most ugly and I want to avoid it.
>>
>> Could you help me with my hard choice and, maybe, suggest any other ways and
>> options? :)
>>
>> Thank you.
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalche

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-21 Thread Mike Bayer
On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан  wrote:
> Hi guys.
>
> I got this code example:
> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
>
> Here I make custom `Session` and custom `Query`. As you see, `Session` has
> several binds.
>
> Also, you can see that there are two functions:
> `assert_got_correct_objects_with_remove` and
> `assert_got_cached_objects_without_remove`.
>
> The first checks that we got correct results if `Session.remove` called.
> The second checks, that we got incorrect results if `Session.remove` not
> called.
>
> I understand, that behavior is correct: we don't remove session - so, we got
> same result from "cache-like"
> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
>
> I want to avoid that mechanism and don't want to use `session_identity_map`
> for different binds. In ideal, bind should be used as part of key for
> `session_identity_map`, but I guess, that it is not possible.
> Another way, acceptable for me: disable this mechanism. But I do not found
> ways to achieve this.
> And the third option: construct instances manually. Looks like I should copy
> code from `loading` module and add that method to `CustomSession`:


there's really no reason at all to use a "ShardedSession" if you have
overlapping primary key spaces from each of your binds.   I'm not sure
if I mentioned this at the beginning of the emails regarding this
project but I hope that I mentioned just using separate Session
objects is vastly simpler for non-intricate sharding cases, such as
where you always know which shard you care about and you don't care
about any of the others for a certain operation. The point of
ShardedSession is so that objects pulled from multiple databases can
be intermingled in the same query and in the same transaction - which
by definition means they have unique primary keys.   If that's not
what you're doing here I don't see what advantage all this complexity
is getting you.

If you're still convinced you need to be using a monolithic
ShardedSession then there needs to be some kind of translation of data
such that the mapper sees unique primary keys across the shards, or
unique classes.

I've tried to think of ways to do this without too much difficulty but
none of them are really worth the complexity and hackiness it would
require.   The absolutely quickest and most well-supported, no hacks
required way would be to properly create a composite primary key on
your classes, where the second column is your shard id:

class A(Base):
__tablename__ = 'a'

id = Column(Integer, primary_key=True)
shard_id = Column(Integer, primary_key=True)

I tried to see if the "shard_id" column can be some kind of expression
that is not a Column on the Table but the mapper() is not set up to
support this unless you mapped the whole class to a select()
construct, which would make for too-complicated SQL, and you'd still
need to intercept this select() using events to put the right shard id
in.  Another is to create a custom column that renders in a special
way, but again you need to create events to intercept it in every case
to put the right shard id in, and/or remove it from things like
insert() statements.

by far your two best solutions are: 1. use separate Session objects
per shard  2. make sure your data actually has shard-specific primary
keys







>
> def instances(self, cursor, __context=None):
> context = __context
> if context is None:
> context = QueryContext(self)
> return self._custom_instances(self, cursor, context)
>
>
>
> def custom_instances(query, cursor, context):
>  """copied from `loading.instances` code with disabled
> `session_identity_map`"""
>
>
>
> The third way is the most ugly and I want to avoid it.
>
> Could you help me with my hard choice and, maybe, suggest any other ways and
> options? :)
>
> Thank you.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails f