Re: [sqlalchemy] Grouping related Columns in table definitions and Python object reconstruction

2019-07-03 Thread Mike Bayer


On Wed, Jul 3, 2019, at 4:47 AM, Simon King wrote:
> On Wed, Jul 3, 2019 at 3:46 AM gamcil  wrote:
> >
> >
> > Awesome thanks for the extensive reply.
> >
> > This is the first time I've played with descriptor classes, so I'm sorry if 
> > this is really basic stuff.
> >
> > I set up my view class:
> >
> > class View:
> > def __init__(self, gene, attrs):
> > self.gene = gene
> > self.attrs = attrs
> >
> > def keys(self):
> > return iter(self.attrs)
> >
> > def items(self):
> > return ((attr, getattr(self.gene, attr)) for attr in self.attrs)
> >
> > def __getitem__(self, key):
> > return getattr(self.gene, key)
> >
> > def __setitem__(self, key, value):
> > setattr(self.gene, key, value)
> >
> > and a descriptor class:
> > class Descriptor:
> > def __init__(self, *attrs):
> > self.attrs = attrs
> >
> > def __get__(self, instance, owner):
> > return View(instance, self.attrs)
> >
> > def __set__(self, instance, value):
> > self.attrs = value
> >
> > then the Gene class using __declare_first__ becomes:
> > class Gene(Base):
> > __tablename__ = 'Gene'
> > id = Column(Integer, primary_key=True)
> > identifiers = Descriptor('locus', 'protein')
> > features = Descriptor('gene', 'mRNA', 'CDS')
> >
> > def __init__(self, **kwargs):
> > for dic in kwargs.values():
> > for key, value in dic.items():
> > setattr(self, key, value)
> >
> > @classmethod
> > def __declare_first__(cls):
> > for value in list(cls.__dict__.values()).copy():
> > if isinstance(value, Descriptor):
> > for attr in value.attrs:
> > setattr(cls, key, Column(attr, String))
> >
> > This seems to give me close to the interface I would like. A couple of 
> > questions:
> > 1) So I guess this is still just directly mapping columns to Class 
> > attributes, accessable by directly getting an attribute, i.e. 
> > gene.features['mRNA'] is equivalent to gene.mRNA. Would you then mask the 
> > class attributes by e.g. prepending with underscore?
> 
> That's entirely up to you. It's not *necessary* (no harm will be done
> if you access the data through both mechanisms, since the value is
> only stored in a single place), so it's just an aesthetic choice.
> 
> >
> > 2) Currently, every time a method is called on the descriptor, a new 
> > instance of the View class is returned. Is that by design, or is there a 
> > way of persisting the one View per attribute (identifiers, features, ...) 
> > for each Gene instance? Is that even an overhead I should be worried about? 
> > It seems like the equivalent of creating a new dictionary every time I want 
> > to access attributes on the Gene object.
> 
> This is probably a case of "don't bother optimising until you know
> it's a problem", but in this case the optimisation is simple. You
> could change your Descriptor.__get__ method to store the View instance
> on the object itself, perhaps under a name derived from the list of
> attributes. For example:
> 
> def __get__(self, instance, owner):
>  cachename = '_descriptor_' + '_'.join(self.attrs)
>  result = getattr(instance, cachename, None)
>  if result is None:
>  result = View(instance, self.attrs)
>  setattr(instance, cachename, result)
>  return result
> 
> >
> > 3) When I call Base.metadata.create_all(bind=engine) without first creating 
> > a Gene instance, the generated SQL is:
> > CREATE TABLE "gene" (
> > id INTEGER NOT NULL,
> > PRIMARY KEY (id)
> > )
> >
> > missing all of the other mappings. After calling e.g. gene = Gene(), it's 
> > as expected:
> > CREATE TABLE "gene" (
> > id INTEGER NOT NULL,
> > locus VARCHAR,
> > protein VARCHAR,
> > gene VARCHAR,
> > "mRNA" VARCHAR,
> > "CDS" VARCHAR,
> > PRIMARY KEY (id)
> > )
> >
> 
> This seems like a bug; I would have expected the __declare_first__
> method to be called automatically when you call create_all.
> 
> As a workaround, does it make any difference if you add a call to
> sqlalchemy.orm.configure_mappers() before your call to create_all?

oh, whoops sorry it does not :) just call configure_mappers() :)


> 
> Simon
> 
> -- 
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CAFHwexdP95%2BWy5-8OjqhoQcYcs8jRT9RFp-G-SCY9fwwO5UoHg%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
> 

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please 

Re: [sqlalchemy] Grouping related Columns in table definitions and Python object reconstruction

2019-07-03 Thread Simon King
On Wed, Jul 3, 2019 at 3:46 AM gamcil  wrote:
>
>
> Awesome thanks for the extensive reply.
>
> This is the first time I've played with descriptor classes, so I'm sorry if 
> this is really basic stuff.
>
> I set up my view class:
>
> class View:
> def __init__(self, gene, attrs):
> self.gene = gene
> self.attrs = attrs
>
> def keys(self):
> return iter(self.attrs)
>
> def items(self):
> return ((attr, getattr(self.gene, attr)) for attr in self.attrs)
>
> def __getitem__(self, key):
> return getattr(self.gene, key)
>
> def __setitem__(self, key, value):
> setattr(self.gene, key, value)
>
> and a descriptor class:
> class Descriptor:
> def __init__(self, *attrs):
> self.attrs = attrs
>
> def __get__(self, instance, owner):
> return View(instance, self.attrs)
>
> def __set__(self, instance, value):
> self.attrs = value
>
> then the Gene class using __declare_first__ becomes:
> class Gene(Base):
> __tablename__ = 'Gene'
> id = Column(Integer, primary_key=True)
> identifiers = Descriptor('locus', 'protein')
> features = Descriptor('gene', 'mRNA', 'CDS')
>
> def __init__(self, **kwargs):
> for dic in kwargs.values():
> for key, value in dic.items():
> setattr(self, key, value)
>
> @classmethod
> def __declare_first__(cls):
> for value in list(cls.__dict__.values()).copy():
> if isinstance(value, Descriptor):
> for attr in value.attrs:
> setattr(cls, key, Column(attr, String))
>
> This seems to give me close to the interface I would like. A couple of 
> questions:
> 1) So I guess this is still just directly mapping columns to Class 
> attributes, accessable by directly getting an attribute, i.e. 
> gene.features['mRNA'] is equivalent to gene.mRNA. Would you then mask the 
> class attributes by e.g. prepending with underscore?

That's entirely up to you. It's not *necessary* (no harm will be done
if you access the data through both mechanisms, since the value is
only stored in a single place), so it's just an aesthetic choice.

>
> 2) Currently, every time a method is called on the descriptor, a new instance 
> of the View class is returned. Is that by design, or is there a way of 
> persisting the one View per attribute (identifiers, features, ...) for each 
> Gene instance? Is that even an overhead I should be worried about? It seems 
> like the equivalent of creating a new dictionary every time I want to access 
> attributes on the Gene object.

This is probably a case of "don't bother optimising until you know
it's a problem", but in this case the optimisation is simple. You
could change your Descriptor.__get__ method to store the View instance
on the object itself, perhaps under a name derived from the list of
attributes. For example:

def __get__(self, instance, owner):
cachename = '_descriptor_' + '_'.join(self.attrs)
result = getattr(instance, cachename, None)
if result is None:
result = View(instance, self.attrs)
setattr(instance, cachename, result)
return result

>
> 3) When I call Base.metadata.create_all(bind=engine) without first creating a 
> Gene instance, the generated SQL is:
> CREATE TABLE "gene" (
> id INTEGER NOT NULL,
> PRIMARY KEY (id)
> )
>
> missing all of the other mappings. After calling e.g. gene = Gene(), it's as 
> expected:
> CREATE TABLE "gene" (
> id INTEGER NOT NULL,
> locus VARCHAR,
> protein VARCHAR,
> gene VARCHAR,
> "mRNA" VARCHAR,
> "CDS" VARCHAR,
> PRIMARY KEY (id)
> )
>

This seems like a bug; I would have expected the __declare_first__
method to be called automatically when you call create_all.

As a workaround, does it make any difference if you add a call to
sqlalchemy.orm.configure_mappers() before your call to create_all?

Simon

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/CAFHwexdP95%2BWy5-8OjqhoQcYcs8jRT9RFp-G-SCY9fwwO5UoHg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Grouping related Columns in table definitions and Python object reconstruction

2019-07-02 Thread gamcil

Awesome thanks for the extensive reply.

This is the first time I've played with descriptor classes, so I'm sorry if 
this is really basic stuff.

I set up my view class:

class View:
def __init__(self, gene, attrs):
self.gene = gene
self.attrs = attrs

def keys(self):
return iter(self.attrs)

def items(self):
return ((attr, getattr(self.gene, attr)) for attr in self.attrs)

def __getitem__(self, key):
return getattr(self.gene, key)

def __setitem__(self, key, value):
setattr(self.gene, key, value)

and a descriptor class:
class Descriptor:
def __init__(self, *attrs):
self.attrs = attrs

def __get__(self, instance, owner):
return View(instance, self.attrs)

def __set__(self, instance, value):
self.attrs = value

then the Gene class using __declare_first__ becomes:
class Gene(Base):
__tablename__ = 'Gene'
id = Column(Integer, primary_key=True)
identifiers = Descriptor('locus', 'protein')
features = Descriptor('gene', 'mRNA', 'CDS')

def __init__(self, **kwargs):
for dic in kwargs.values():
for key, value in dic.items():
setattr(self, key, value)

@classmethod
def __declare_first__(cls):
for value in list(cls.__dict__.values()).copy():
if isinstance(value, Descriptor):
for attr in value.attrs:
setattr(cls, key, Column(attr, String))

This seems to give me close to the interface I would like. A couple of 
questions:
1) So I guess this is still just directly mapping columns to Class 
attributes, accessable by directly getting an attribute, i.e. 
gene.features['mRNA'] is equivalent to gene.mRNA. Would you then mask the 
class attributes by e.g. prepending with underscore?

2) Currently, every time a method is called on the descriptor, a new 
instance of the View class is returned. Is that by design, or is there a 
way of persisting the one View per attribute (identifiers, features, ...) 
for each Gene instance? Is that even an overhead I should be worried about? 
It seems like the equivalent of creating a new dictionary every time I want 
to access attributes on the Gene object.

3) When I call Base.metadata.create_all(bind=engine) without first creating 
a Gene instance, the generated SQL is:
CREATE TABLE "gene" (
id INTEGER NOT NULL,
PRIMARY KEY (id)
)

missing all of the other mappings. After calling e.g. gene = Gene(), it's 
as expected:
CREATE TABLE "gene" (
id INTEGER NOT NULL,
locus VARCHAR,
protein VARCHAR,
gene VARCHAR,
"mRNA" VARCHAR,
"CDS" VARCHAR,
PRIMARY KEY (id)
)

When I'm first initialising the database, when should I be calling 
create_all? Do I need to actually create objects before I create the schema 
and initialise the database?

Thanks again


On Monday, 1 July 2019 23:25:38 UTC+8, Mike Bayer wrote:
>
>
>
> On Mon, Jul 1, 2019, at 5:56 AM, gamcil wrote:
>
> Hi,
>
> I'm new to SQLAlchemy/ORMs - previously I had just been interacting 
> directly with the builtin SQLite driver and had built up my own mess of a 
> mapping system.
>
> 1) I'm starting to convert some classes in a Python package to SQLAlchemy 
> models. . A paired down example of what I'm doing:
>
> class Gene:
> def __init__(self, identifiers, features):
> self.identifiers = identifiers
> self.features = features
>
> gene = Gene(identifiers={'locus': 'GENE_0001', 'protein_id': 'PROT_0001'},
> features={'mRNA': '1..300,400..500', 'CDS': '1..300,400..500'})
>
> In reality, I'm dealing with many attributes that are all related to the 
> Gene and stored in the same table, which is why to simplify the interface 
> of my classes, I grouped related instance attributes in dictionaries. For 
> example, the location of the mRNA feature of this Gene can then be accessed 
> by gene.features['mRNA']. Each value in the dictionary refers to a unique 
> database column.
>  
> However, when creating SQLAlchemy models, it's required to explicitly map 
> Columns directly as class attributes, e.g.:
> class Gene:
> id = Column(Integer, primary_key=True)
> locus = Column(String)
> protein_id = Column(String)
> mRNA = Column(String)
> CDS = Column(String)
>
> Is there a simple way to provide the dictionary functionality? Something 
> like:
> class Gene:
> id = Column(Integer, primary_key=True)
> identifiers = {
> 'locus': Column(String),
> 'protein': Column(String)
> }
> ...
> Where a change in Gene.identifiers['locus'] would then result in an update 
> to the 'locus' column. I saw MutableDict in the documentation but it seems 
> non-trivial to implement for something like this.
> Is this even a good idea?
>
>
> I'm going to focus on #1 here, so, this kind of thing is pretty 
> straightforward in Python, throughout SQLAlchemy's examples / extensions 
> there are all kinds of "magic 

Re: [sqlalchemy] Grouping related Columns in table definitions and Python object reconstruction

2019-07-01 Thread Mike Bayer


On Mon, Jul 1, 2019, at 5:56 AM, gamcil wrote:
> Hi,
> 
> I'm new to SQLAlchemy/ORMs - previously I had just been interacting directly 
> with the builtin SQLite driver and had built up my own mess of a mapping 
> system.
> 
> 1) I'm starting to convert some classes in a Python package to SQLAlchemy 
> models. . A paired down example of what I'm doing:
> 
> class Gene:
> def __init__(self, identifiers, features):
> self.identifiers = identifiers
> ` self.features = features``
> 
> gene = Gene(identifiers={'locus': 'GENE_0001', 'protein_id': 'PROT_0001'},
>  features={'mRNA': '1..300,400..500', 'CDS': '1..300,400..500'})`
> 
> In reality, I'm dealing with many attributes that are all related to the Gene 
> and stored in the same table, which is why to simplify the interface of my 
> classes, I grouped related instance attributes in dictionaries. For example, 
> the location of the mRNA feature of this Gene can then be accessed by 
> gene.features['mRNA']. Each value in the dictionary refers to a unique 
> database column.
> 
> However, when creating SQLAlchemy models, it's required to explicitly map 
> Columns directly as class attributes, e.g.:
> `class Gene:
>  id = Column(Integer, primary_key=True)
>  locus = Column(String)`
> ` protein_id = Column(String)
>  mRNA = Column(String)
>  CDS = Column(String)`
> 
> Is there a simple way to provide the dictionary functionality? Something like:
> `class Gene:
>  id = Column(Integer, primary_key=True)
>  identifiers = {
>  'locus': Column(String),`
> ` 'protein': Column(String)
>  }
>  ...```
> Where a change in Gene.identifiers['locus'] would then result in an update to 
> the 'locus' column. I saw MutableDict in the documentation but it seems 
> non-trivial to implement for something like this.
> Is this even a good idea?

I'm going to focus on #1 here, so, this kind of thing is pretty straightforward 
in Python, throughout SQLAlchemy's examples / extensions there are all kinds of 
"magic dictionary" kinds of examples going on, and this one is...another one :) 
To give a high level for this kind of thing, as far as setting up mappings / 
tables, that can always be done "magically", that is, you can write event hooks 
to read any kind of configuration you want that will set up the mapper() / 
Table with whatever columns are needed. For the "get / set" a dictionary part, 
we always use a Python descriptor for that kind of thing, so in this case I'd 
likely be building a custom kind of descriptor class that also serves as the 
information to generate the table definition. Finally, you want the dictionary 
to be "live", that is, not get /set of the whole dictionary, you want changes 
in the dictionary to affect the model. So for the last part, yes there needs to 
be a custom dictionary object of some kind. It would use the same Python 
techniques as MutableDict but it likely would be specific to the case here, and 
overall this case is not that "hard". In this case I would create a Python 
object that is not in fact a "dict", but just an object that has as dict-like 
interface, e.g. __getitem__(), __setitem__(), keys(), etc., and then it just 
uses the object that it comes from as the source of data, like:


class GeneView:
 def __init__(self, gene, attrs):
 self.gene = gene
 self.attrs = attrs

 def keys(self):
 return iter(self.attrs)

 def items(self):
 return ((attr, getattr(self.parent, attr)) for attr in self.attrs)

 def __getitem__(self, key):
 return getattr(self.parent, key) # maybe you want to check self.attrs too

 def __setitem__(self, key, value):
 setattr(self.parent, key, value) # mutability


The above thing comes from a descriptor, a hardcoded version would look like:

class Gene(...):
 @property
 def identifiers(self):
 return GeneView(self, ['locus', 'protein'])


I'd then want to make the above "hardcoded" pattern more generic making a 
custom descriptor class 
(https://docs.python.org/3/howto/descriptor.html#descriptor-example) so that 
the mapping looks like:


class Gene(...):
 identifiers = gene_view_attr('locus', 'protein')


to map it, a simple event hook like before_mapper_configured (or declararive's 
declare_first / declare_last hooks: 
https://docs.sqlalchemy.org/en/13/orm/extensions/declarative/api.html?highlight=declare_last#declare-last)
 can be used, in this case it probably doesn't matter where in the mapping 
phase we are since columns can be added at any time:

from sqlalchemy import event

@event.listens_for(Base, "before_mapper_configured", propagate=True)
def setup_view_attrs(mapper, class_):
 for value in class_.__dict__.values():
 if isinstance(value, gene_view_attr): # look for our descriptor
 for attr in value.attrs:
 setattr(class_, attr, Column(attr, String)) # map columns


That's all you need for #1, let me know if you need more detail.




> 
> 2) Say for the example above I have created a plain Python Location class to 
> represent a location of a feature in self.features:
> class Location:
>  def 

[sqlalchemy] Grouping related Columns in table definitions and Python object reconstruction

2019-07-01 Thread gamcil
Hi,

I'm new to SQLAlchemy/ORMs - previously I had just been interacting 
directly with the builtin SQLite driver and had built up my own mess of a 
mapping system.

1) I'm starting to convert some classes in a Python package to SQLAlchemy 
models. . A paired down example of what I'm doing:

class Gene:
def __init__(self, identifiers, features):
self.identifiers = identifiers
self.features = features

gene = Gene(identifiers={'locus': 'GENE_0001', 'protein_id': 'PROT_0001'},
features={'mRNA': '1..300,400..500', 'CDS': '1..300,400..500'})

In reality, I'm dealing with many attributes that are all related to the 
Gene and stored in the same table, which is why to simplify the interface 
of my classes, I grouped related instance attributes in dictionaries. For 
example, the location of the mRNA feature of this Gene can then be accessed 
by gene.features['mRNA']. Each value in the dictionary refers to a unique 
database column.
 
However, when creating SQLAlchemy models, it's required to explicitly map 
Columns directly as class attributes, e.g.:
class Gene:
id = Column(Integer, primary_key=True)
locus = Column(String)
protein_id = Column(String)
mRNA = Column(String)
CDS = Column(String)

Is there a simple way to provide the dictionary functionality? Something 
like:
class Gene:
id = Column(Integer, primary_key=True)
identifiers = {
'locus': Column(String),
'protein': Column(String)
}
...
Where a change in Gene.identifiers['locus'] would then result in an update 
to the 'locus' column. I saw MutableDict in the documentation but it seems 
non-trivial to implement for something like this.
Is this even a good idea?

2) Say for the example above I have created a plain Python Location class 
to represent a location of a feature in self.features:
class Location:
def __init__(self, intervals, ...):
self.intervals = intervals
...

@classmethod
def from_flat(cls, flat):
# convert flat location to list of tuples
# e.g. [(1, 300), (400, 500)]

This class has a classmethod that takes the stored flat string (e.g. 
1..300,400..500) and converts it to a list of interval tuples, and has some 
extra logic in it for use in my applications. I understand I can use 
orm.reconstructor to instantiate these Location objects from the stored 
attributes upon object loading, i.e.
@reconstructor
def init_on_load(self):
self.mRNA = Location.from_flat(self.mRNA)
...

However, what I don't get is how I should be shuttling changes in these 
objects back to the database as updates.
Is there some way to map this (instantiate Location on loading; mutate 
Location; convert Location back to flat format to update the Gene 
attribute)? Do i need to write setter methods for each that directly set 
the underlying column attribute to the flat representation of the Location?

Any help is appreciated, cheers

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/f3105bee-b6dc-489c-82e0-c6d36710a462%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.