subject:"\[sqlalchemy\] Mapping lots of similar tables \/ database design"

[sqlalchemy] Mapping lots of similar tables / database design

2014-04-01 Thread Peter Stensmyr



I posted this question on Stack Overflow a few days ago, and got some 
response but nothing that really solves my problem. I'm hoping that I can 
get some more input here. The initial recommendation was to keep all the 
data in two tables (one meta and one data table), but this might become 
unwieldy with the number of rows (I estimate it will be about 6 billion 
rows in total).

I have many (~2000) locations with time series data. Each time series has 
millions of rows. I would like to store these in a Postgres database. My 
current approach is to have a table for each location time series, and a 
meta table which stores information about each location (coordinates, 
elevation etc). I am using SQLAlchemy to create and query the tables. I 
would like to have a relationship between the meta table and each time 
series table to do queries like select all locations that have data 
between date A and date B and select all data for date A and export a csv 
with coordinates. What is the best way to create many tables with the same 
structure (only the name is different) and have a relationship with a meta 
table? Or should I use a different database design?

Currently I am using this type of approach to generate a lot of similar 
mappings:

from sqlalchemy import create_engine, MetaDatafrom sqlalchemy.types import 
Float, String, DateTime, Integerfrom sqlalchemy import Column, ForeignKeyfrom 
sqlalchemy.ext.declarative import declarative_basefrom sqlalchemy.orm import 
sessionmaker, relationship, backref
Base = declarative_base()

def make_timeseries(name):
class TimeSeries(Base):

__tablename__ = name
table_name = Column(String(50), ForeignKey('locations.table_name'))
datetime = Column(DateTime, primary_key=True)
value = Column(Float)

location = relationship('Location', backref=backref('timeseries',
lazy='dynamic'))

def __init__(self, table_name, datetime, value):
self.table_name = table_name
self.datetime = datetime
self.value = value

def __repr__(self):
return {}: {}.format(self.datetime, self.value)

return TimeSeries

class Location(Base):

__tablename__ = 'locations'
id = Column(Integer, primary_key=True)
table_name = Column(String(50), unique=True)
lon = Column(Float)
lat = Column(Float)
if __name__ == '__main__':
connection_string = 'postgresql://user:pw@localhost/location_test'
engine = create_engine(connection_string)
metadata = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

TS1 = make_timeseries('ts1')
# TS2 = make_timeseries('ts2')   # this breaks because of the foreign key
Base.metadata.create_all(engine)
session.add(TS1(ts1, 2001-01-01, 999))
session.add(TS1(ts1, 2001-01-02, -555))

qs = session.query(Location).first()
print qs.timeseries.all()


This approach has some problems, most notably that if I create more than 
one TimeSeries object the foreign key doesn't work. Previously I've used 
some work-arounds (such as not defining a foreign key), but it all seems 
like a big hack and I feel that there must be a better way of doing this. 
How should I organise and access my data?

Any advice appreciated, 

Peter

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Mapping lots of similar tables / database design

2014-04-01 Thread Michael Bayer


On Apr 1, 2014, at 6:56 PM, Peter Stensmyr peter.stens...@gmail.com wrote:

 I posted this question on Stack Overflow a few days ago, and got some 
 response but nothing that really solves my problem. I'm hoping that I can get 
 some more input here. The initial recommendation was to keep all the data in 
 two tables (one meta and one data table), but this might become unwieldy with 
 the number of rows (I estimate it will be about 6 billion rows in total).
 
 
 I have many (~2000) locations with time series data. Each time series has 
 millions of rows. I would like to store these in a Postgres database. My 
 current approach is to have a table for each location time series, and a meta 
 table which stores information about each location (coordinates, elevation 
 etc). I am using SQLAlchemy to create and query the tables. I would like to 
 have a relationship between the meta table and each time series table to do 
 queries like select all locations that have data between date A and date B 
 and select all data for date A and export a csv with coordinates. What is 
 the best way to create many tables with the same structure (only the name is 
 different) and have a relationship with a meta table? Or should I use a 
 different database design?
 
 This approach has some problems, most notably that if I create more than one 
 TimeSeries object the foreign key doesn't work. Previously I've used some 
 work-arounds (such as not defining a foreign key), but it all seems like a 
 big hack and I feel that there must be a better way of doing this. How should 
 I organise and access my data?
 
 

so the idea of making lots of tables and using a function to generate 
declarative classes with different table names, that's not unheard of.  Such an 
approach is the basis for recipes like the entity name recipe which you can 
see at https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.

Then the idea of each of your TimeSeries reaching out to Location, and having 
TimeSeries.location, also fine.

But then the whole thing goes horribly wrong when you try to claim that there 
will be a timeseries backref on Location.

Think about it.  You have 2000 individual Python classes all called TimeSeries, 
all referring to a different table.  This is the question you have to answer 
(and which I think is going to wind you up back at one table):

1. which one of those 2000 TimeSeries classes do I refer to when I say 
Location.timeseries ?

2. what SQL would a query like query(Location).join(timeseries) produce?  Are 
you looking for an enormous UNION of 2000 tables?  that's not going to work.

Think about this in terms of SQL.   If you want to query across *all* 
timeseries at once, having them all in 2000 tables is not going to be possible. 
 If you did have that setup, the approach would be to write a map/reduce query 
that hits each TimeSeries table separately and figures it out at that level, 
just like if you were running a noSQL database like CouchDB or something like 
that.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Mapping lots of similar tables / database design

2014-04-01 Thread Claudio Freire

On Tue, Apr 1, 2014 at 8:07 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 Think about it.  You have 2000 individual Python classes all called
 TimeSeries, all referring to a different table.  This is the question you
 have to answer (and which I think is going to wind you up back at one
 table):

 1. which one of those 2000 TimeSeries classes do I refer to when I say
 Location.timeseries ?

 2. what SQL would a query like query(Location).join(timeseries) produce?
 Are you looking for an enormous UNION of 2000 tables?  that's not going to
 work.

 Think about this in terms of SQL.   If you want to query across *all*
 timeseries at once, having them all in 2000 tables is not going to be
 possible.  If you did have that setup, the approach would be to write a
 map/reduce query that hits each TimeSeries table separately and figures it
 out at that level, just like if you were running a noSQL database like
 CouchDB or something like that.


Thing is, this is not relational data.

I believe you need a columnar data store.

Which, I guess it's up to you to choose - I don't know many opensource ones.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] Mapping lots of similar tables / database design

Re: [sqlalchemy] Mapping lots of similar tables / database design

Re: [sqlalchemy] Mapping lots of similar tables / database design

3 matches

Site Navigation

Mail list logo

Footer information