On Thu, Feb 28, 2019 at 1:34 PM Andrew Martin wrote:
>
> Hi All,
>
> Trying something out here for an analytics app here, and I'm not sure any of
> this is a good idea. I'm trying to create a data model that's flexible enough
> to handle a lot of different types of analysis. From plain CSVs to time
> series to survey questions with complex hierarchies. My approach is to have a
> pretty plain model and query as needed to construct dataframes that will end
> up in pandas and sci-kit learn.
>
> The basic idea is that there's a dataset that has variables associated with
> it. Variables have values associated. I want to retain type information on
> the values. If I'm reading a file and loading it, I want to store an int as
> an int or a datetime as a datetime and not stuff everything into a string and
> have to guess what things are based on variable metadata. That's where I'm
> kind of stuck. I think I have the multi-table inheritance set up correctly to
> store different types of value. I'm totally unsure how to query without
> having to know the subclass. I'd like to be able to extract values for pandas
> processing with something like this:
>
> vals = Values.value.filter_by(variable_id=123456).all()
>
> without having to know per variable which subclass to choose. I.e., I don't
> want to have to do
> vals = IntValue.value.filter_by(variable_id=123456).all()
>
so "Values.value.filter_by(variable_id=123456).all()" is not
meaningful to me because I don't know what "Values.value" is, is that
a dynamic relationship hence you're getting a Query ? What SQL do
you see this rendering?
>
> The second part of my question is only relevant if this whole design isn't
> hosed, which I'm fine if you tell me that it is and I need to go back to the
> drawing board.
>
> The second part is how to extract an entire dataset by ID in a way that would
> be comfortable in pandas. If I were doing this in raw SQL, I'd use a CTE to
> get the variables by dataset id and use a pivot to create the
> data-frame-shaped table. I'm confounded by how to do this in alchemy,
> especially with the polymorphic setup.
can you show me that ?
FTR we usually reduce the JOINs by putting all the "value" columns in
one table but otherwise similar approach, see:
https://docs.sqlalchemy.org/en/latest/_modules/examples/vertical/dictlike-polymorphic.html
>
>
> Here are my current model definitions:
>
>
> from sqlalchemy import Column, Index, Integer, Text, DateTime, Float,
> ForeignKey
> from sqlalchemy.orm import relationship
>
> from .meta import Base
>
>
> class DataSet(Base):
> __tablename__ = 'datasets'
> id = Column(Integer, primary_key=True)
> name = Column(Text, nullable=False)
> description = Column(Text, nullable=True)
>
>
>
> class Variable(Base):
> __tablename__ = 'variables'
> id = Column(Integer, primary_key=True)
> dataset_id = Column(Integer, ForeignKey('datasets.id'), Nullable=False)
> name = Column(Text, nullable=False)
> description = Column(Text, nullable=True)
> group_var_col_id = Column(Integer, ForeignKey('variables.id'),
> nullable=True)
> group_var_row_id = Column(Integer, ForeignKey('variables.id'),
> nullable=True)
>
> value_map = Column(Text, nullable=True) #change to JSONB when move from
> SQLite to Postgres
>
> dataset = relationship('DataSet', backref='variables')
>
>
> class Value(Base):
> __tablename__ = 'values'
> id = Column(Integer, primary_key=True)
> variable_id = Column(Integer, ForeignKey('variables.id'), Nullable=False)
> observation_id = Column(Text, nullable=False)
> value_type = Column(Text, nullable=False)
>
> variable = relationship('Variable', backref='values')
>
> __mapper_args__ = {'polymorphic_on': value_type}
>
>
> class IntValue(Value):
> __tablename__ = 'int_values'
> id = Column(None, ForeignKey('values.id'), primary_key=True)
> value = Column(Integer, nullable=True)
>
> __mapper_args__ = {'polymorphic_identity': 'int'}
>
>
> class StringValue(Value):
> __tablename__ = 'string_values'
> id = Column(None, ForeignKey('values.id'), primary_key=True)
> value = Column(Text, nullable=True)
>
> __mapper_args__ = {'polymorphic_identity': 'string'}
>
>
> class FloatValue(Value):
> __tablename__ = 'float_values'
> id = Column(None, ForeignKey('values.id'), primary_key=True)
> value = Column(Float, nullable=True)
>
> __mapper_args__ = {'polymorphic_identity': 'float'}
>
>
> class DateTimeValue(Value):
> __tablename__ = 'datetime_values'
> id = Column(None, ForeignKey('values.id'), primary_key=True)
> value = Column(DateTime, nullable=True)
>
> __mapper_args__ = {'polymorphic_identity': 'datetime'}
>
> Thanks in advance for any suggestions you might have!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
>