[sqlalchemy] Re: Using operators with multiple InstrumentedAttribute instances
The code that makes up the query builder I've been working on is pretty extensive, so I'll go through the high-level basics. I've set up a `_base` class that augments the SQLAlchemy Base model. Note the column details retrieved in the `get_column_and_json_key_from_sql_name` class method:

    class _base:
        """
        This class augments the default SQLAlchemy Base model
        """

        @classmethod
        def get_column_and_json_key_from_sql_name(cls, name):
            """
            Returns the column and, if applicable, the JSON top-level key
            from the JSON dict.

            :param name: Name of field.
            :return:
            """
            assert type(name) == str

            json_key = None
            col_names = name.split(COLUMN_NAME_DELIMITER)
            if len(col_names) == 2:
                name, json_key = col_names

            try:
                col = cls.__table__.c[name]
            except KeyError:
                log.error("Invalid column name: %s", name)
                return None

            return (getattr(cls, col.name), json_key)

We then have a `_build_column` method that is essentially responsible for retrieving the details necessary to construct the column for the select statement of the SQLAlchemy ORM query:

    def _build_column(self):
        field_name = self.db.get("column")
        model = self._object.get("model")
        column_type, column_key = self.db.get("type"), self.db.get("key")

        select_column, json_key = model.get_column_and_json_key_from_sql_name(
            field_name
        )
        select_column = self._construct_json_select_field(
            column_type, select_column, json_key, column_key
        )
        return select_column

What I'm trying to figure out is how to dynamically generate SQLAlchemy ORM statements based on the formula.
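As background for the column lookup above, here is a minimal sketch of the `__table__.c` plus `getattr` pattern and of JSON key access on a column. The `Metric` model and its `data` column are hypothetical, and the PostgreSQL `JSONB` type is an assumption on my part:

```python
from sqlalchemy import Column, Integer
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base  # 1.4+ import path

Base = declarative_base()

class Metric(Base):  # hypothetical model for illustration only
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(JSONB)

# Name-based lookup, as in get_column_and_json_key_from_sql_name:
col = Metric.__table__.c["data"]      # the Column object from the Table
attr = getattr(Metric, col.name)      # the mapped InstrumentedAttribute

# A JSON top-level key becomes an index expression on the column:
expr = attr["some_key"]
print(type(attr).__name__)  # InstrumentedAttribute
```

The two-step lookup matters because `__table__.c[name]` validates the name against the table, while `getattr(cls, col.name)` returns the ORM-level attribute that supports query operators.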
The formulas can be any math equation using +, -, /, *, and parentheses:

    formula1 = '"metric:1" + "metric:2" + "metric:3"'
    formula2 = '"metric:1" + "metric:2" - "metric:3"'
    formula3 = '"metric:1" + ("metric:2" * "metric:3")'
    formula4 = '"metric:1" / "metric:2"'
    formula5 = '"metric:1" / ("metric:2" * "metric:3")'

The InstrumentedAttribute objects I mentioned earlier are the select fields for each individual metric. What I need to figure out is how to build a SQLAlchemy ORM query by parsing the formula string and evaluating each operation in the context of the query. `func.sum` would work for formula1, but I'd need to iteratively build the formula with SQLAlchemy ORM helper methods for formula2 - formula5.

Per Jonathan Vanasco's last comment, I've already figured out Phase 1. I'm stuck on Phase 2.

On Thursday, March 28, 2019 at 4:43:56 PM UTC-4, Ian Miller wrote:
>
> Hello all,
>
> I am in the process of trying to create a dynamic expression query engine
> in an application I'm working on.
>
> So there is a formula that gets defined like so:
>
> formula = '"metric:123" + "metric:456" + "metric:789"'
>
> Each metric maps to a column in the database tables - long story short,
> I'm able to retrieve the metric by ID, and instantiate an
> InstrumentedAttribute object that has the SQLAlchemy metadata for the
> associated column. What I'm trying to achieve is to be able to iterate
> through the formula, and dynamically build a SQLAlchemy query that maps to
> the formula.
>
> For example, the formula defined above would look something like this in
> SQL:
>
> SELECT post.id + campaign.id + asset.id
> FROM post, campaign, asset
> WHERE ...;
>
> The idea is to translate the above to something like:
>
> session.query(
>     <InstrumentedAttribute object at 0x7ff9269f92b0>
>     + <InstrumentedAttribute object at 0x7ff9269c5990>
>     + <InstrumentedAttribute object at 0x7ff926896048>
> ).all()
>
> I've tried a couple of approaches of dynamically generating the SQLAlchemy
> ORM query, but I haven't been able to find anything that works.
> Would anyone have any idea or tips on how to accomplish this?
>
> Thank you!

--
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
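The Phase 2 step discussed above - turning a formula string into nested operations - can be sketched with the standard-library `ast` module. The helper name `evaluate_formula` and the operand-mapping shape are my own assumptions; because SQLAlchemy's InstrumentedAttribute overloads `+`, `-`, `*`, and `/`, the same evaluator works whether the operands are plain numbers or ORM columns:

```python
import ast
import operator
import re

# Map Python AST operator nodes to the corresponding binary functions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate_formula(formula, operands):
    """Evaluate a formula like '"metric:1" + "metric:2"' over a mapping
    of metric names to operands (numbers or SQLAlchemy columns)."""
    names = {}

    def repl(match):
        # Replace each quoted metric reference with a placeholder name
        # so the formula parses as an ordinary Python expression.
        key = "m{}".format(len(names))
        names[key] = operands[match.group(1)]
        return key

    expr = re.sub(r'"([^"]+)"', repl, formula)
    tree = ast.parse(expr, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return names[node.id]
        raise ValueError("unsupported syntax: {!r}".format(node))

    return walk(tree)

print(evaluate_formula('"metric:1" + ("metric:2" * "metric:3")',
                       {"metric:1": 2, "metric:2": 3, "metric:3": 4}))  # 14
```

With numbers this returns `14`; with InstrumentedAttribute operands the identical recursion would instead return a composed SQL expression ready to pass to `session.query(...)`.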
[sqlalchemy] Using operators with multiple InstrumentedAttribute instances
Hello all,

I am in the process of trying to create a dynamic expression query engine in an application I'm working on.

So there is a formula that gets defined like so:

    formula = '"metric:123" + "metric:456" + "metric:789"'

Each metric maps to a column in the database tables - long story short, I'm able to retrieve the metric by ID, and instantiate an InstrumentedAttribute object that has the SQLAlchemy metadata for the associated column. What I'm trying to achieve is to be able to iterate through the formula, and dynamically build a SQLAlchemy query that maps to the formula.

For example, the formula defined above would look something like this in SQL:

    SELECT post.id + campaign.id + asset.id
    FROM post, campaign, asset
    WHERE ...;

The idea is to translate the above to something like:

    session.query(
        <InstrumentedAttribute object at 0x7ff9269f92b0>
        + <InstrumentedAttribute object at 0x7ff9269c5990>
        + <InstrumentedAttribute object at 0x7ff926896048>
    ).all()

I've tried a couple of approaches of dynamically generating the SQLAlchemy ORM query, but I haven't been able to find anything that works. Would anyone have any idea or tips on how to accomplish this?

Thank you!
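The translation asked about here works because InstrumentedAttribute overloads Python's arithmetic operators: adding two mapped columns builds a SQL expression rather than computing a value. A minimal sketch with hypothetical `Post` and `Campaign` models (1.4+ `declarative_base` import path):

```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base  # 1.4+ import path

Base = declarative_base()

class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)

class Campaign(Base):
    __tablename__ = "campaign"
    id = Column(Integer, primary_key=True)

# Arithmetic on InstrumentedAttribute objects does not compute anything;
# it builds a SQL expression tree that query()/select() accepts directly.
expr = Post.id + Campaign.id
print(str(expr))  # post.id + campaign.id
```

So `session.query(expr)` is valid as-is; the open question in the thread is only how to assemble `expr` from the formula string.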
Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query
:facepalm: I should have seen that much sooner. Thank you so much for your help. No longer need to do any of that crazy stuff I attempted!

On Thursday, February 7, 2019 at 5:13:41 PM UTC-5, Mike Bayer wrote:
>
> On Thu, Feb 7, 2019 at 2:49 PM Ian Miller wrote:
> >
> > The reason why I iterated over .c._all_columns was because it shows the
> > references to the 2 "metadata_value" columns, whereas selectable.c only
> > references 1.
>
> so that's a sign that you need to apply labels to these columns
> because they are conflicting. with select(), call
> select().apply_labels() so that these two columns are prefixed with
> their table name in a label.
>
> > For example, selectable.c shows the following:
> >
> > (Pdb++) list(query.c)
> > [Column('metadata_value', VARCHAR(length=255), table=<...>),
> >  Column('created_at', TIMESTAMP(), table=<...>),
> >  <sqlalchemy.sql.functions.coalesce at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]
> >
> > selectable.c._all_columns shows the following:
> >
> > (Pdb++) query.c._all_columns
> > [Column('metadata_value', VARCHAR(length=255), table=<...>),
> >  Column('metadata_value', VARCHAR(length=255), table=<...>),
> >  Column('created_at', TIMESTAMP(), table=<...>),
> >  <sqlalchemy.sql.functions.coalesce at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]
> >
> > On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
> >>
> >> just as a note, don't access .c._all_columns, just iterate over
> >> selectable.c
> >>
> >> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller wrote:
> >> >
> >> > Hello Mike,
> >> >
> >> > Thank you for your response! I have currently constructed the ORM
> >> > implementation that you suggested in your response. Here's the code:
> >> >
> >> > [...]
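Mike's advice above - label the conflicting columns - can be illustrated with a small sketch. The table and label names here are hypothetical, and explicit `.label()` calls are used instead of `apply_labels()` because they have the same disambiguating effect and their spelling is stable across SQLAlchemy versions (1.4+ positional `select()` style assumed):

```python
from sqlalchemy import column, select, table

# Two hypothetical tables whose columns share the name "metadata_value".
pm1 = table("post_metadata_1", column("metadata_value"))
pm2 = table("post_metadata_2", column("metadata_value"))

# Labeling each column with its table name (which is what apply_labels()
# automates) removes the collision, so both appear on the .c collection.
stmt = select(
    pm1.c.metadata_value.label("post_metadata_1_metadata_value"),
    pm2.c.metadata_value.label("post_metadata_2_metadata_value"),
)
subq = stmt.subquery()
print([c.name for c in subq.c])
# ['post_metadata_1_metadata_value', 'post_metadata_2_metadata_value']
```

Without the labels, the two same-named columns collide and only one is reachable through `.c`, which is exactly the symptom described earlier in the thread.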
Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query
The reason why I iterated over .c._all_columns was because it shows the references to the 2 "metadata_value" columns, whereas selectable.c only references 1.

For example, selectable.c shows the following:

    (Pdb++) list(query.c)
    [Column('metadata_value', VARCHAR(length=255), table=<...>),
     Column('created_at', TIMESTAMP(), table=<...>),
     <sqlalchemy.sql.functions.coalesce at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]

selectable.c._all_columns shows the following:

    (Pdb++) query.c._all_columns
    [Column('metadata_value', VARCHAR(length=255), table=<...>),
     Column('metadata_value', VARCHAR(length=255), table=<...>),
     Column('created_at', TIMESTAMP(), table=<...>),
     <sqlalchemy.sql.functions.coalesce at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]

That way I'm able to keep track of the position of the column in the query, and replace it with the rebuilt column accordingly.

On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
>
> just as a note, don't access .c._all_columns, just iterate over
> selectable.c
>
> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller wrote:
> >
> > Hello Mike,
> >
> > Thank you for your response! I have currently constructed the ORM
> > implementation that you suggested in your response. Here's the code:
> >
> > [...]
[sqlalchemy] Re: Unable to reference label in subquery at top level of query
Ah - this occurs because the non-metadata_value columns added in from "query.c._all_columns" are still referencing the original query. I need to figure out how to rebuild those columns as well so that they correctly reference the clauses in the new query.

On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote:
>
> Hello all -
>
> I am relatively new to using SQLAlchemy for more complex use cases. I am
> in the process of creating a time series query, but I am unable to
> reference a column by its alias at the top level of the query.
>
> [...]
>
> Any help on this would be greatly appreciated!
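The time-series scaffold discussed in this thread - a `generate_series()` subquery that aggregated data is LEFT OUTER JOINed onto - can be sketched minimally as below. This is a PostgreSQL-only pattern, the dates are hard-coded for illustration, and the 1.4+ `select()` calling style is assumed:

```python
from sqlalchemy import func, select

# One row per day between two dates, labeled "timestamp", wrapped as a
# named subquery so other selectables can join against it.
time_series = select(
    func.generate_series(
        func.date_trunc("day", func.date("2019-01-06")),
        func.date_trunc("day", func.date("2019-01-12")),
        "1 day",
    ).label("timestamp")
).subquery("time_series")

print(time_series.c.timestamp)  # time_series.timestamp
```

Aggregates would then be outer-joined on `func.date_trunc("day", <created_at column>) == time_series.c.timestamp`, which is the join condition the quoted `apply_date_group_by` builds.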
[sqlalchemy] Re: Unable to reference label in subquery at top level of query
Hello Mike,

Thank you for your response! I have currently constructed the ORM implementation that you suggested in your response. Here's the code:

    def _rebuild_non_interval_query_for_group_by(self, session, query):
        from sqlalchemy import table, column, select
        from sqlalchemy.orm import aliased
        from collections import defaultdict, OrderedDict

        post_metadata = table(
            "post_metadata", column("post_id"), column("metadata_value")
        )
        campaign_metadata = table(
            "campaign_metadata", column("campaign_id"), column("metadata_value")
        )
        asset_metadata = table(
            "asset_metadata", column("asset_id"), column("metadata_value")
        )

        vw_asset = table("vw_asset", column("id"))
        vw_campaign = table("vw_campaign", column("id"))
        vw_post = table("vw_post", column("id"))

        METADATA_PRIMARY_TABLE_MAP = {
            asset_metadata.name: vw_asset,
            campaign_metadata.name: vw_campaign,
            post_metadata.name: vw_post,
        }

        METADATA_NAME_TABLE_MAP = {
            asset_metadata.name: asset_metadata,
            campaign_metadata.name: campaign_metadata,
            post_metadata.name: post_metadata,
        }

        primary_tables = set()
        metadata_columns_count = defaultdict(int)
        metadata_alias = OrderedDict()
        columns = []
        for c in query.c._all_columns:
            if c.name == "metadata_value":
                parent_column = list(c.base_columns)[0]
                table = parent_column.table
                primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name])
                metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] += 1
                alias_number = metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]]
                alias_name = "{}_{}".format(table.name, alias_number)
                alias = aliased(parent_column.table, alias_name)
                metadata_alias[alias_name] = alias
                column = alias.c.metadata_value.label(
                    "{}_{}_{}".format(table.name, alias_number, "metadata_value")
                )
                columns.append(column)
            else:
                columns.append(c)

        # start constructing query
        non_interval_query = session.query(*columns).select_from(*primary_tables)

        for alias_name, alias in metadata_alias.items():
            object_type = self._get_object_type_from_metadata_name(alias.original.name)
            non_interval_query = (
                non_interval_query
                .join(
                    alias,
                    getattr(alias.c, "{}_id".format(object_type)) ==
                        METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id
                )
            )

        non_interval_query = non_interval_query.subquery("non_interval_query")

        return non_interval_query

The "metadata_alias" values are [('post_metadata_1', alias), ('post_metadata_2', alias)] - the aliases correspond to the post_metadata_1 and post_metadata_2 aliases in your example. However, when I reference these in the join, the aliased table names are not "post_metadata_1" or "post_metadata_2" - they're "post_metadata_3" and "post_metadata_4". I'm unable to figure out why there's a new join seemingly created instead of referencing the aliased tables that were passed in.

Here's the query that the above generates:

    SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
           post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
           non_interval_query.created_at,
           non_interval_query.coalesce_1
    FROM (SELECT post_metadata_3.metadata_value AS metadata_value,
                 post_metadata_4.metadata_value AS metadata_value,
                 vw_post.created_at AS created_at,
                 coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS coalesce_1
          FROM vw_post
          JOIN post_metadata AS post_metadata_3 ON post_metadata_3.post_id = vw_post.id
          JOIN post_metadata AS post_metadata_4 ON post_metadata_4.post_id = vw_post.id
          WHERE post_metadata_3.metadata_value IN (:metadata_value_1, :metadata_value_2)
            AND post_metadata_4.metadata_value IN (:metadata_value_3, :metadata_value_4)
            AND vw_post.created_at >= :created_at_1
            AND vw_post.created_at <= :created_at_2
            AND post_metadata_3.schema_uid = :schema_uid_1
            AND post_metadata_3.metadata_name = :metadata_name_1
            AND post_metadata_4.schema_uid = :schema_uid_2
            AND post_metadata_4.metadata_name = :metadata_name_2
            AND vw_post.license_id IN (:license_id_1, :license_id_2)
          GROUP BY vw_post.created_at,
                   post_metadata_3.metadata_value,
                   post_metadata_4.metadata_value,
                   vw_post.created_at) AS non_interval_query,
         vw_post
    JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = vw_post.id
    JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = vw_post.id;
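One detail worth checking in the code above (an observation from the `aliased()` signature, not something confirmed in the thread): `aliased(element, alias=None, name=None, ...)` takes the alias name via the `name=` keyword, while the second positional parameter is `alias`, so `aliased(parent_column.table, alias_name)` may not be applying the intended name. A sketch of explicit, stable alias naming on a lightweight `table()`:

```python
from sqlalchemy import column, select, table

# Hypothetical lightweight table, as in the posted code.
post_metadata = table(
    "post_metadata", column("post_id"), column("metadata_value")
)

# Naming the alias explicitly keeps the name stable in the rendered SQL,
# instead of SQLAlchemy generating post_metadata_3 / post_metadata_4 style
# anonymous names at compile time.
pm1 = post_metadata.alias("post_metadata_1")

stmt = select(pm1.c.metadata_value)
print(str(stmt))
```

Here the rendered FROM clause reads `post_metadata AS post_metadata_1`, matching the alias name that was passed in.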
[sqlalchemy] Unable to reference label in subquery at top level of query
Hello all -

I am relatively new to using SQLAlchemy for more complex use cases. I am in the process of creating a time series query, but I am unable to reference a column by its alias at the top level of the query.

This is the problematic query that SQLAlchemy is currently generating:

    SELECT non_interval_query.metadata_value AS non_interval_query_metadata_value,
           coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
           timestamp
    FROM
      (SELECT generate_series(date_trunc('day', date('2019-01-06T00:00:00+00:00')),
                              date_trunc('day', date('2019-01-12T00:00:00+00:00')),
                              '1 day') AS timestamp) AS time_series
    LEFT OUTER JOIN
      (SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
              post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
              vw_post.created_at AS vw_post_created_at,
              coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1
       FROM vw_post
       JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = vw_post.id
       JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = vw_post.id
       WHERE post_metadata_1.metadata_value IN ('')
         AND post_metadata_2.metadata_value IN ('')
         AND vw_post.created_at >= '2019-01-06T00:00:00+00:00'
         AND vw_post.created_at <= '2019-01-12T00:00:00+00:00'
         AND post_metadata_1.schema_uid = ''
         AND post_metadata_1.metadata_name = ''
         AND post_metadata_2.schema_uid = ''
         AND post_metadata_2.metadata_name = ''
         AND vw_post.license_id IN ()
       GROUP BY vw_post.created_at,
                post_metadata_1.metadata_value,
                post_metadata_2.metadata_value,
                vw_post.created_at) AS non_interval_query
    ON date_trunc('day', created_at) = timestamp;

You'll notice that "non_interval_query.metadata_value AS non_interval_query_metadata_value" at the beginning of the query is ambiguous due to the 2 "metadata_value" selects in the "non_interval_query" subquery.
What I'm trying to do is have 2 selects at the top level - one for "non_interval_query.post_metadata_1_metadata_value" and one for "non_interval_query.post_metadata_2_metadata_value".

For reference, here is the code used to generate the above query:

    def apply_date_group_by(self, session, query, range_gb_params):
        field_name = self.db.get("column")
        model = self._object.get("model")

        if not field_name or not model:
            raise ValueError("Invalid date group by")

        gb_column = self._build_column()
        interval = range_gb_params.get("interval")
        interval_type = range_gb_params.get("interval_type")

        time_series = func.generate_series(
            func.date_trunc(interval_type, func.date(range_gb_params["start"])),
            func.date_trunc(interval_type, func.date(range_gb_params["end"])),
            interval,
        ).label("timestamp")

        ts_column = column("timestamp")

        time_series_query = session.query(time_series).subquery("time_series")
        non_interval_query = query.subquery("non_interval_query")

        # have to replace the original gb_column with the 'timestamp' column
        # in order to properly merge the dataset into the time series dataset
        non_gb_columns, gbs = self._prepare_non_gb_columns(
            ts_column, gb_column, non_interval_query.columns
        )

        # construct query with correct position passed in from `range_gb_params`
        query_position = range_gb_params.get("query_index_position", 0)
        non_gb_columns.insert(query_position, ts_column)

        date_gb_query = session.query(*non_gb_columns).select_from(
            time_series_query.outerjoin(
                non_interval_query,
                func.date_trunc(interval_type, column(field_name)) == ts_column,
            )
        )

        if gbs:
            date_gb_query = date_gb_query.group_by(*gbs)

        return date_gb_query.order_by(ts_column)

Any help on this would be greatly appreciated!
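The goal stated above - two distinct selects at the top level - comes down to labeling both columns inside the subquery and then referencing the subquery's `.c` collection by label name. A minimal sketch with hypothetical table names (1.4+ `select()` style):

```python
from sqlalchemy import column, select, table

post_metadata_1 = table("post_metadata_1", column("metadata_value"))
post_metadata_2 = table("post_metadata_2", column("metadata_value"))

# Label both same-named columns inside the subquery...
inner = select(
    post_metadata_1.c.metadata_value.label("post_metadata_1_metadata_value"),
    post_metadata_2.c.metadata_value.label("post_metadata_2_metadata_value"),
).subquery("non_interval_query")

# ...then the outer query addresses each one unambiguously by its label.
outer = select(
    inner.c.post_metadata_1_metadata_value,
    inner.c.post_metadata_2_metadata_value,
)
print(str(outer))
```

The rendered outer SELECT refers to `non_interval_query.post_metadata_1_metadata_value` and `non_interval_query.post_metadata_2_metadata_value`, which is exactly the pair of top-level selects being sought.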