[sqlalchemy] Re: Using operators with multiple InstrumentedAttribute instances

2019-03-29 Thread Ian Miller
The code that makes up the query builder I've been working on is pretty 
extensive, so I'll go through the high-level basics.

I've set up a `_base` class that augments the SQLAlchemy Base model. Note 
the column details retrieved in the `get_column_and_json_key_from_sql_name` 
class method:

class _base:
    """
    This class augments the default SQLAlchemy Base model
    """

    @classmethod
    def get_column_and_json_key_from_sql_name(cls, name):
        """
        Returns the column and, if applicable, the JSON top-level key from
        the JSON dict.

        :param name: Name of field.

        :return:
        """
        assert type(name) == str

        json_key = None
        col_names = name.split(COLUMN_NAME_DELIMITER)
        if len(col_names) == 2:
            name, json_key = col_names

        try:
            col = cls.__table__.c[name]
        except KeyError:
            log.error("Invalid column name: %s", name)
            return None

        return (getattr(cls, col.name), json_key)
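
For illustration, the class method would be used roughly like this (a sketch 
only; the `Post` model, its `extra_data` JSON column, and a "." delimiter are 
assumptions for the example, not taken from the real schema):

# Hypothetical model and column names; COLUMN_NAME_DELIMITER assumed to be "."
col, json_key = Post.get_column_and_json_key_from_sql_name("extra_data.impressions")
# col is the InstrumentedAttribute Post.extra_data, json_key == "impressions"

col, json_key = Post.get_column_and_json_key_from_sql_name("created_at")
# col is Post.created_at, json_key is None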


We then have a `_build_column` method that is responsible for retrieving the 
details needed to construct the column for the select statement of the 
SQLAlchemy ORM query:

def _build_column(self):
    field_name = self.db.get("column")
    model = self._object.get("model")

    column_type, column_key = self.db.get("type"), self.db.get("key")
    select_column, json_key = model.get_column_and_json_key_from_sql_name(
        field_name
    )
    select_column = self._construct_json_select_field(
        column_type, select_column, json_key, column_key
    )

    return select_column


What I'm trying to figure out is how to dynamically generate SQLAlchemy ORM 
statements based on the formula. The formulas can be any math equation 
using +, -, /, *, and parentheses:

formula1 = '"metric:1" + "metric:2" + "metric:3"'
formula2 = '"metric:1" + "metric:2" - "metric:3"'
formula3 = '"metric:1" + ("metric:2" * "metric:3")'
formula4 = '"metric:1" / "metric:2"'
formula5 = '"metric:1 / ("metric:2" * "metric:3")'

The InstrumentedAttribute objects I mentioned earlier are the select fields 
for each individual metric - what I need to figure out is how to build a 
SQLAlchemy ORM query by parsing the formula string and evaluating each 
operation in the context of the query. `func.sum` would work for formula1, 
but I'd need to iteratively build the expression with SQLAlchemy ORM helper 
methods for formula2 - formula5.

Per +Jonathan Vanasco's last comment, I've already figured out Phase 1. I'm 
stuck on Phase 2.
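
To make Phase 2 concrete, here is a minimal sketch of one way to turn a 
formula string into a single SQLAlchemy expression. It leans on the fact that 
column attributes already overload +, -, * and /, and it assumes a 
resolve_metric() helper that returns the select column for a given "metric:N" 
token (that helper, and the use of Python's ast module for parentheses and 
precedence, are my own assumptions, not part of the code above):

import ast
import operator
import re

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def build_formula_expression(formula, resolve_metric):
    """Turn e.g. '"metric:1" + ("metric:2" * "metric:3")' into one SQLAlchemy
    column expression. resolve_metric("metric:1") must return the select
    column / InstrumentedAttribute for that metric."""
    columns = {}

    def _substitute(match):
        # Replace each quoted "metric:N" token with a bare identifier that
        # Python's ast module can parse, remembering which column it names.
        placeholder = "m_{}".format(len(columns))
        columns[placeholder] = resolve_metric(match.group(1))
        return placeholder

    python_expr = re.sub(r'"(metric:\d+)"', _substitute, formula)
    tree = ast.parse(python_expr, mode="eval")

    def _evaluate(node):
        if isinstance(node, ast.Expression):
            return _evaluate(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            # SQLAlchemy columns overload the arithmetic operators, so plain
            # Python operators compose the SQL expression tree for us.
            return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
        if isinstance(node, ast.Name):
            return columns[node.id]
        raise ValueError("Unsupported token in formula: {!r}".format(node))

    return _evaluate(tree)

# e.g. session.query(build_formula_expression(formula3, resolve_metric)).all()

This handles parentheses and operator precedence via ast.parse, so the same 
approach covers formula1 through formula5.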

On Thursday, March 28, 2019 at 4:43:56 PM UTC-4, Ian Miller wrote:
>
> Hello all,
>
> I am in the process of trying to create a dynamic expression query engine 
> in an application I'm working on.
>
> So there is a formula that gets defined like so: 
>
> formula = '"metric:123" + "metric:456" + "metric:789"'
>
> Each metric maps to a column in the database tables - long story short, 
> I'm able to retrieve the metric by ID, and instantiate an 
> InstrumentedAttribute object that has the SQLAlchemy metadata for the 
> associated column. What I'm trying to achieve is to be able to iterate 
> through the formula, and dynamically build a SQLAlchemy query that maps to 
> the formula.
>
> For example, the formula defined above would look something like this in 
> SQL:
>
> SELECT post.id + campaign.id + asset.id
> FROM post, campaign, asset
> WHERE ..;
>
> The idea is to translate the above to something like:
>
> session.query(<InstrumentedAttribute object at 0x7ff9269f92b0> + <InstrumentedAttribute object at 0x7ff9269c5990> + <InstrumentedAttribute object at 0x7ff926896048>).all()
>
> I've tried a couple of approaches of dynamically generating the SQLAlchemy 
> ORM query, but I haven't been able to find anything that works. Would 
> anyone have any idea or tips on how to accomplish this?
>
> Thank you!
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] Using operators with multiple InstrumentedAttribute instances

2019-03-28 Thread Ian Miller
Hello all,

I am in the process of trying to create a dynamic expression query engine 
in an application I'm working on.

So there is a formula that gets defined like so: 

formula = '"metric:123" + "metric:456" + "metric:789"'

Each metric maps to a column in the database tables - long story short, I'm 
able to retrieve the metric by ID, and instantiate an InstrumentedAttribute 
object that has the SQLAlchemy metadata for the associated column. What I'm 
trying to achieve is to be able to iterate through the formula, and 
dynamically build a SQLAlchemy query that maps to the formula.

For example, the formula defined above would look something like this in 
SQL:

SELECT post.id + campaign.id + asset.id
FROM post, campaign, asset
WHERE ..;

The idea is to translate the above to something like:

session.query(<InstrumentedAttribute> + <InstrumentedAttribute> + <InstrumentedAttribute>).all()
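
(For comparison, the hand-written equivalent is just ordinary operator use on 
the column attributes - a sketch assuming mapped models named Post, Campaign 
and Asset:

session.query(Post.id + Campaign.id + Asset.id).all()

The + on InstrumentedAttribute objects produces a BinaryExpression, so the 
same works with any dynamically obtained column attributes.)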

I've tried a couple of approaches of dynamically generating the SQLAlchemy 
ORM query, but I haven't been able to find anything that works. Would 
anyone have any idea or tips on how to accomplish this?

Thank you!



Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

2019-02-07 Thread Ian Miller
:facepalm: I should have seen that much sooner. Thank you so much for your 
help. No longer need to do any of that crazy stuff I attempted!

On Thursday, February 7, 2019 at 5:13:41 PM UTC-5, Mike Bayer wrote:
>
> On Thu, Feb 7, 2019 at 2:49 PM Ian Miller wrote: 
> > 
> > The reason why I iterated over .c._all_columns was because it shows the 
> references to the 2 "metadata_value" columns, whereas selectable.c only 
> references 1. 
>
>
> so that's a sign that you need to apply labels to these columns 
> because they are conflicting.   with select(), call 
> select().apply_labels() so that these two columns are prefixed with 
> their table name in a label. 
>
>
> > 
> > For example, selectable.c shows the following: 
> > 
> > (Pdb++) list(query.c) 
> > [Column('metadata_value', VARCHAR(length=255), 
> table=), Column('created_at', TIMESTAMP(), 
> table=),  0x7f15e1f6fef0; %(139731962070520 coalesce)s>] 
> > 
> > selectable.c._all_columns shows the following: 
> > 
> > (Pdb++) query.c._all_columns 
> > [Column('metadata_value', VARCHAR(length=255), 
> table=), Column('metadata_value', VARCHAR(length=255), 
> table=), Column('created_at', TIMESTAMP(), 
> table=),  0x7f15e1f6fef0; %(139731962070520 coalesce)s>] 
> > 
> > 
> > 
> > 
> > On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote: 
> >> 
> >> just as a note, don't access .c._all_columns, just iterate over 
> selectable.c 
> >> 
> >> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller  wrote: 
> >> > 
> >> > Hello Mike, 
> >> > 
> >> > Thank you for your response! I have currently constructed the ORM 
> implementation that you suggested in your response. Here's the code: 
> >> > 
> >> > def _rebuild_non_interval_query_for_group_by(self, session, query): 
> >> > from sqlalchemy import table, column, select 
> >> > from sqlalchemy.orm import aliased 
> >> > from collections import defaultdict, OrderedDict 
> >> > 
> >> > post_metadata = table( 
> >> > "post_metadata", column("post_id"), column("metadata_value") 
> >> > ) 
> >> > campaign_metadata = table( 
> >> > "campaign_metadata", column("campaign_id"), 
> column("metadata_value") 
> >> > ) 
> >> > asset_metadata = table( 
> >> > "asset_metadata", column("asset_id"), 
> column("metadata_value") 
> >> > ) 
> >> > 
> >> > vw_asset = table("vw_asset", column("id")) 
> >> > vw_campaign = table("vw_campaign", column("id")) 
> >> > vw_post = table("vw_post", column("id")) 
> >> > 
> >> > METADATA_PRIMARY_TABLE_MAP = { 
> >> > asset_metadata.name: vw_asset, 
> >> > campaign_metadata.name: vw_campaign, 
> >> > post_metadata.name: vw_post, 
> >> > } 
> >> > 
> >> > METADATA_NAME_TABLE_MAP = { 
> >> > asset_metadata.name: asset_metadata, 
> >> > campaign_metadata.name: campaign_metadata, 
> >> > post_metadata.name: post_metadata, 
> >> > } 
> >> > 
> >> > primary_tables = set() 
> >> > metadata_columns_count = defaultdict(int) 
> >> > metadata_alias = OrderedDict() 
> >> > columns = [] 
> >> > for c in query.c._all_columns: 
> >> > if c.name == "metadata_value": 
> >> > parent_column = list(c.base_columns)[0] 
> >> > table = parent_column.table 
> >> > primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name]) 
>
> >> > metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> += 1 
> >> > alias_number = 
> metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> >> > alias_name = "{}_{}".format(table.name, alias_number) 
> >> > alias = aliased(parent_column.table, alias_name) 
> >> > metadata_alias[alias_name] = alias 
> >> > column = alias.c.metadata_value.label( 
> >> > "{}_{}_{}".format(table.name, alias_number, 
> "metadata_value&

Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

2019-02-07 Thread Ian Miller
The reason I iterated over .c._all_columns is that it shows the references to 
the 2 "metadata_value" columns, whereas selectable.c only references 1.

For example, selectable.c shows the following:

(Pdb++) list(query.c)
[Column('metadata_value', VARCHAR(length=255), table=), 
Column('created_at', TIMESTAMP(), table=), ]

selectable.c._all_columns shows the following:

(Pdb++) query.c._all_columns
[Column('metadata_value', VARCHAR(length=255), table=), 
Column('metadata_value', VARCHAR(length=255), table=), 
Column('created_at', TIMESTAMP(), table=), ]

That way I'm able to keep track of the position of the column in the query, 
and replace with the rebuilt column accordingly.
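
A standalone way to see the difference (made-up table names, 1.3-style 
select([...]) API, only to illustrate the two collections):

from sqlalchemy import table, column, select

a = table("post_metadata", column("metadata_value"))
b = table("campaign_metadata", column("metadata_value"))
s = select([a.c.metadata_value, b.c.metadata_value])

list(s.c)                # keyed collection - only one "metadata_value" key survives
len(s.c._all_columns)    # 2 - both columns are still part of the SELECT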

On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
>
> just as a note, don't access .c._all_columns, just iterate over 
> selectable.c 
>
> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller wrote: 
> > 
> > Hello Mike, 
> > 
> > Thank you for your response! I have currently constructed the ORM 
> implementation that you suggested in your response. Here's the code: 
> > 
> > def _rebuild_non_interval_query_for_group_by(self, session, query): 
> > from sqlalchemy import table, column, select 
> > from sqlalchemy.orm import aliased 
> > from collections import defaultdict, OrderedDict 
> > 
> > post_metadata = table( 
> > "post_metadata", column("post_id"), column("metadata_value") 
> > ) 
> > campaign_metadata = table( 
> > "campaign_metadata", column("campaign_id"), 
> column("metadata_value") 
> > ) 
> > asset_metadata = table( 
> > "asset_metadata", column("asset_id"), column("metadata_value") 
> > ) 
> > 
> > vw_asset = table("vw_asset", column("id")) 
> > vw_campaign = table("vw_campaign", column("id")) 
> > vw_post = table("vw_post", column("id")) 
> > 
> > METADATA_PRIMARY_TABLE_MAP = { 
> > asset_metadata.name: vw_asset, 
> > campaign_metadata.name: vw_campaign, 
> > post_metadata.name: vw_post, 
> > } 
> > 
> > METADATA_NAME_TABLE_MAP = { 
> > asset_metadata.name: asset_metadata, 
> > campaign_metadata.name: campaign_metadata, 
> > post_metadata.name: post_metadata, 
> > } 
> > 
> > primary_tables = set() 
> > metadata_columns_count = defaultdict(int) 
> > metadata_alias = OrderedDict() 
> > columns = [] 
> > for c in query.c._all_columns: 
> > if c.name == "metadata_value": 
> > parent_column = list(c.base_columns)[0] 
> > table = parent_column.table 
> > primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name]) 
> > metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> += 1 
> > alias_number = 
> metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> > alias_name = "{}_{}".format(table.name, alias_number) 
> > alias = aliased(parent_column.table, alias_name) 
> > metadata_alias[alias_name] = alias 
> > column = alias.c.metadata_value.label( 
> > "{}_{}_{}".format(table.name, alias_number, 
> "metadata_value") 
> > ) 
> > columns.append(column) 
> > else: 
> > columns.append(c) 
> > 
> > # start constructing query 
> > non_interval_query = 
> session.query(*columns).select_from(*primary_tables) 
> > 
> > for alias_name, alias in metadata_alias.items(): 
> > object_type = self._get_object_type_from_metadata_name(
> alias.original.name) 
> > non_interval_query = ( 
> > non_interval_query 
> > .join( 
> > alias, 
> > getattr(alias.c, "{}_id".format(object_type)) == 
> METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id 
> > ) 
> > ) 
> > 
> > non_interval_query = 
> non_interval_query.subquery("non_interval_query") 
> > 
> > return non_interval_query 
> > 
> > 
> > 
> > The "metadata_alias" values are [('post_metadata_1", alias), 
> ('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 
> and post_metadata_2 alias in your example. However, when I reference these 
> in the join, the 

Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

2019-02-07 Thread Ian Miller
The reason I iterated over .c._all_columns is that it shows the references to 
the 2 "metadata_value" columns, whereas selectable.c only references 1.

For example, selectable.c shows the following:

(Pdb++) list(query.c)
[Column('metadata_value', VARCHAR(length=255), table=), 
Column('created_at', TIMESTAMP(), table=), ]

selectable.c._all_columns shows the following:

(Pdb++) query.c._all_columns
[Column('metadata_value', VARCHAR(length=255), table=), 
Column('metadata_value', VARCHAR(length=255), table=), 
Column('created_at', TIMESTAMP(), table=), ]

On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
>
> just as a note, don't access .c._all_columns, just iterate over 
> selectable.c 
>
> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller wrote: 
> > 
> > Hello Mike, 
> > 
> > Thank you for your response! I have currently constructed the ORM 
> implementation that you suggested in your response. Here's the code: 
> > 
> > def _rebuild_non_interval_query_for_group_by(self, session, query): 
> > from sqlalchemy import table, column, select 
> > from sqlalchemy.orm import aliased 
> > from collections import defaultdict, OrderedDict 
> > 
> > post_metadata = table( 
> > "post_metadata", column("post_id"), column("metadata_value") 
> > ) 
> > campaign_metadata = table( 
> > "campaign_metadata", column("campaign_id"), 
> column("metadata_value") 
> > ) 
> > asset_metadata = table( 
> > "asset_metadata", column("asset_id"), column("metadata_value") 
> > ) 
> > 
> > vw_asset = table("vw_asset", column("id")) 
> > vw_campaign = table("vw_campaign", column("id")) 
> > vw_post = table("vw_post", column("id")) 
> > 
> > METADATA_PRIMARY_TABLE_MAP = { 
> > asset_metadata.name: vw_asset, 
> > campaign_metadata.name: vw_campaign, 
> > post_metadata.name: vw_post, 
> > } 
> > 
> > METADATA_NAME_TABLE_MAP = { 
> > asset_metadata.name: asset_metadata, 
> > campaign_metadata.name: campaign_metadata, 
> > post_metadata.name: post_metadata, 
> > } 
> > 
> > primary_tables = set() 
> > metadata_columns_count = defaultdict(int) 
> > metadata_alias = OrderedDict() 
> > columns = [] 
> > for c in query.c._all_columns: 
> > if c.name == "metadata_value": 
> > parent_column = list(c.base_columns)[0] 
> > table = parent_column.table 
> > primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name]) 
> > metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> += 1 
> > alias_number = 
> metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> > alias_name = "{}_{}".format(table.name, alias_number) 
> > alias = aliased(parent_column.table, alias_name) 
> > metadata_alias[alias_name] = alias 
> > column = alias.c.metadata_value.label( 
> > "{}_{}_{}".format(table.name, alias_number, 
> "metadata_value") 
> > ) 
> > columns.append(column) 
> > else: 
> > columns.append(c) 
> > 
> > # start constructing query 
> > non_interval_query = 
> session.query(*columns).select_from(*primary_tables) 
> > 
> > for alias_name, alias in metadata_alias.items(): 
> > object_type = self._get_object_type_from_metadata_name(
> alias.original.name) 
> > non_interval_query = ( 
> > non_interval_query 
> > .join( 
> > alias, 
> > getattr(alias.c, "{}_id".format(object_type)) == 
> METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id 
> > ) 
> > ) 
> > 
> > non_interval_query = 
> non_interval_query.subquery("non_interval_query") 
> > 
> > return non_interval_query 
> > 
> > 
> > 
> > The "metadata_alias" values are [('post_metadata_1", alias), 
> ('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 
> and post_metadata_2 alias in your example. However, when I reference these 
> in the join, the aliased table names are not "post_metadata_1" or 
> "post_metadata_2" 

[sqlalchemy] Re: Unable to reference label in subquery at top level of query

2019-02-07 Thread Ian Miller
Ah - this occurs because the non-metadata_value columns added in from 
"query.c._all_columns" are still referencing the original query. I need to 
figure out how to rebuild those columns as well so that they correctly 
reference the clauses in the new query.
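
For the record, the general mechanism for re-pointing a column at a new 
selectable is corresponding_column(), or simply looking it up on the 
subquery's .c collection (a sketch - non_interval_query and vw_post here 
stand in for the objects built in the code above):

subq = non_interval_query  # the result of .subquery("non_interval_query")
# instead of reusing vw_post.c.created_at from the inner query:
outer_created_at = subq.corresponding_column(vw_post.c.created_at)
# or, when the exported name/label is known:
outer_created_at = subq.c["created_at"]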

On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote:
>
> Hello all - 
>
> I am relatively new to using SQLAlchemy for more complex use cases. I am 
> in the process of creating a time series query, but I am unable to 
> reference a column by its alias at the top level of the query.
>
> This is the query that I am trying to address that SQLAlchemy is currently 
> generating:
>
> SELECT non_interval_query.metadata_value AS 
> non_interval_query_metadata_value,
>coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
>timestamp
> FROM
>   (SELECT generate_series(date_trunc('day', date(
> '2019-01-06T00:00:00+00:00')), date_trunc('day', date(
> '2019-01-12T00:00:00+00:00')), '1 day') AS timestamp) AS time_series
> LEFT OUTER JOIN
>   (SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value
> ,
>   post_metadata_2.metadata_value AS post_metadata_2_metadata_value
> ,
>   vw_post.created_at AS vw_post_created_at,
>   coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1
>FROM vw_post
>JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
> vw_post.id
>JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
> vw_post.id
>WHERE post_metadata_1.metadata_value IN ('')
>  AND post_metadata_2.metadata_value IN ('')
>  AND vw_post.created_at >= '2019-01-06T00:00:00+00:00'
>  AND vw_post.created_at <= '2019-01-12T00:00:00+00:00'
>  AND post_metadata_1.schema_uid = ''
>  AND post_metadata_1.metadata_name = ''
>  AND post_metadata_2.schema_uid = ''
>  AND post_metadata_2.metadata_name = ''
>  AND vw_post.license_id IN ()
>GROUP BY vw_post.created_at,
> post_metadata_1.metadata_value,
> post_metadata_2.metadata_value,
> vw_post.created_at) AS non_interval_query ON date_trunc('day', 
> created_at) = timestamp;
>
> You'll notice that "non_interval_query.metadata_value AS 
> non_interval_query_metadata_value" specified at the beginning of the 
> query is ambiguous due to the 2 "metadata_value" selects in the 
> "non_interval_query" subquery. What I'm trying to do is have 2 selects at 
> the top level - one for "non_interval_query.post_metadata_1_metadata_value" 
> and one for "non_interval_query.post_metadata_2_metadata_value".
>
>
> For reference, here is the code used to generate the above query:
>
>
> def apply_date_group_by(self, session, query, range_gb_params):
> field_name = self.db.get("column")
> model = self._object.get("model")
>
> if not field_name or not model:
> raise ValueError("Invalid date group by")
>
> gb_column = self._build_column()
> interval = range_gb_params.get("interval")
> interval_type = range_gb_params.get("interval_type")
>
> time_series = func.generate_series(
> func.date_trunc(interval_type, func.date(range_gb_params["start"])),
> func.date_trunc(interval_type, func.date(range_gb_params["end"])),
> interval,
> ).label("timestamp")
>
> ts_column = column("timestamp")
>
> time_series_query = session.query(time_series).subquery("time_series")
> non_interval_query = query.subquery("non_interval_query")
> # have to replace the original gb_column with the 'timestamp' column
> # in order to properly merge the dataset into the time series dataset
> non_gb_columns, gbs = self._prepare_non_gb_columns(
> ts_column, gb_column, non_interval_query.columns
> )
>
> # construct query with correct position passed in from `range_gb_params`
> query_position = range_gb_params.get("query_index_position", 0)
> non_gb_columns.insert(query_position, ts_column)
>
> date_gb_query = session.query(*non_gb_columns).select_from(
> time_series_query.outerjoin(
> non_interval_query,
> func.date_trunc(interval_type, column(field_name)) == ts_column,
> )
> )
>
> if gbs:
> date_gb_query = date_gb_query.group_by(*gbs)
>
> return date_gb_query.order_by(ts_column)
>
>
>
> Any help on this would be greatly appreciated! 
>


[sqlalchemy] Re: Unable to reference label in subquery at top level of query

2019-02-07 Thread Ian Miller
Hello Mike,

Thank you for your response! I have currently constructed the ORM 
implementation that you suggested in your response. Here's the code:

def _rebuild_non_interval_query_for_group_by(self, session, query):
    from sqlalchemy import table, column, select
    from sqlalchemy.orm import aliased
    from collections import defaultdict, OrderedDict

    post_metadata = table(
        "post_metadata", column("post_id"), column("metadata_value")
    )
    campaign_metadata = table(
        "campaign_metadata", column("campaign_id"), column("metadata_value")
    )
    asset_metadata = table(
        "asset_metadata", column("asset_id"), column("metadata_value")
    )

    vw_asset = table("vw_asset", column("id"))
    vw_campaign = table("vw_campaign", column("id"))
    vw_post = table("vw_post", column("id"))

    METADATA_PRIMARY_TABLE_MAP = {
        asset_metadata.name: vw_asset,
        campaign_metadata.name: vw_campaign,
        post_metadata.name: vw_post,
    }

    METADATA_NAME_TABLE_MAP = {
        asset_metadata.name: asset_metadata,
        campaign_metadata.name: campaign_metadata,
        post_metadata.name: post_metadata,
    }

    primary_tables = set()
    metadata_columns_count = defaultdict(int)
    metadata_alias = OrderedDict()
    columns = []
    for c in query.c._all_columns:
        if c.name == "metadata_value":
            parent_column = list(c.base_columns)[0]
            table = parent_column.table
            primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name])
            metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] += 1
            alias_number = metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]]
            alias_name = "{}_{}".format(table.name, alias_number)
            alias = aliased(parent_column.table, alias_name)
            metadata_alias[alias_name] = alias
            column = alias.c.metadata_value.label(
                "{}_{}_{}".format(table.name, alias_number, "metadata_value")
            )
            columns.append(column)
        else:
            columns.append(c)

    # start constructing query
    non_interval_query = session.query(*columns).select_from(*primary_tables)

    for alias_name, alias in metadata_alias.items():
        object_type = self._get_object_type_from_metadata_name(alias.original.name)
        non_interval_query = (
            non_interval_query
            .join(
                alias,
                getattr(alias.c, "{}_id".format(object_type)) ==
                METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id
            )
        )

    non_interval_query = non_interval_query.subquery("non_interval_query")

    return non_interval_query



The "metadata_alias" values are [('post_metadata_1", alias), 
('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 
and post_metadata_2 alias in your example. However, when I reference these 
in the join, the aliased table names are not "post_metadata_1" or 
"post_metadata_2" - they're "post_metadata_3" and "post_metadata_4". I'm 
unable to figure out why there's a new join seemingly created instead of 
referencing the aliased tables that were passed in.

Here's the query that the above generates:

SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
       post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
       non_interval_query.created_at,
       non_interval_query.coalesce_1
FROM
  (SELECT post_metadata_3.metadata_value AS metadata_value,
          post_metadata_4.metadata_value AS metadata_value,
          vw_post.created_at AS created_at,
          coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS coalesce_1
   FROM vw_post
   JOIN post_metadata AS post_metadata_3 ON post_metadata_3.post_id = vw_post.id
   JOIN post_metadata AS post_metadata_4 ON post_metadata_4.post_id = vw_post.id
   WHERE post_metadata_3.metadata_value IN (:metadata_value_1, :metadata_value_2)
     AND post_metadata_4.metadata_value IN (:metadata_value_3, :metadata_value_4)
     AND vw_post.created_at >= :created_at_1
     AND vw_post.created_at <= :created_at_2
     AND post_metadata_3.schema_uid = :schema_uid_1
     AND post_metadata_3.metadata_name = :metadata_name_1
     AND post_metadata_4.schema_uid = :schema_uid_2
     AND post_metadata_4.metadata_name = :metadata_name_2
     AND vw_post.license_id IN (:license_id_1, :license_id_2)
   GROUP BY vw_post.created_at, post_metadata_3.metadata_value,
            post_metadata_4.metadata_value, vw_post.created_at) AS non_interval_query,
  vw_post
JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = vw_post.id
JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = vw_post.id;
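
Regarding the post_metadata_3 / post_metadata_4 names in the subquery above: 
for the lightweight table() constructs, a deterministically named alias can 
also be produced directly with .alias(), and that name is rendered verbatim. 
A minimal sketch using the table() objects defined in the function above 
(Core select() here rather than session.query(), purely for illustration, not 
necessarily the fix for the query shown):

pm1 = post_metadata.alias("post_metadata_1")
pm2 = post_metadata.alias("post_metadata_2")
stmt = select([
    pm1.c.metadata_value.label("post_metadata_1_metadata_value"),
    pm2.c.metadata_value.label("post_metadata_2_metadata_value"),
]).select_from(
    vw_post
    .join(pm1, pm1.c.post_id == vw_post.c.id)
    .join(pm2, pm2.c.post_id == vw_post.c.id)
)

That renders "post_metadata AS post_metadata_1" / "post_metadata AS 
post_metadata_2" with no renaming.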




On

[sqlalchemy] Unable to reference label in subquery at top level of query

2019-02-05 Thread Ian Miller
Hello all - 

I am relatively new to using SQLAlchemy for more complex use cases. I am in 
the process of creating a time series query, but I am unable to reference a 
column by its alias at the top level of the query.

This is the query that I am trying to address that SQLAlchemy is currently 
generating:

SELECT non_interval_query.metadata_value AS 
non_interval_query_metadata_value,
   coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
   timestamp
FROM
  (SELECT generate_series(date_trunc('day', date('2019-01-06T00:00:00+00:00')),
                          date_trunc('day', date('2019-01-12T00:00:00+00:00')),
                          '1 day') AS timestamp) AS time_series
LEFT OUTER JOIN
  (SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
  post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
  vw_post.created_at AS vw_post_created_at,
  coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1
   FROM vw_post
   JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
vw_post.id
   JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
vw_post.id
   WHERE post_metadata_1.metadata_value IN ('')
 AND post_metadata_2.metadata_value IN ('')
 AND vw_post.created_at >= '2019-01-06T00:00:00+00:00'
 AND vw_post.created_at <= '2019-01-12T00:00:00+00:00'
 AND post_metadata_1.schema_uid = ''
 AND post_metadata_1.metadata_name = ''
 AND post_metadata_2.schema_uid = ''
 AND post_metadata_2.metadata_name = ''
 AND vw_post.license_id IN ()
   GROUP BY vw_post.created_at,
post_metadata_1.metadata_value,
post_metadata_2.metadata_value,
vw_post.created_at) AS non_interval_query ON date_trunc('day', 
created_at) = timestamp;

You'll notice that "non_interval_query.metadata_value AS 
non_interval_query_metadata_value" specified at the beginning of the query 
is ambiguous due to the 2 "metadata_value" selects in the 
"non_interval_query" subquery. What I'm trying to do is have 2 selects at 
the top level - one for "non_interval_query.post_metadata_1_metadata_value" 
and one for "non_interval_query.post_metadata_2_metadata_value".


For reference, here is the code used to generate the above query:


def apply_date_group_by(self, session, query, range_gb_params):
    field_name = self.db.get("column")
    model = self._object.get("model")

    if not field_name or not model:
        raise ValueError("Invalid date group by")

    gb_column = self._build_column()
    interval = range_gb_params.get("interval")
    interval_type = range_gb_params.get("interval_type")

    time_series = func.generate_series(
        func.date_trunc(interval_type, func.date(range_gb_params["start"])),
        func.date_trunc(interval_type, func.date(range_gb_params["end"])),
        interval,
    ).label("timestamp")

    ts_column = column("timestamp")

    time_series_query = session.query(time_series).subquery("time_series")
    non_interval_query = query.subquery("non_interval_query")
    # have to replace the original gb_column with the 'timestamp' column
    # in order to properly merge the dataset into the time series dataset
    non_gb_columns, gbs = self._prepare_non_gb_columns(
        ts_column, gb_column, non_interval_query.columns
    )

    # construct query with correct position passed in from `range_gb_params`
    query_position = range_gb_params.get("query_index_position", 0)
    non_gb_columns.insert(query_position, ts_column)

    date_gb_query = session.query(*non_gb_columns).select_from(
        time_series_query.outerjoin(
            non_interval_query,
            func.date_trunc(interval_type, column(field_name)) == ts_column,
        )
    )

    if gbs:
        date_gb_query = date_gb_query.group_by(*gbs)

    return date_gb_query.order_by(ts_column)



Any help on this would be greatly appreciated! 
