On May 24, 10:51 pm, Michael Bayer [EMAIL PROTECTED] wrote:
On May 25, 2008, at 1:04 AM, gatto wrote:
been working all day long on this, and i don't feel like i've gotten
any closer than when i started.. read all the related posts on this
group and couldn't determine what to do based on those. hoping
someone can shed some light.
i'm attempting to convert the codebase of an existing project i've
developed to use SA. going ok so far, but i'm stuck on an important
query that contains a subquery and that subquery has a count
function. the original query looks like this:
SELECT id, name, count
FROM (
SELECT t.id, t.name, COUNT(*) AS count
FROM tags t, object_tags ot
WHERE t.id= ot.tag_id
AND ot.object_type_id = 2
GROUP BY t.id, t.name
ORDER BY COUNT(*) DESC, t.name ASC
LIMIT 35
) AS t
ORDER BY name ASC
tho actually, i will be changing the LIMIT clause and the final ORDER
BY clause depending on what the user chooses. this is a query whose
results are used to render a tag cloud.. i've got an intermediate
table called 'object_types' that has a lookup of different types of
entities that could be tagged. e.g. articles or users. so articles
has the object_type_id 2 in the above sql, and that's what i'm
generating the cloud for. object_tags is the relation table that maps
tags and object types to objects..
i had to do the query this way so that i can first get the top [limit]
number of tags ordered by count and name, and then being able to order
that top tag result set further. basically i'm just saying that i
don't see a way to do this without the subselect..
so, how would i go about doing this?? it seems like i need to use
session.query(Tag) with from_statement, func and add_column, but i
just can't figure it out. i'm seriously just about to pull my hair
out trying to understand what to do. any help is greatly appreciated!
if you just want the columns back, you can issue the text directly, or
use a select() construct, which would look like
s = select([tags.c.id, tags.c.name,
func.count('*').label('count')]).where(tags.c.id==object_tags.c.tag_id).
where(object_tags.c.type_id==2).group_by([t.c.id,
t.c.name]).order_by([func.count('*').desc(), t.c.name]).
limit(35)
rows = s.execute().fetchall()
if you want to map id/name to a Tag object, in 0.4 you'd keep a
select() like the above and use it with the Query. it would be along
the lines of
sess
.query(Tag).add_column(s.c.count).select_from(s).order_by(Tag.name) .
Overall the query is not the best approach as you're mixing aggregates
with primary keys (assuming tags.id is the primary key). you really
want the count of maching object_tags rows, I think:
select tag.id, tag.name, obj_count from
tags join
(select tag_id, count('*') AS obj_count from object_tags where
object_type_id=2 group by tag_id order by obj_count limit 35) as
obj_tag_count
on tags.id=obj_tag_count.id order by tag.name
that way you keep unneeded columns out of the GROUP BY. the subquery
looks like:
s = select([object_tags.c.tag_id,
func
.count('*').label('obj_count')]).where(object_tags.c.object_type_id==2).
group_by(object_tags.c.tag_id).order_by(func.count('*')).limit(35)
Note that the order_by() hits the func.count() again. On some
databases this is required (i.e., you can't ORDER BY a label). If you
truly want to force the label in there, say
order_by(literal_column(obj_count)).
then join it into an ORM query using the join:
sess.query(Tag).add_column(s.c.obj_count).select_from(tags.join(s,
Tag.id==s.c.tag_id)).order_by(Tag.name)
The difference between the two approaches is described (in a little
bit of a rant) here:
http://weblogs.sqlteam.com/jeffs/archive/2005/12/14/8546.aspx
you are absolutely correct.. thanks for the education! i will be
using that technique in the future on all group by clauses.
thanks *so much* for the advice. i'm not really close to being done
now i think. i didn't mention before, but i'm actually using Elixir
also. i didn't think it would matter for the querying, but i may be
wrong.. i got the select() part working well, nice... but i'm
getting an error when i try to run the session.query() part. here is
my full code now:
s = select([ObjectTag.c.tag_id, func.count('*').label('obj_count')]).\
where(ObjectTag.c.object_type_id == 2).\
group_by(ObjectTag.c.tag_id).\
order_by(desc(func.count('*'))).\
limit(35)
tags = session.query(Tag).add_column(s.c.obj_count).\
select_from(Tag.table.join(s, Tag.id ==
s.c.tag_id)).order_by(Tag.name)
for t in tags:
t.name
now i'm getting the error: sqlalchemy.exceptions.OperationalError:
(OperationalError) (1248, 'Every derived table must have its own
alias'). is this due to my using elixir? somehow i don't think so..
i maybe just need to label the subquery?