On Jul 22, 2015, at 9:10 PM, Mohit Sindhwani <mo_m...@onghu.com> wrote:
> We have tried this and the query is quite a bit slower.  Filtering to the 
> last 900k records before doing the recorded_on part helped speed it up.

I don't understand how that could possibly be the case if there's an index on 
recorded_on.

> Your email got me going back to look at all the parts again since obviously 
> the query should be using the index and it was still slow.  Further search 
> last night made me realize that it's not the indexes that are a problem.  The 
> problem is the count(distinct group_id) part which seems to be quite slow in 
> PostgreSQL.  This is a lot faster:
> select count(*) from
> (select distinct group_id from
> data_store_v2 where recorded_on >= '2015-06-06') td;
> than:
> select count(distinct group_id) from
> data_store_v2 where recorded_on >= '2015-06-06';
> 
> as explained here: 
> https://www.periscope.io/blog/use-subqueries-to-count-distinct-50x-faster.html
> 
> So, I guess the real problem was being masked by something else and an 
> incorrect assumption on my part :)

I would expect the select count(distinct...) to be a major contributor to the 
time taken by the query, just given the amount of work it must do. The select 
count(*) from (select distinct...) alternative is a nice tip :)

> 
> Thanks for the analysis :D
> I do understand SQL and I thought I'm not throwing things together... for my 
> understanding, which column was unnecessary?  I thought we needed all:
> > group_id for counting the distinct group_id
> > recorded_on for the subsequent query on it
> > id only for getting the most recent records

I apologize--I misread the query structure. I got it into my head as:

select ... from (select ... from ... where recorded_on ... order by ... limit 
...)

I think you can see how THAT query would have better fit my description of 
being poorly constructed.

I'm glad that my somewhat off-base pontification still managed to point you in 
a useful direction!

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice





-- 
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to rubyonrails-talk+unsubscr...@googlegroups.com.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/rubyonrails-talk/1DD7FF4B-12E3-479B-977F-048C512FF5C1%40elevated-dev.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to