Re: Order rows before applying GROUP BY / DISTINCT

SGreen Tue, 05 Oct 2004 06:49:45 -0700

You are correct, there is no predicate you can use to pull the refurl 
value from the same row that the MAX(entry) value comes from . That will 
require a second processing stage. Depending on what version of MySQL you 
are running, you have some options:


*store the results of this MAX() query in a temp table and join those 
results back to the userlog table to get the other fields you need (like 
refurl) where the dates and user match.

*use the "anonymous view" feature to skip the step of creating a temp 
table (that's where you surround a query with parentheses and use it like 
any other table in an outer query. Don't forget to alias your "view"). 
This functionality is a side effect of the parsing engine's treatment of 
parentheses in a query and is available as of 4.1 (it arrived with the 
UNION statement) 

*You may be able to use a subquery to get matching non-aggregated values.

*You may be able to use the MAX-CONCAT trick as described in the manual.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Remi Mikalsen" <[EMAIL PROTECTED]> wrote on 10/04/2004 05:21:44 PM:

> If you aren't tired of this problem, I have one more question...
> 
> The last query worked well, except one small detail... 
> SELECT user, refurl, max( entry ) AS most_recent, count( user ) AS 
visits 
> FROM userlog 
> WHERE user != '' 
> GROUP BY user 
> ORDER BY most_recent DESC 
> LIMIT 10 
> 
> I am now including ONE extra field: "refurl". MySQL doesn't grab the
> refurl from the SAME 
> row as it grabs the last entry time! I would like to be able to get 
> this extra field (and maybe 
> others too), and as they are potentially different from time to 
> time, just any of them won't do 
> it.
> 
> I could probably get around the problem with a join on the time 
> column (maybe I should 
> create an index to enhance efficiency?), but I wonder if there is a 
> more efficient way to solve 
> this? Additionally, IF there were two entries at the same time 
> (which is highly improbable, 
> and not critical at all if it happens once in a million) I might not
> get the correct result anyway. 
> Maybe I'm being picky, but while I'm at it it might as well get well 
done. 
> 
> Thank you for the help so far anyway!! 
> 
> 
> Remi Mikalsen 
> 
> E-Mail:   [EMAIL PROTECTED] 
> URL:   http://www.iMikalsen.com 
> 
> 
> On 4 Oct 2004 at 10:22, [EMAIL PROTECTED] wrote: 
> 
> > 
> > 
> > I would be very surprised if this query fails as you say it does. 
> > 
> > select user, max(time) as most_recent, count(user) as visits 
> > from userlog where user!='' 
> > group by user 
> > order by most_recent desc 
> > limit 10; 
> > 
> > This represents the last 10 users to sign in, when they signed in 
> last, and many times they 
> have 
> > signed in so far. I may not know everything but unless you have 
> some problem with the 
> data so 
> > that your datetime field "time" is not acting the same for all 
> visitors, this should work. 
> Please 
> > describe why this data is wrong for your question. It is entirely 
> likely I misunderstood what 
> you 
> > wanted to find in your data. That would make my suggestions wrong 
> (my apologies if so!!) 
> > 
> > Shawn Green 
> > Database Administrator 
> > Unimin Corporation - Spruce Pine 
> > 
> > "Remi Mikalsen" <[EMAIL PROTECTED]>wrote on 10/04/2004 10:11:09 AM: 
> > 
> > > Thank you for the answer! However, it doesn't seem to solve my 
> > > problem. Running the query 
> > > without modifications repeated users (because of the "group by user, 

> > > HOST"). I removed the 
> > > HOST, and ran the query over again. Now it returned unique users, 
> > > but it still didn't return 
> > > the LAST login of them many logins for each user. 
> > > 
> > > Currently I am using a new implementation with two queries and PHP 
> > > in between. I won't 
> > > write all the code here, just explain the basics. 
> > > 
> > > 1. A query gets the total number of logins for each user 
> > > 2. PHP retrieves the logins and "orders" them with a perfect hash 
> > >    - I create an array with indexes correspondig to usernames 
> > >    - the values in the array are the numbers of logins 
> > > corresponding to each username 
> > >    - this way I have practically no overhead when accessing the 
> > array 
> > >        for example: $users['a_username'] = 10; /* 10 logins 
> > > by user a_username */ 
> > > 3. Another query gets all logins ordered by the time column, last 
> > first 
> > > 4. PHP reads the results. When a user is found that has more than 0 
> > > logins in the $users 
> > > array, I print the user and the amount of logins, and set the value 
> > > in the $users array to 0. 
> > > This guarantees only printing a user once, and only printing the 
> > > last login due to ordering by 
> > > the time field. 
> > > 
> > > But this also has obvious drawbacks as to performance. I may have to 

> > > retrieve 10.000 rows 
> > > from the userlog table to get 10 unique users, or I might never have 

> > > 10 unique users in the 
> > > table, even with 1.000.000.000 records! I just never know. I also 
> > > have to make two queries, 
> > > which gives a small overhead. Finally, using scripting to interprete 

> > > intermediate results is 
> > > much slower than MySQL. All in all, this just doesn't seem to be a 
> > > very scaleable solution. 
> > > 
> > > 
> > > 
> > > Remi Mikalsen 
> > > 
> > > E-Mail:  [EMAIL PROTECTED] 
> > > URL:  http://www.iMikalsen.com 
> > > 
> > > 
> > > 
> > > On 4 Oct 2004 at 9:14, [EMAIL PROTECTED] wrote: 
> > > 
> > > > What you are looking for is the MAX() of the Time column (most 
> > recent 
> > > > entry) 
> > > > 
> > > > select user, host, max(time) as time, count(user) as visits 
> > > > from userlog where user!='' 
> > > > group by user, host 
> > > > order by time desc 
> > > > limit 10; 
> > > > 
> > > > The issue you ran into is caused by an non-standard SQL extension 
> > created 
> > > > by MySQL. Its behavior is documented here: 
> > > > 
> > > > http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html 
> > > > 
> > > > <SOAPBOX ON> 
> > > > IMHO, this extension of the SQL spec has caused more problems than 
 
> > it has 
> > > > helped. I believe a better extension would have been to add a new 
> > GROUP BY 
> > > > predicate like FIRST() or LAST() so that the behavior would be 
> > explicit 
> > > > and controllable. It is even documented that the value returned is 
 
> > > > potentially random 
> > > > 
> > > > "Do not use this feature if the columns you omit from the GROUP BY 
 
> > part 
> > > > are not unique in the group! You will get unpredictable results." 
> > > > 
> > > > I know it's too late to take this behavior out of the older 
> > versions of 
> > > > MySQL but is there any way to prevent its inclusion in future 
> > versions? 
> > > > Are there other options out there to "fix" this non-deterministic 
> > > > behavior? I do not believe that educating the community to _avoid_ 
 
> > a 
> > > > feature is a viable option. If it were, I do not think that we 
> > would need 
> > > > to explain this strange behavior to new users as often as we do. 
> > > > <SOAPBOX OFF> 
> > > > 
> > > > Shawn Green 
> > > > Database Administrator 
> > > > Unimin Corporation - Spruce Pine 
> > > > 
> > > > 
> > > > "Remi Mikalsen" <[EMAIL PROTECTED]> wrote on 10/01/2004 06:41:32 
> > PM: 
> > > > 
> > > > > Hello. 
> > > > > 
> > > > > I'm having a problem where I seem to need to order a table 
> > before 
> > > > > applying group by (or 
> > > > > distinct?) in my query. 
> > > > > 
> > > > > Here is a simplified table structure example: 
> > > > > ID USER HOST TIME 
> > > > > 
> > > > > ID = Primary Key 
> > > > > 
> > > > > I would like to do the following in ONE query, if possible: 
> > > > > I am looking to retrieve the LAST time 10 UNIQUE users were 
> > > > > registered in the table 
> > > > > (user+host+time). These users should be the last 10 people to be 
 
> > > > > inserted into the table 
> > > > > (each user can appear various times in the table, like in a 
> > log). At 
> > > > > the same time, I would like 
> > > > > to retrieve the TOTAL NUMBER of times each of these users appear 
 
> > in 
> > > > > the table, but this is 
> > > > > not very important. 
> > > > > 
> > > > > This was the query I adopted until noticing it had a severe 
> > problem: 
> > > > > 
> > > > > select user, host, time, count(user) as times 
> > > > > from userlog where user!='' 
> > > > > group by user 
> > > > > order by time desc 
> > > > > limit 10; 
> > > > > 
> > > > > The problem is that the TIME associated with each person isn't 
> > the 
> > > > > LAST TIME a registry 
> > > > > was done for the user. This makes me think that I might need to 
> > > > > order the TIME column 
> > > > > before doing the GROUP BY, but I do no know how (and it might 
> > not 
> > > > > even be the solution to 
> > > > > the problem!). 
> > > > > 
> > > > > I do not know if I managed to express myself very well, but if 
> > > > > anyone is willing to help, I 
> > > > > would of course clarify things if necessary. 
> > > > > 
> > > > > 
> > > > > Remi Mikalsen 
> > > > > 
> > > > > E-Mail:  [EMAIL PROTECTED] 
> > > > > URL:  http://www.iMikalsen.com 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

Re: Order rows before applying GROUP BY / DISTINCT

Reply via email to