Re: [Fwd: row numbers, jumping rows]

SGreen Tue, 23 Nov 2004 08:45:32 -0800

If I understand your question correctly you want to group results for 
statistical analysis. For instance if you have 2000 results (sequenced in 
some way) you want to be able to split those results into 10 equal (or 
nearly equal) groups of results and re-aggregate your results based on the 
new "generated" group number (like a decile or percentile)


Let's say, as an example,  you want to know how many scores are in each 
decile (tenth) of scores out of a possible maximum of 1600 per test. That 
means you want to count how many people scored from 0-159, 160 - 319, ... 
, 1440 -1600. There are several ways to do this but I choose to make a 
table of my decile ranges (groups) for this example. This also allows you 
to modify the size of each group individually.

CREATE TABLE AnalysisGroups (
        groupID int not null auto_increment
        , grouplow int not null
        , grouphigh int not null
        , primary key(id)
        , key(grouplow, grouphigh)
)

INSERT AnalysisGroups (grouplow, grouphigh) 
VALUES (0,159), (160, 319), (320,479 ), (480,639), (640,799), (800,959), 
(960,1119), (1120,1279), (1280,1439), (1440,1600)

The test scores are stored in another table. If you want to collect how 
many scores fall into each category you just join them together and total 
up the matches

SELECT ag.groupID as decile
        , count(tr.testID) as scores
FROM AnalysisGroups ag
INNER JOIN TestResults tr
        ON tr.score >= ag.grouplow
        AND tr.score <= ag.grouphigh
GROUP BY ag.groupID

If you want to group by population you have another trick you can use 
similar to this one. First we need to create a table that can store the 
results of the query but that also has two additional columns. One of 
those columns is an auto_increment column (so that we number each element 
of the result) and the second will be which "-ile" (percentile, decile, 
quartile) the row falls into. 

Sticking with our current example (test scores) assume we need to compute 
the average score for each percentile of test takers. We could do this by 
first ordering the results from lowest score to highest score (or by any 
other useful measure) then dividing the list into 100 equal groups, 
eventually averaging the score for each group.

CREATE TEMPORARY TABLE tmpAnalysis(
        ID int auto_increment
        , percentile int
        , testID int
        , score float
        key(percentile)
);

INSERT tmpAnalysis(testID, score)
SELECT id, score
FROM TestResults
ORDER BY score;

SELECT @Pctl := count(id)/100 from tmpAnalysis;

UPDATE tmpAnalysis
SET percentile = FLOOR((ID-1)/@Pctl) * @Pctl;

SELECT percentile, AVG(score) as pctl_mean
FROM tmpAnalysis
GROUP BY percentile;

DROP TEMPORARY TABLE tmpAnalysis;

I added an "extra" column to the temp table so that you could see that you 
could use that table for multiple purposes. Once I have assigned the 
percentile numbers to each row, I could have identified which tests (and 
which test takers) fell into each percentile.

SELECT tr.taker, tr.score
FROM TestResults tr
INNER JOIN tmpAnalysis a
        on a.testID = tr.id
Where a.percentile >= 95

Assuming you haven't dropped the temp table yet, that query will give you 
the list of who scored in the top 5% on that particular test.

>>Disclaimer<< I haven't had time to test any of this with live data. If 
these examples don't work the first time, it's probably because I made a 
typing error. Apologies in advance.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


João Borsoi Soares <[EMAIL PROTECTED]> wrote on 11/23/2004 06:30:26 
AM:

> No body has answered my question so far. Does that mean there is no way
> to retrieve current row numbers in a query or no way to retrieve results
> jumping steps?
> 
> I would appreciate any help.
> 
> Thanks,
> Joao.
> 
> -----Mensagem encaminhada-----
> > From: João Borsoi Soares <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > Subject: row numbers, jumping rows
> > Date: Sun, 21 Nov 2004 11:23:41 -0200
> > 
> > Hello list,
> > 
> > I wonder if someone could help me to get the fastest result on my 
query.
> > I need to classify one entity, according to a specific attribute 
(let's
> > call it X), looking at a sample extracted from a database table. 
> > First I need to sort my sample ordered by a specific table field (X).
> > Then I should divide my sorted sample in 10 equal groups (same number 
of
> > rows). Finally, I should classify my entity (1 to 10), finding out at
> > which group its attribute X fits in.
> > 
> > Let me explain my thoughts. First I should make a "SELECT COUNT(*).." 
to
> > find the total number of rows in my sample. Then I make queries using
> > LIMIT until I find which group the X attribute fits in. In the worst
> > case I will have to do 10 queries. Which I think should take too long 
in
> > my case.
> > 
> > I wanted to make it in only one query. It could be possible if I could
> > make a query which returns result jumping rows, like "STEP N" where 
"N"
> > could be the number of items in each group. Is there anything in mysql
> > that can give me that? I also thought if I could have the row numbers 
in
> > the query result, I could solve it with a simple condition like, 
"WHERE
> > rowNum % N = 0".
> > 
> > Any ideas?
> > 
> > Thanks.
> > 
> > 
> > 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

Re: [Fwd: row numbers, jumping rows]

Reply via email to