Re: [sqlite] Would anyone know how to use Sqlite to calculate the streakedness of data? The definition of streakedness is show below. Thank you for your help

2013-02-19 Thread James K. Lowden
On Tue, 19 Feb 2013 05:37:38 -0800
Frank Chang  wrote:

> Would anyone know how to use Sqlite to calculate the streakedness of
> data? The definition of streakedness is show below. Thank you for
> your help.
> 
> [EDIT] From our company's chief software architect, here is the
> requirement for a statistical measure. Could someone please define a
> statistical formula based onour architect's definition of data
> streakedness? -- February 19th 2013 8:45 AM

One way to go about this is to ask your architect the criteria by which
cases A-F were sorted.  IOW, what is the definition of "less than"?  It
seems to be:

0.  non-streak has length zero
1.  longest streak wins
2.  if tied for longest streak, more instances of longest rank wins
3.  if tied for #2, repeat (1,2) with next longest streak

But is that true? It seems like you might be measuring serial
correlation, for which you could conventionally use e.g.
Durbin-Watson.  

To use something like your "streakedness" (not
streakiness?) measure is very difficult without some math
functions that SQLite doesn't have.  Essentially, you want

sum( N * R^S )

where 

R is the longest possible streak
"^" is exponentiation
S is the streak size
N is the number of streaks of length S

Using this table

select name, N, streak from cases;
nameN   streak
--  --  --
A   1   13
B   1   7 
B   1   6 
C   1   7 
C   1   6 
D   1   7 
D   1   3 
D   1   2 
E   1   7 
E   1   3 
F   1   7 

in SQL Server, the query would be:

select name
, sum( N * power(biggest, streak) ) as rank 
from cases 
cross join (
select 1.0*max(streak) as biggest 
from cases
) as A 
group by name 
order by rank desc; 
name rank 
  -
 A302875106592253.0
 B   67575326.0
 C   67575326.0
 D   62750883.0
 E   62750714.0
 F   62748517.0

Without the power() function, you'd need a table of all possible values
of R^S, then

select name, sum(N * p.value) as rank
from cases 
join powers as p
on cases.streak = p.S
where p.R = (
select 1.0*max(streak) as biggest 
from cases
) 
group by name, order by rank desc;

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Would anyone know how to use Sqlite to calculate the streakedness of data? The definition of streakedness is show below. Thank you for your help

2013-02-19 Thread Simon Slavin

On 19 Feb 2013, at 1:37pm, Frank Chang  wrote:

> Would anyone know how to use Sqlite to calculate the streakedness of data?

Yes, technically it might be possible to do this as a collection of SQL 
statements and table updates, but it would be extremely inefficient.

SQLite is a database management system.  To do a calculation, either do it in 
your programming language or implement a SQLite extension which does it.  At 
some point you're actually going to have to learn how to write software.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Would anyone know how to use Sqlite to calculate the streakedness of data? The definition of streakedness is show below. Thank you for your help

2013-02-19 Thread Marc L. Allen
I haven't done SQLite coding in several months, and it's quite rusty, so I'll 
paraphrase.  I haven't tested and if this is bogus, I'm sorry in advance.  But 
maybe it'll give someone the right idea.

You might be better off with a custom function, though.

It would be something like this:

CREATE TABLE D
(
Pos INTEGER PRIMARY KEY AUTOINCREMENT,
Value INTEGER,
Streak INTEGER
)

(insert data into table D, with Streak set to 0.)
Insert invalid data point at the end (e.g. -1)

Update Streak such that it's equal to 1 if it is higher or equal to the value 
at POS - 1.
(This should result in the first POS, streak staying 0 and the final invalid 
point also having streak = 0)

UPDATE D SET Streak = 1
WHERE Pos IN (SELECT d2.POS FROM D d1 INNER JOIN D d2 WHERE d1.POS = d2.POS - 1 
AND d2.value >= d1.value)

Now, D should contain zero's for the beginning positions for each streak.  
Simply examine the distance between consecutive zeros. 

CREATE TABLE Streak
(
First INTEGER,
Last INTEGER,
Length INTEGER
)

INSERT INTO Streak
SELECT d1.pos, d2.pos, d2.pos - d1.pos 
FROM D d1
INNER JOIN D d2 ON d2.pos > d1.pos
WHERE d1.value = 0 and d2.value = 0 and 
not exists (SELECT d3.pos FROM D d3 where d3.value = 0 and 
d3.pos > d1.pos and d3.pos < d2.pos)

That should give you a list of streaks, including streaks of 1.  Aggregate any 
way you want.  

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Frank Chang
Sent: Tuesday, February 19, 2013 8:38 AM
To: sqlite-users@sqlite.org
Subject: [sqlite] Would anyone know how to use Sqlite to calculate the 
streakedness of data? The definition of streakedness is show below. Thank you 
for your help

Would anyone know how to use Sqlite to calculate the streakedness of data?
The definition of streakedness is show below. Thank you for your help.

[EDIT] From our company's chief software architect, here is the requirement for 
a statistical measure. Could someone please define a statistical formula based 
onour architect's definition of data streakedness? -- February 19th 2013 8:45 AM

Equal numbers are a streak. 1,2,3,3,3,4,5 has a streak of 7.

Case A: 1,2,3,4,5,6,7,8,9,10,11,12,13 has a longest streak of 13.

Case B: 1,2,3,4,5,6,7,3,8,9,10,11,12 has a longest streak of 7, a second 
smaller streak of 6.

Case C: 1,2,3,4,5,6,7,1,2,3,4,5,6 has a longest streak of 7, and a second 
smaller streak of 6.

Case D: 1,2,3,4,5,6,7,1,2,3,1,2,1 has a longest streak of 7, a second smaller 
streak of 3, and a third smallest streak of 2

Case E: 1,2,3,4,5,6,7,6,5,4,1,2,3 has a longest streak of 7, and a second 
smaller streak of 3.

Case F: 1,2,3,4,5,6,7,6,5,4,3,2,1 has a longest streak of 7, and no smaller 
streaks.

The cases A - F are ordered in 'most sorted to least sorted', but all have the 
same length longest streak. Using the averages of streak length is not
appropriate:

A: Average = 13/1 = 13

B: Average = (7+6)/2 = 6.5

C: Average = (7+6)/2 = 6.5

D: Average = (7+3+2)/3 = 4

E: Average = (7+3)/2 = 5

F: Average = 7/1 = 7

Factoring in non-streaks (counting them as 1's):

A: Average = 13/1 = 13

B: Average = (7+6)/3 = 4.3

C: Average = (7+6)/2 = 6.5

D: Average = (7+3+2+1)/4 = 3.25

E: Average = (7+1+1+1+3)/5 = 2.6

F: Average = (7+1+1+1+1+1+1)/7 = 1.85
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users