Re: [sqlite] Is it possible to use SQLITE to calculate the streakedness of numeric data?

2013-02-19 Thread Frank Chang
 joe.fis...@tanguaylab.com, Michael Black, Marc L. Allen, and Simon
Slavin, Our software architect defined data streakedness based upon
Chauvenet's criterion. Thank you for all of your help.

In statistical theory, *Chauvenet's criterion* (named for William
Chauvenet
[1] ) is
a means of assessing whether one piece of experimental data — an
outlier— from a set of
observations, is likely to be spurious.

To apply Chauvenet's criterion, first calculate the
meanand standard
deviation  of the observed
data. Based on how much the suspect datum differs from the mean, use the normal
distribution  function
(or a table thereof) to determine the
probabilitythat a given data
point will be at the value of the suspect data point.
Multiply this probability by the number of data points taken. If the result
is less than 0.5, the suspicious data point may be discarded, i.e., a
reading may be rejected if the probability of obtaining the particular
deviation from the mean is less than 1/(2*n*).


On Tue, Feb 19, 2013 at 11:05 AM, Frank Chang wrote:

>joe.fis...@tanguaylab.com, Michael Black, Marc. L Allen and Simon
> Slavin, Thank you for your help in helping me to convince our company's
> software architect that it is possible to calculate the streakedness of
> numeric data.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is it possible to use SQLITE to calculate the streakedness of numeric data?

2013-02-19 Thread Frank Chang
   joe.fis...@tanguaylab.com, Michael Black, Marc. L Allen and Simon
Slavin, Thank you for your help in helping me to convince our company's
software architect that it is possible to calculate the streakedness of
numeric data.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is it possible to use SQLITE to calculate the streakedness of numeric data?

2013-02-18 Thread Michael Black
Does this get you started?  It calculates a running standard deviation over
a window.
With a bit more effort you can add a running average and other calculations
on the window.
Do you have a reference for this streakedness measurement?  I couldn't find
one.

I used this library extension from http://www.sqlite.org/contrib

http://www.sqlite.org/contrib/download/extension-functions.c?get=25


Assumes window size of 10 but you could put that value in the data (if it's
variable) and use it from there too if you want.
FYI...a linear sequence like this has a constant standard deviation.

select load_extension('./libsqlitefunctions.so');
create table a(n,deviation);
create table window(n);
create trigger trig1 after insert on a begin
delete from window where rowid%10 = new.rowid%10;
insert into window values(new.n);
update a set deviation = (select sqrt(sum(square(n-(select sum(n)/10 from
window)))/10) from window) where new.rowid >=10 and rowid=new.rowid;
end;
insert into a(n) values(1.0);
insert into a(n) values(2.0);
insert into a(n) values(3.0);
insert into a(n) values(4.0);
insert into a(n) values(5.0);
insert into a(n) values(6.0);
insert into a(n) values(7.0);
insert into a(n) values(8.0);
insert into a(n) values(9.0);
insert into a(n) values(10.0);
insert into a(n) values(11.0);
insert into a(n) values(12.0);
insert into a(n) values(13.0);
insert into a(n) values(14.0);
insert into a(n) values(15.0);
insert into a(n) values(16.0);
insert into a(n) values(17.0);
insert into a(n) values(18.0);
insert into a(n) values(19.0);
insert into a(n) values(20.0);
select * from a;
1.0|
2.0|
3.0|
4.0|
5.0|
6.0|
7.0|
8.0|
9.0|
10.0|2.87228132326901
11.0|2.87228132326901
12.0|2.87228132326901
13.0|2.87228132326901
14.0|2.87228132326901
15.0|2.87228132326901
16.0|2.87228132326901
17.0|2.87228132326901
18.0|2.87228132326901
19.0|2.87228132326901
20.0|2.87228132326901



-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Frank Chang
Sent: Monday, February 18, 2013 9:50 AM
To: sqlite-users@sqlite.org
Subject: [sqlite] Is it possible to use SQLITE to calculate the streakedness
of numeric data?

Would anyone know how to use SQLITE to calculate the streakedness of data?
The definition of streakedness is how many deviations away from the
mean(i.e running average a numerical data streak is Thank you for your help.

A variable R can be used to indicate how many deviations away from the mean
a particular streak is. According to the disclosed embodiment, the level of
a streak can be defined not just in (integer*deviation) distances from the
mean but also as (integer*fraction_of_deviation) distances. To accomplish
this, a variable R-factor can be used. The R-factor indicates the
separation between two successive R-levels in terms of a fraction of the
deviation. By varying the R-factor, streaks can be ranked as required.
However, the "credibility" of the streak should also be considered, and
included in a ranking mechanism. The deviation within the streak is an
obvious measure of how staggered the data is within the streak. A good
streak should be less staggered, or in other words, have less deviation.
For this reason, a very high level streak is considered to be good, even if
its deviation is more than what would normally be desired. Thus, while the
level R influences the ranking positively, the deviation within the streak
influences it negatively.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is it possible to use SQLITE to calculate the, streakedness of numeric data?

2013-02-18 Thread joe.fis...@tanguaylab.com

Frank,

Don't really understand your question. Should be no problem to calculate 
anything once. Just write your algorithm. The problem comes in when you 
try to calculate it for a set of data. SQLite like MySQL doesn't have 
SQL 2003/2008 Window Functions. Oracle recently stated they will not be 
adding Window Functions to MySQL. That leaves just Oracle, DB2, and 
PostgreSQL for Window Functions. I often access my SQLite data from the 
GNU R programming language using the RSQLite package. In R you can also 
slice and dice your data but it's not as easy as SQL. R is the best for 
Statistics. Then, you can write back to the database using UPDATE or 
INSERT.


Here's an example from "Demo_R_to_SQLite.R"

# Load Libraries
require(DBI)   # for database
require(RSQLite)# for database

# Connect to Database
drv <- dbDriver("SQLite")
my_sqlite_ex1_RW <- dbConnect(drv, dbname = "my_sqlite_ex1.db", 
flags=SQLITE_RW)
my_sqlite_ex2_RO <- dbConnect(drv, dbname = "my_sqlite_ex2.db", 
flags=SQLITE_RO)
# Do something --> SQL queries (it's simple to make SQL calls) --> check 
the documentation

dbDisconnect(my_sqlite_ex1_RW)
dbDisconnect(my_sqlite_ex2_RO)
dbUnloadDriver(drv)

Joe Fisher
Oregon State University


Would anyone know how to use SQLITE to calculate the streakedness of data?
The definition of streakedness is how many deviations away from the
mean(i.e running average a numerical data streak is Thank you for your help.

A variable R can be used to indicate how many deviations away from the mean
a particular streak is. According to the disclosed embodiment, the level of
a streak can be defined not just in (integer*deviation) distances from the
mean but also as (integer*fraction_of_deviation) distances. To accomplish
this, a variable R-factor can be used. The R-factor indicates the
separation between two successive R-levels in terms of a fraction of the
deviation. By varying the R-factor, streaks can be ranked as required.
However, the "credibility" of the streak should also be considered, and
included in a ranking mechanism. The deviation within the streak is an
obvious measure of how staggered the data is within the streak. A good
streak should be less staggered, or in other words, have less deviation.
For this reason, a very high level streak is considered to be good, even if
its deviation is more than what would normally be desired. Thus, while the
level R influences the ranking positively, the deviation within the streak
influences it negatively.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Is it possible to use SQLITE to calculate the streakedness of numeric data?

2013-02-18 Thread Frank Chang
Would anyone know how to use SQLITE to calculate the streakedness of data?
The definition of streakedness is how many deviations away from the
mean(i.e running average a numerical data streak is Thank you for your help.

A variable R can be used to indicate how many deviations away from the mean
a particular streak is. According to the disclosed embodiment, the level of
a streak can be defined not just in (integer*deviation) distances from the
mean but also as (integer*fraction_of_deviation) distances. To accomplish
this, a variable R-factor can be used. The R-factor indicates the
separation between two successive R-levels in terms of a fraction of the
deviation. By varying the R-factor, streaks can be ranked as required.
However, the "credibility" of the streak should also be considered, and
included in a ranking mechanism. The deviation within the streak is an
obvious measure of how staggered the data is within the streak. A good
streak should be less staggered, or in other words, have less deviation.
For this reason, a very high level streak is considered to be good, even if
its deviation is more than what would normally be desired. Thus, while the
level R influences the ranking positively, the deviation within the streak
influences it negatively.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users