Re: [sqlite] Update and GROUP BY

2013-11-07 Thread Gert Van Assche
James,

probably a view is not slower, but if you need this data several times...

I wanted to do an update because we are using this data several times in
the process, and I did not want to add the missing pieces each time.


gert


2013/11/7 James K. Lowden 

> On Mon, 4 Nov 2013 13:01:37 +0100
> Gert Van Assche  wrote:
>
> > Thanks James -- the select query is something I could do, but the
> > update one I could not get that one right.
> >  I was considering to create a new table based on the select query,
> > but since the real data set is millions of records, an update was the
> > best solution.
>
> You're welcome, Gert.  Implicitly I was recommending a view: instead of
> periodically updating a table with derived information, let SQLite
> always provide up-to-date information by deriving it on demand.
>
> With a dataset of millions of records, is performance on such a view
> much different from that of a table?  I don't happen to have such a
> sample handy, and I'm curious what you've seen.
>
> --jkl
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-06 Thread James K. Lowden
On Mon, 4 Nov 2013 13:01:37 +0100
Gert Van Assche  wrote:

> Thanks James -- the select query is something I could do, but the
> update one I could not get that one right.
>  I was considering to create a new table based on the select query,
> but since the real data set is millions of records, an update was the
> best solution.

You're welcome, Gert.  Implicitly I was recommending a view: instead of
periodically updating a table with derived information, let SQLite
always provide up-to-date information by deriving it on demand.  

With a dataset of millions of records, is performance on such a view
much different from that of a table?  I don't happen to have such a
sample handy, and I'm curious what you've seen.  

--jkl

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-04 Thread Gert Van Assche
Thanks James -- the select query is something I could do, but the update
one I could not get that one right.
 I was considering to create a new table based on the select query, but
since the real data set is millions of records, an update was the best
solution.

As always, thanks for your help.


gert


2013/11/3 James K. Lowden 

> On Sat, 2 Nov 2013 18:06:30 +0100
> Gert Van Assche  wrote:
>
> > CREATE TABLE T (N, V, G);
> > INSERT INTO T VALUES('a', 1, 'x');
> > INSERT INTO T VALUES('b', 3, 'x');
> > INSERT INTO T VALUES('c', null, 'x');
> > INSERT INTO T VALUES('d', 80, 'y');
> > INSERT INTO T VALUES('e', null, 'y');
> > INSERT INTO T VALUES('f', 60, 'y');
> > INSERT INTO T VALUES('g', null, 'y');
> > INSERT INTO T VALUES('h', null, 'z');
> > INSERT INTO T VALUES('i', 111, 'z');
> >
> > I would like to see where N='c', V as the average for the group (G)
> > were this record belongs to (so 'x').
>
> So, you want the average of the non-missing V per G:
>
> sqlite> select G, avg(V) as avgV from T group by G;
> G   avgV
> --  --
> x   2.0
> y   70.0
> z   111.0
>
> and to see that average for each N that belongs to G:
>
> sqlite> select T.N, A.* from T join (select G, avg(V) as avgV from T
> sqlite> group by G) as A on T.G = A.G;
> N   G   avgV
> --  --  --
> a   x   2.0
> b   x   2.0
> c   x   2.0
> d   y   70.0
> e   y   70.0
> f   y   70.0
> g   y   70.0
> h   z   111.0
> i   z   111.0
>
> --jkl
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-03 Thread James K. Lowden
On Sat, 2 Nov 2013 18:06:30 +0100
Gert Van Assche  wrote:

> CREATE TABLE T (N, V, G);
> INSERT INTO T VALUES('a', 1, 'x');
> INSERT INTO T VALUES('b', 3, 'x');
> INSERT INTO T VALUES('c', null, 'x');
> INSERT INTO T VALUES('d', 80, 'y');
> INSERT INTO T VALUES('e', null, 'y');
> INSERT INTO T VALUES('f', 60, 'y');
> INSERT INTO T VALUES('g', null, 'y');
> INSERT INTO T VALUES('h', null, 'z');
> INSERT INTO T VALUES('i', 111, 'z');
> 
> I would like to see where N='c', V as the average for the group (G)
> were this record belongs to (so 'x').

So, you want the average of the non-missing V per G: 

sqlite> select G, avg(V) as avgV from T group by G;
G   avgV  
--  --
x   2.0   
y   70.0  
z   111.0 

and to see that average for each N that belongs to G: 

sqlite> select T.N, A.* from T join (select G, avg(V) as avgV from T
sqlite> group by G) as A on T.G = A.G;
N   G   avgV  
--  --  --
a   x   2.0   
b   x   2.0   
c   x   2.0   
d   y   70.0  
e   y   70.0  
f   y   70.0  
g   y   70.0  
h   z   111.0 
i   z   111.0 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-02 Thread Jim Callahan
Good question Gert. Good solution, Igor and I like Keith's formatting.

I thought the list might be interested in some of the statistical issues
involved in determining if this method of replacing null values is an
appropriate method for your data analysis and alternatives that are
available.

The statistical term for replacing "missing values" (a type of null value)
with a computed or selected value is  "imputation."

This problem/solution presented on this list is an implementation of a type
of mean imputation. The statistical language R, has an entire package
devoted to imputation (although ironically, it doesn't have this exact
method -- it calculates the mean of an entire column without grouping  or
performs a more complex analysis. Although that may be because R experts
know a way to add the grouping.).
http://cran.r-project.org/web/packages/imputation/imputation.pdf

The Wikipedia article, "Imputation (statistics)"
http://en.wikipedia.org/wiki/Imputation_(statistics)
points out some of the tradeoffs involved:

"Another imputation technique involves replacing any missing value with the
mean of that variable for all other cases [records], which has the benefit
of not changing the sample mean for that variable. However, mean imputation
attenuates any correlations involving the variable(s) that are imputed.
This is because, in cases with imputation, there is guaranteed to be no
relationship between the imputed variable and any other measured variables.
Thus, mean imputation has some attractive properties for univariate
analysis but becomes problematic for multivariate analysis.

Regression imputation has the opposite problem of mean imputation. A
regression model is estimated to predict observed values of a variable
based on other variables, and that model is then used to impute values in
cases where that variable is missing. In other words, available information
for complete and incomplete cases is used to predict whether a value on a
specific variable is missing or not. Fitted values from the regression
model are then used to impute the missing values. The problem is that the
imputed data do not have an error term included in their estimation, thus
the estimates fit perfectly along the regression line without any residual
variance. This causes relationships to be over identified and suggest
greater precision in the imputed values than is warranted. The regression
model predicts the most likely value of missing data but does not supply
uncertainty about that value."

There is a lot more in the Wikipedia article, but this seemed like the most
relevant section.

HTH,
Jim Callahan


On Sat, Nov 2, 2013 at 2:23 PM, Igor Tandetnik  wrote:

> On 11/2/2013 1:06 PM, Gert Van Assche wrote:
>
>> All, I have this table:
>>
>> DROP TABLE T;
>> CREATE TABLE T (N, V, G);
>> INSERT INTO T VALUES('a', 1, 'x');
>> INSERT INTO T VALUES('b', 3, 'x');
>> INSERT INTO T VALUES('c', null, 'x');
>> INSERT INTO T VALUES('d', 80, 'y');
>> INSERT INTO T VALUES('e', null, 'y');
>> INSERT INTO T VALUES('f', 60, 'y');
>> INSERT INTO T VALUES('g', null, 'y');
>> INSERT INTO T VALUES('h', null, 'z');
>> INSERT INTO T VALUES('i', 111, 'z');
>>
>> I would like to see where N='c', V as the average for the group (G) were
>> this record belongs to (so 'x').
>>
>> Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70,
>> and
>> where N=h it would be 111.
>>
>
> I'm not sure I quite follow, but something like this perhaps:
>
> update T set V = (select avg(V) from T t2 where T.G = t2.G)
> where V is null;
>
> --
> Igor Tandetnik
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-02 Thread Gert Van Assche
Keith, thanks for this. The statement Igor made is what I need.

gert


2013/11/2 Keith Medcalf 

>
> Can you write a SELECT which returns the data that you want?
> Can you write a WHERE clause which selects the records you wish to update?
>
> For example:
>
> update T
>set V = (select avg(t2.g)
>   from t as t2
>  where t2.g=t.g)
>  where V IS NULL;
>
> assuming you only want to update the value of V in rows where V IS NULL ...
>
> >-Original Message-
> >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> >boun...@sqlite.org] On Behalf Of Gert Van Assche
> >Sent: Saturday, 2 November, 2013 11:07
> >To: sqlite-users
> >Subject: [sqlite] Update and GROUP BY
> >
> >All, I have this table:
> >
> >DROP TABLE T;
> >CREATE TABLE T (N, V, G);
> >INSERT INTO T VALUES('a', 1, 'x');
> >INSERT INTO T VALUES('b', 3, 'x');
> >INSERT INTO T VALUES('c', null, 'x');
> >INSERT INTO T VALUES('d', 80, 'y');
> >INSERT INTO T VALUES('e', null, 'y');
> >INSERT INTO T VALUES('f', 60, 'y');
> >INSERT INTO T VALUES('g', null, 'y');
> >INSERT INTO T VALUES('h', null, 'z');
> >INSERT INTO T VALUES('i', 111, 'z');
> >
> >I would like to see where N='c', V as the average for the group (G) were
> >this record belongs to (so 'x').
> >
> >Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70,
> >and
> >where N=h it would be 111.
> >
> >I have no clue how to write this UPDATE statement. Could someone help me
> >out?
> >
> >thanks
> >
> >
> >Gert
> >___
> >sqlite-users mailing list
> >sqlite-users@sqlite.org
> >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-02 Thread Gert Van Assche
Igor, that is exactly what I need. I also understand how it is done now.

Thanks!

gert


2013/11/2 Igor Tandetnik 

> On 11/2/2013 1:06 PM, Gert Van Assche wrote:
>
>> All, I have this table:
>>
>> DROP TABLE T;
>> CREATE TABLE T (N, V, G);
>> INSERT INTO T VALUES('a', 1, 'x');
>> INSERT INTO T VALUES('b', 3, 'x');
>> INSERT INTO T VALUES('c', null, 'x');
>> INSERT INTO T VALUES('d', 80, 'y');
>> INSERT INTO T VALUES('e', null, 'y');
>> INSERT INTO T VALUES('f', 60, 'y');
>> INSERT INTO T VALUES('g', null, 'y');
>> INSERT INTO T VALUES('h', null, 'z');
>> INSERT INTO T VALUES('i', 111, 'z');
>>
>> I would like to see where N='c', V as the average for the group (G) were
>> this record belongs to (so 'x').
>>
>> Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70,
>> and
>> where N=h it would be 111.
>>
>
> I'm not sure I quite follow, but something like this perhaps:
>
> update T set V = (select avg(V) from T t2 where T.G = t2.G)
> where V is null;
>
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-02 Thread Keith Medcalf

Can you write a SELECT which returns the data that you want?
Can you write a WHERE clause which selects the records you wish to update?

For example:

update T 
   set V = (select avg(t2.g) 
  from t as t2 
 where t2.g=t.g)
 where V IS NULL;

assuming you only want to update the value of V in rows where V IS NULL ...

>-Original Message-
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Gert Van Assche
>Sent: Saturday, 2 November, 2013 11:07
>To: sqlite-users
>Subject: [sqlite] Update and GROUP BY
>
>All, I have this table:
>
>DROP TABLE T;
>CREATE TABLE T (N, V, G);
>INSERT INTO T VALUES('a', 1, 'x');
>INSERT INTO T VALUES('b', 3, 'x');
>INSERT INTO T VALUES('c', null, 'x');
>INSERT INTO T VALUES('d', 80, 'y');
>INSERT INTO T VALUES('e', null, 'y');
>INSERT INTO T VALUES('f', 60, 'y');
>INSERT INTO T VALUES('g', null, 'y');
>INSERT INTO T VALUES('h', null, 'z');
>INSERT INTO T VALUES('i', 111, 'z');
>
>I would like to see where N='c', V as the average for the group (G) were
>this record belongs to (so 'x').
>
>Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70,
>and
>where N=h it would be 111.
>
>I have no clue how to write this UPDATE statement. Could someone help me
>out?
>
>thanks
>
>
>Gert
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Update and GROUP BY

2013-11-02 Thread Igor Tandetnik

On 11/2/2013 1:06 PM, Gert Van Assche wrote:

All, I have this table:

DROP TABLE T;
CREATE TABLE T (N, V, G);
INSERT INTO T VALUES('a', 1, 'x');
INSERT INTO T VALUES('b', 3, 'x');
INSERT INTO T VALUES('c', null, 'x');
INSERT INTO T VALUES('d', 80, 'y');
INSERT INTO T VALUES('e', null, 'y');
INSERT INTO T VALUES('f', 60, 'y');
INSERT INTO T VALUES('g', null, 'y');
INSERT INTO T VALUES('h', null, 'z');
INSERT INTO T VALUES('i', 111, 'z');

I would like to see where N='c', V as the average for the group (G) were
this record belongs to (so 'x').

Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, and
where N=h it would be 111.


I'm not sure I quite follow, but something like this perhaps:

update T set V = (select avg(V) from T t2 where T.G = t2.G)
where V is null;

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Update and GROUP BY

2013-11-02 Thread Gert Van Assche
All, I have this table:

DROP TABLE T;
CREATE TABLE T (N, V, G);
INSERT INTO T VALUES('a', 1, 'x');
INSERT INTO T VALUES('b', 3, 'x');
INSERT INTO T VALUES('c', null, 'x');
INSERT INTO T VALUES('d', 80, 'y');
INSERT INTO T VALUES('e', null, 'y');
INSERT INTO T VALUES('f', 60, 'y');
INSERT INTO T VALUES('g', null, 'y');
INSERT INTO T VALUES('h', null, 'z');
INSERT INTO T VALUES('i', 111, 'z');

I would like to see where N='c', V as the average for the group (G) were
this record belongs to (so 'x').

Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, and
where N=h it would be 111.

I have no clue how to write this UPDATE statement. Could someone help me
out?

thanks


Gert
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users