Re: [sqlite] Update and GROUP BY
James, probably a view is not slower, but if you need this data several times... I wanted to do an update because we are using this data several times in the process, and I did not want to add the missing pieces each time. gert 2013/11/7 James K. Lowden > On Mon, 4 Nov 2013 13:01:37 +0100 > Gert Van Assche wrote: > > > Thanks James -- the select query is something I could do, but the > > update one I could not get that one right. > > I was considering to create a new table based on the select query, > > but since the real data set is millions of records, an update was the > > best solution. > > You're welcome, Gert. Implicitly I was recommending a view: instead of > periodically updating a table with derived information, let SQLite > always provide up-to-date information by deriving it on demand. > > With a dataset of millions of records, is performance on such a view > much different from that of a table? I don't happen to have such a > sample handy, and I'm curious what you've seen. > > --jkl > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
On Mon, 4 Nov 2013 13:01:37 +0100 Gert Van Assche wrote: > Thanks James -- the select query is something I could do, but the > update one I could not get that one right. > I was considering to create a new table based on the select query, > but since the real data set is millions of records, an update was the > best solution. You're welcome, Gert. Implicitly I was recommending a view: instead of periodically updating a table with derived information, let SQLite always provide up-to-date information by deriving it on demand. With a dataset of millions of records, is performance on such a view much different from that of a table? I don't happen to have such a sample handy, and I'm curious what you've seen. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
Thanks James -- the select query is something I could do, but the update one I could not get that one right. I was considering to create a new table based on the select query, but since the real data set is millions of records, an update was the best solution. As always, thanks for your help. gert 2013/11/3 James K. Lowden > On Sat, 2 Nov 2013 18:06:30 +0100 > Gert Van Assche wrote: > > > CREATE TABLE T (N, V, G); > > INSERT INTO T VALUES('a', 1, 'x'); > > INSERT INTO T VALUES('b', 3, 'x'); > > INSERT INTO T VALUES('c', null, 'x'); > > INSERT INTO T VALUES('d', 80, 'y'); > > INSERT INTO T VALUES('e', null, 'y'); > > INSERT INTO T VALUES('f', 60, 'y'); > > INSERT INTO T VALUES('g', null, 'y'); > > INSERT INTO T VALUES('h', null, 'z'); > > INSERT INTO T VALUES('i', 111, 'z'); > > > > I would like to see where N='c', V as the average for the group (G) > > were this record belongs to (so 'x'). > > So, you want the average of the non-missing V per G: > > sqlite> select G, avg(V) as avgV from T group by G; > G avgV > -- -- > x 2.0 > y 70.0 > z 111.0 > > and to see that average for each N that belongs to G: > > sqlite> select T.N, A.* from T join (select G, avg(V) as avgV from T > sqlite> group by G) as A on T.G = A.G; > N G avgV > -- -- -- > a x 2.0 > b x 2.0 > c x 2.0 > d y 70.0 > e y 70.0 > f y 70.0 > g y 70.0 > h z 111.0 > i z 111.0 > > --jkl > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
On Sat, 2 Nov 2013 18:06:30 +0100 Gert Van Assche wrote: > CREATE TABLE T (N, V, G); > INSERT INTO T VALUES('a', 1, 'x'); > INSERT INTO T VALUES('b', 3, 'x'); > INSERT INTO T VALUES('c', null, 'x'); > INSERT INTO T VALUES('d', 80, 'y'); > INSERT INTO T VALUES('e', null, 'y'); > INSERT INTO T VALUES('f', 60, 'y'); > INSERT INTO T VALUES('g', null, 'y'); > INSERT INTO T VALUES('h', null, 'z'); > INSERT INTO T VALUES('i', 111, 'z'); > > I would like to see where N='c', V as the average for the group (G) > were this record belongs to (so 'x'). So, you want the average of the non-missing V per G: sqlite> select G, avg(V) as avgV from T group by G; G avgV -- -- x 2.0 y 70.0 z 111.0 and to see that average for each N that belongs to G: sqlite> select T.N, A.* from T join (select G, avg(V) as avgV from T sqlite> group by G) as A on T.G = A.G; N G avgV -- -- -- a x 2.0 b x 2.0 c x 2.0 d y 70.0 e y 70.0 f y 70.0 g y 70.0 h z 111.0 i z 111.0 --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
Good question Gert. Good solution, Igor and I like Keith's formatting. I thought the list might be interested in some of the statistical issues involved in determining if this method of replacing null values is an appropriate method for your data analysis and alternatives that are available. The statistical term for replacing "missing values" (a type of null value) with a computed or selected value is "imputation." This problem/solution presented on this list is an implementation of a type of mean imputation. The statistical language R, has an entire package devoted to imputation (although ironically, it doesn't have this exact method -- it calculates the mean of an entire column without grouping or performs a more complex analysis. Although that may be because R experts know a way to add the grouping.). http://cran.r-project.org/web/packages/imputation/imputation.pdf The Wikipedia article, "Imputation (statistics)" http://en.wikipedia.org/wiki/Imputation_(statistics) points out some of the tradeoffs involved: "Another imputation technique involves replacing any missing value with the mean of that variable for all other cases [records], which has the benefit of not changing the sample mean for that variable. However, mean imputation attenuates any correlations involving the variable(s) that are imputed. This is because, in cases with imputation, there is guaranteed to be no relationship between the imputed variable and any other measured variables. Thus, mean imputation has some attractive properties for univariate analysis but becomes problematic for multivariate analysis. Regression imputation has the opposite problem of mean imputation. A regression model is estimated to predict observed values of a variable based on other variables, and that model is then used to impute values in cases where that variable is missing. In other words, available information for complete and incomplete cases is used to predict whether a value on a specific variable is missing or not. Fitted values from the regression model are then used to impute the missing values. The problem is that the imputed data do not have an error term included in their estimation, thus the estimates fit perfectly along the regression line without any residual variance. This causes relationships to be over identified and suggest greater precision in the imputed values than is warranted. The regression model predicts the most likely value of missing data but does not supply uncertainty about that value." There is a lot more in the Wikipedia article, but this seemed like the most relevant section. HTH, Jim Callahan On Sat, Nov 2, 2013 at 2:23 PM, Igor Tandetnik wrote: > On 11/2/2013 1:06 PM, Gert Van Assche wrote: > >> All, I have this table: >> >> DROP TABLE T; >> CREATE TABLE T (N, V, G); >> INSERT INTO T VALUES('a', 1, 'x'); >> INSERT INTO T VALUES('b', 3, 'x'); >> INSERT INTO T VALUES('c', null, 'x'); >> INSERT INTO T VALUES('d', 80, 'y'); >> INSERT INTO T VALUES('e', null, 'y'); >> INSERT INTO T VALUES('f', 60, 'y'); >> INSERT INTO T VALUES('g', null, 'y'); >> INSERT INTO T VALUES('h', null, 'z'); >> INSERT INTO T VALUES('i', 111, 'z'); >> >> I would like to see where N='c', V as the average for the group (G) were >> this record belongs to (so 'x'). >> >> Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, >> and >> where N=h it would be 111. >> > > I'm not sure I quite follow, but something like this perhaps: > > update T set V = (select avg(V) from T t2 where T.G = t2.G) > where V is null; > > -- > Igor Tandetnik > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
Keith, thanks for this. The statement Igor made is what I need. gert 2013/11/2 Keith Medcalf > > Can you write a SELECT which returns the data that you want? > Can you write a WHERE clause which selects the records you wish to update? > > For example: > > update T >set V = (select avg(t2.g) > from t as t2 > where t2.g=t.g) > where V IS NULL; > > assuming you only want to update the value of V in rows where V IS NULL ... > > >-Original Message- > >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > >boun...@sqlite.org] On Behalf Of Gert Van Assche > >Sent: Saturday, 2 November, 2013 11:07 > >To: sqlite-users > >Subject: [sqlite] Update and GROUP BY > > > >All, I have this table: > > > >DROP TABLE T; > >CREATE TABLE T (N, V, G); > >INSERT INTO T VALUES('a', 1, 'x'); > >INSERT INTO T VALUES('b', 3, 'x'); > >INSERT INTO T VALUES('c', null, 'x'); > >INSERT INTO T VALUES('d', 80, 'y'); > >INSERT INTO T VALUES('e', null, 'y'); > >INSERT INTO T VALUES('f', 60, 'y'); > >INSERT INTO T VALUES('g', null, 'y'); > >INSERT INTO T VALUES('h', null, 'z'); > >INSERT INTO T VALUES('i', 111, 'z'); > > > >I would like to see where N='c', V as the average for the group (G) were > >this record belongs to (so 'x'). > > > >Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, > >and > >where N=h it would be 111. > > > >I have no clue how to write this UPDATE statement. Could someone help me > >out? > > > >thanks > > > > > >Gert > >___ > >sqlite-users mailing list > >sqlite-users@sqlite.org > >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
Igor, that is exactly what I need. I also understand how it is done now. Thanks! gert 2013/11/2 Igor Tandetnik > On 11/2/2013 1:06 PM, Gert Van Assche wrote: > >> All, I have this table: >> >> DROP TABLE T; >> CREATE TABLE T (N, V, G); >> INSERT INTO T VALUES('a', 1, 'x'); >> INSERT INTO T VALUES('b', 3, 'x'); >> INSERT INTO T VALUES('c', null, 'x'); >> INSERT INTO T VALUES('d', 80, 'y'); >> INSERT INTO T VALUES('e', null, 'y'); >> INSERT INTO T VALUES('f', 60, 'y'); >> INSERT INTO T VALUES('g', null, 'y'); >> INSERT INTO T VALUES('h', null, 'z'); >> INSERT INTO T VALUES('i', 111, 'z'); >> >> I would like to see where N='c', V as the average for the group (G) were >> this record belongs to (so 'x'). >> >> Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, >> and >> where N=h it would be 111. >> > > I'm not sure I quite follow, but something like this perhaps: > > update T set V = (select avg(V) from T t2 where T.G = t2.G) > where V is null; > > -- > Igor Tandetnik > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
Can you write a SELECT which returns the data that you want? Can you write a WHERE clause which selects the records you wish to update? For example: update T set V = (select avg(t2.g) from t as t2 where t2.g=t.g) where V IS NULL; assuming you only want to update the value of V in rows where V IS NULL ... >-Original Message- >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- >boun...@sqlite.org] On Behalf Of Gert Van Assche >Sent: Saturday, 2 November, 2013 11:07 >To: sqlite-users >Subject: [sqlite] Update and GROUP BY > >All, I have this table: > >DROP TABLE T; >CREATE TABLE T (N, V, G); >INSERT INTO T VALUES('a', 1, 'x'); >INSERT INTO T VALUES('b', 3, 'x'); >INSERT INTO T VALUES('c', null, 'x'); >INSERT INTO T VALUES('d', 80, 'y'); >INSERT INTO T VALUES('e', null, 'y'); >INSERT INTO T VALUES('f', 60, 'y'); >INSERT INTO T VALUES('g', null, 'y'); >INSERT INTO T VALUES('h', null, 'z'); >INSERT INTO T VALUES('i', 111, 'z'); > >I would like to see where N='c', V as the average for the group (G) were >this record belongs to (so 'x'). > >Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, >and >where N=h it would be 111. > >I have no clue how to write this UPDATE statement. Could someone help me >out? > >thanks > > >Gert >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Update and GROUP BY
On 11/2/2013 1:06 PM, Gert Van Assche wrote: All, I have this table: DROP TABLE T; CREATE TABLE T (N, V, G); INSERT INTO T VALUES('a', 1, 'x'); INSERT INTO T VALUES('b', 3, 'x'); INSERT INTO T VALUES('c', null, 'x'); INSERT INTO T VALUES('d', 80, 'y'); INSERT INTO T VALUES('e', null, 'y'); INSERT INTO T VALUES('f', 60, 'y'); INSERT INTO T VALUES('g', null, 'y'); INSERT INTO T VALUES('h', null, 'z'); INSERT INTO T VALUES('i', 111, 'z'); I would like to see where N='c', V as the average for the group (G) were this record belongs to (so 'x'). Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, and where N=h it would be 111. I'm not sure I quite follow, but something like this perhaps: update T set V = (select avg(V) from T t2 where T.G = t2.G) where V is null; -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Update and GROUP BY
All, I have this table: DROP TABLE T; CREATE TABLE T (N, V, G); INSERT INTO T VALUES('a', 1, 'x'); INSERT INTO T VALUES('b', 3, 'x'); INSERT INTO T VALUES('c', null, 'x'); INSERT INTO T VALUES('d', 80, 'y'); INSERT INTO T VALUES('e', null, 'y'); INSERT INTO T VALUES('f', 60, 'y'); INSERT INTO T VALUES('g', null, 'y'); INSERT INTO T VALUES('h', null, 'z'); INSERT INTO T VALUES('i', 111, 'z'); I would like to see where N='c', V as the average for the group (G) were this record belongs to (so 'x'). Thus where N='c' I would get 2, and where N='e' or 'g', it would be 70, and where N=h it would be 111. I have no clue how to write this UPDATE statement. Could someone help me out? thanks Gert ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users