Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On Tue, Nov 8, 2011 at 7:11 PM, Richard Hipp wrote:

> On Tue, Nov 8, 2011 at 7:08 PM, Gabor Grothendieck wrote:
>
>> If "SELECT ?500000" allocates 500000 * 72 bytes of memory, then how does
>> that relate to SQLITE_MAX_VARIABLE_NUMBER? SQLITE_MAX_VARIABLE_NUMBER
>> did not seem to enter the calculation at all.
>
> SQLITE_MAX_VARIABLE_NUMBER determines the largest N for which ?N will
> work. So by default, the maximum allocation is 999*72. You can increase
> this to whatever you are comfortable with.

Thanks. So there is really no downside to making it 10,000, say? Those who
want it that large will be able to have that many columns, and those who
don't need that many won't incur any penalties. Is that right?

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
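[Editor's note: since the 999-variable ceiling bites mostly with long `IN (...)` lists, the usual workaround is to batch the bound values so that no single statement exceeds the compiled-in limit. A minimal sketch using Python's `sqlite3` module as a stand-in for RSQLite; the table `t` and helper `select_in_chunks` are invented for illustration:]

```python
import sqlite3

def select_in_chunks(conn, ids, batch_size=999):
    """Fetch rows matching a large id list without any single statement
    exceeding the compiled-in SQLITE_MAX_VARIABLE_NUMBER (999 here)."""
    rows = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(chunk))   # "?,?,?,..."
        sql = f"SELECT id, val FROM t WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(3000)])
wanted = list(range(0, 3000, 2))   # 1500 ids: needs more than one batch
result = select_in_chunks(conn, wanted)
print(len(result))
```

The same chunking idea applies from R; only the placeholder-building syntax changes.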
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On 9 Nov 2011, at 12:08am, Gabor Grothendieck wrote:

> If "SELECT ?500000" allocates 500000 * 72 bytes of memory, then how does
> that relate to SQLITE_MAX_VARIABLE_NUMBER? SQLITE_MAX_VARIABLE_NUMBER
> did not seem to enter the calculation at all.

In the above case, SQLite was compiled with SQLITE_MAX_VARIABLE_NUMBER =
500000. It's a limit, rather than operational use. Setting a lower value
for SQLITE_MAX_VARIABLE_NUMBER means your users can't suddenly cause SQLite
to demand 36 MB of memory. This can be extremely important for embedded
systems, where such a demand could crash the entire device. It would be
better to crash just the process that is trying to use SQLite.

Simon.
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On Tue, Nov 8, 2011 at 7:08 PM, Gabor Grothendieck wrote:

> On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp wrote:
>
>> I just checked and it appears that MacOS Lion compiles it at 500000.
>>
>> Whenever you use a value like ?N, SQLite allocates an array of N objects,
>> each 72 bytes in size. So doing "SELECT ?500000" on Lion requires a 36MB
>> memory allocation (with its accompanying memset()). That's really the
>> only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.
>
> If "SELECT ?500000" allocates 500000 * 72 bytes of memory, then how does
> that relate to SQLITE_MAX_VARIABLE_NUMBER? SQLITE_MAX_VARIABLE_NUMBER
> did not seem to enter the calculation at all.

SQLITE_MAX_VARIABLE_NUMBER determines the largest N for which ?N will work.
So by default, the maximum allocation is 999*72. You can increase this to
whatever you are comfortable with.

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp wrote:

> I just checked and it appears that MacOS Lion compiles it at 500000.
>
> Whenever you use a value like ?N, SQLite allocates an array of N objects,
> each 72 bytes in size. So doing "SELECT ?500000" on Lion requires a 36MB
> memory allocation (with its accompanying memset()). That's really the
> only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.

If "SELECT ?500000" allocates 500000 * 72 bytes of memory, then how does
that relate to SQLITE_MAX_VARIABLE_NUMBER? SQLITE_MAX_VARIABLE_NUMBER did
not seem to enter the calculation at all.
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck wrote:

> In R, the RSQLite driver for SQLite currently has
> SQLITE_MAX_VARIABLE_NUMBER set to 999. This is used by many people for
> many different projects and on different platforms, and it seems that a
> number of these projects want a larger number. Users don't compile this
> themselves, so they are stuck with whatever number is compiled in for
> them.
>
> What are the considerations for setting this number?
>
> Is there any general advice that can be given on how to set this number?

I just checked and it appears that MacOS Lion compiles it at 500000.

Whenever you use a value like ?N, SQLite allocates an array of N objects,
each 72 bytes in size. So doing "SELECT ?500000" on Lion requires a 36MB
memory allocation (with its accompanying memset()). That's really the only
downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.
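[Editor's note: the point above, that SQLite sizes its parameter array by the highest index used, can be seen with any ?N binding. A small sketch using Python's bundled `sqlite3` module (not RSQLite; the behavior is assumed to match any SQLite build):]

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "?3" names parameter index 3, so the statement's variable count is 3:
# three values must be bound even though only one is referenced.
row = conn.execute("SELECT ?3", ("a", "b", "c")).fetchone()
print(row)

# Supplying fewer values than the highest index fails at bind time.
err = None
try:
    conn.execute("SELECT ?3", ("a", "b"))
except sqlite3.ProgrammingError as exc:
    err = exc
print("bind error:", err)

# On Python 3.11+ the compiled-in ceiling is also visible at runtime:
if hasattr(conn, "getlimit"):
    print("max variable number:",
          conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
```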
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On Tue, Nov 8, 2011 at 5:55 PM, Simon Slavin wrote:

> On 8 Nov 2011, at 10:50pm, Gabor Grothendieck wrote:
>
>> In R, the RSQLite driver for SQLite currently has
>> SQLITE_MAX_VARIABLE_NUMBER set to 999. ...
>
> What sort of statements are R users doing which might require binding a
> thousand variables to one statement? I can't think of any situation like
> this that doesn't indicate an insane schema which should be normalised.

Anyone who uses R could be using this, so we don't really know, but there
have been multiple requests for increasing the limit. R is used for
analyzing data, and the analyst does not always generate the data but often
receives it from an external source. This may include files that are too
large to read into R, or that fit but are too slow to read in. Such files
might be loaded into a database and then a portion read into R from the
database. One could imagine that the rows represent individuals and the
columns a large number of genes. Or perhaps each row is an individual and
each column a health marker. Or each row is a time point and each column a
security. Typically such users must use a different database, but they
would have preferred SQLite; hence the question of what the considerations
are in coming up with the single SQLITE_MAX_VARIABLE_NUMBER that everyone
is stuck with.
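[Editor's note: for the gene/health-marker/security examples above, a conventional alternative to thousands of columns is a "long" layout, one row per (individual, variable, value), which keeps every statement down to a handful of bound variables regardless of how many markers exist. A sketch with Python's `sqlite3`; the `marker` table and gene names are hypothetical:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE marker (
    person_id INTEGER,
    name      TEXT,     -- e.g. a gene or health-marker code
    value     REAL,
    PRIMARY KEY (person_id, name))""")

# Wide input: one dict of marker -> value per individual.
people = {
    1: {"gene_a": 0.2, "gene_b": 1.7},
    2: {"gene_a": 0.9, "gene_b": 0.1},
}

# Each INSERT binds only three variables, however many markers exist,
# so the compiled-in SQLITE_MAX_VARIABLE_NUMBER never comes into play.
conn.executemany(
    "INSERT INTO marker VALUES (?, ?, ?)",
    [(pid, name, val)
     for pid, markers in people.items()
     for name, val in markers.items()])

rows = conn.execute(
    "SELECT value FROM marker WHERE name = 'gene_a' ORDER BY person_id"
).fetchall()
print(rows)
```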
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER
On 8 Nov 2011, at 10:50pm, Gabor Grothendieck wrote:

> In R, the RSQLite driver for SQLite currently has
> SQLITE_MAX_VARIABLE_NUMBER set to 999. This is used by many people
> for many different projects and on different platforms and it seems
> that a number of these projects want a larger number.

What sort of statements are R users doing which might require binding a
thousand variables to one statement? I can't think of any situation like
this that doesn't indicate an insane schema which should be normalised.

Simon.
[sqlite] SQLITE_MAX_VARIABLE_NUMBER
In R, the RSQLite driver for SQLite currently has SQLITE_MAX_VARIABLE_NUMBER
set to 999. This is used by many people for many different projects and on
different platforms, and it seems that a number of these projects want a
larger number. Users don't compile this themselves, so they are stuck with
whatever number is compiled in for them.

What are the considerations for setting this number?

Is there any general advice that can be given on how to set this number?
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
Regarding:

>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out

I wonder if you might want to use *sed* or *awk* or *perl* to preprocess
the data before import. A "master" table could contain the unique person
id, plus the fields that you intend to index and that you are likely to
filter upon most often. Other tables could exist for the remaining data,
and could be joined on the person id as needed. This might:

-- let you avoid a customized version of sqlite
-- allow your most-used queries to run faster
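[Editor's note: the master-table split suggested above might look like this in Python's `sqlite3`; the table and column names are invented for illustration:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- frequently filtered fields, indexed
    CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER, employed INTEGER);
    CREATE INDEX person_age ON person(age);
    -- rarely used fields live in a side table, joined on demand
    CREATE TABLE person_extra (id INTEGER PRIMARY KEY, notes TEXT);
""")
conn.execute("INSERT INTO person VALUES (1, 34, 1)")
conn.execute("INSERT INTO person_extra VALUES (1, 'moved in 1981')")

# Common queries touch only the narrow, indexed master table; the side
# table is pulled in only when its fields are actually needed.
row = conn.execute("""
    SELECT p.age, x.notes
    FROM person AS p JOIN person_extra AS x ON x.id = p.id
    WHERE p.age > 30
""").fetchone()
print(row)
```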
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
See below.

On Sun, Dec 28, 2008 at 11:56 PM, Chris Wedgwood wrote:

> On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:
>
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?

It is a download of a huge longitudinal survey (www.bls.gov/nls/nlsy79.htm)
that has been converted out of the proprietary format into SAS, and now I
want to convert it into a single SQLITE database per wave. I will wind up
connecting people by ID across the waves to show patterns of moving etc...

For each wave/table, each row contains integers that code for information
about a single respondent, such as age, whether employed in June (either
zero or one), whether employed in July, etc... Since the NLSY doesn't do
multiple tables, this is very much NOT normalized. What the codes mean is
described in a separate codebook (-5 = missing data, 1 = living at home,
etc). There is a separate table for each wave (1979, 1980, ... 2006).

I have managed (just now) to get it working with a hacked version of
SQLITE. Here is a meaningless query, just to confirm:

sqlite> select W0072400, count(*) as c from data_stuff
   ...> group by W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962

Like I say, I may be going about it all wrong, but I can't run the
proprietary software on my Mac, and SQL makes me comfortable. I hope to
pull out the data I want via SQL (a processed 1% of the total), then run
statistical analyses and graphics with R. I am describing all this in
hopes there is another quantitative sociologist out there using SQLITE!

TIA
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:

> I am sure there is a better way to deal with 12K rows by 2500 columns,
> but I can't figure it out

2500 columns sounds like a nightmare to deal with

could you perhaps explain that data layout a little?
[sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
Hi All,

What are the ramifications of increasing SQLITE_MAX_VARIABLE_NUMBER,
probably to ? I am trying to import a csv file from the National
Longitudinal Study of Youth 79, and .import errors out, though creating
the table worked ok. I am sure there is a better way to deal with 12K rows
by 2500 columns, but I can't figure it out.

The special application for working with NLSY only runs on Windows! So I
am trying to process SAS files until I get to work next week.

TIA
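[Editor's note: one way to sidestep both the .import failure and a recompile is to split the wide CSV across several physical tables, each narrow enough to stay under the variable limit, keyed by a shared row id. A hypothetical sketch with Python's `csv` and `sqlite3` modules; `import_wide_csv` and the `survey_*` tables are made up, and column types are left undeclared for brevity:]

```python
import csv
import io
import sqlite3

def import_wide_csv(conn, text, table="survey", max_vars=999):
    """Load a CSV wider than SQLITE_MAX_VARIABLE_NUMBER by splitting its
    columns across tables of at most max_vars - 1 data columns each."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    width = max_vars - 1                      # reserve one slot for row_id
    chunks = [header[i:i + width] for i in range(0, len(header), width)]
    for t, cols in enumerate(chunks):
        collist = ", ".join(f'"{c}"' for c in cols)
        conn.execute(f"CREATE TABLE {table}_{t} (row_id INTEGER, {collist})")
    for row_id, row in enumerate(reader):
        pos = 0
        for t, cols in enumerate(chunks):
            vals = row[pos:pos + len(cols)]
            pos += len(cols)
            marks = ", ".join("?" * (len(cols) + 1))
            conn.execute(f"INSERT INTO {table}_{t} VALUES ({marks})",
                         [row_id, *vals])

conn = sqlite3.connect(":memory:")
# max_vars=3 forces a split here, standing in for 2500 columns vs. 999 vars
import_wide_csv(conn, "a,b,c\n1,2,3\n4,5,6\n", max_vars=3)
rows = conn.execute("""
    SELECT s0.a, s0.b, s1.c
    FROM survey_0 AS s0 JOIN survey_1 AS s1 USING (row_id)
""").fetchall()
print(rows)
```

Rejoining the pieces is a plain equi-join on row_id, so the split is invisible to downstream queries.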