Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 7:11 PM, Richard Hipp  wrote:
> On Tue, Nov 8, 2011 at 7:08 PM, Gabor Grothendieck
> wrote:
>
>> On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp  wrote:
>> > On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
>> > wrote:
>> >
>> >> In R, the RSQLite driver for SQLite currently has
>> >> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> >> for many different projects and on different platforms and it seems
>> >> that a number of these projects want a larger number.  Users don't
>> >> compile this themselves so they are stuck with whatever number is
>> >> compiled in for them.
>> >>
>> >> What are the considerations for setting this number?
>> >>
>> >> Is there any general advice that can be given on how to set this number?
>> >>
>> >
>> > I just checked and it appears that MacOS Lion compiles it a 50.
>> >
>> > Whenever you use a value like ?N, SQLite allocates an array of N objects,
>> > each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
>> > 36MB memory allocation (with its accompanying memset()).  That's really
>> the
>> > only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.
>>
>> If "SELECT ?50" allocates 50 * 72 bytes of memory then how
>> does that relate to SQLITE_MAX_VARIABLE_NUMBER?
>> SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
>> all.
>>
>
> SQLITE_MAX_VARIABLE_NUMBER determines the largest N for which ?N will
> work.  So by default, the maximum allocation is 999*72.  You can increase
> this to whatever you are comfortable with.
>

Thanks.  So there is really no downside to making it 10,000, say?
Those who want it that large will be able to have that many columns
and those who don't need that many won't incur any penalties.  Is that
right?


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Simon Slavin

On 9 Nov 2011, at 12:08am, Gabor Grothendieck wrote:

> If "SELECT ?50" allocates 50 * 72 bytes of memory then how
> does that relate to SQLITE_MAX_VARIABLE_NUMBER?
> SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
> all.

In the above case, SQLite was compiled with SQLITE_MAX_VARIABLE_NUMBER = 50.

It's a limit, rather than operational use.  Setting a lower value for 
SQLITE_MAX_VARIABLE_NUMBER means your user can't suddenly cause SQLite to 
demand 36 Meg of memory.  This can be extremely important for embedded systems 
where such a demand could crash the entire device.  It would be better to crash 
just the process which is trying to use SQLite.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Richard Hipp
On Tue, Nov 8, 2011 at 7:08 PM, Gabor Grothendieck
wrote:

> On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp  wrote:
> > On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
> > wrote:
> >
> >> In R, the RSQLite driver for SQLite currently has
> >> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
> >> for many different projects and on different platforms and it seems
> >> that a number of these projects want a larger number.  Users don't
> >> compile this themselves so they are stuck with whatever number is
> >> compiled in for them.
> >>
> >> What are the considerations for setting this number?
> >>
> >> Is there any general advice that can be given on how to set this number?
> >>
> >
> > I just checked and it appears that MacOS Lion compiles it a 50.
> >
> > Whenever you use a value like ?N, SQLite allocates an array of N objects,
> > each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
> > 36MB memory allocation (with its accompanying memset()).  That's really
> the
> > only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.
>
> If "SELECT ?50" allocates 50 * 72 bytes of memory then how
> does that relate to SQLITE_MAX_VARIABLE_NUMBER?
> SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
> all.
>

SQLITE_MAX_VARIABLE_NUMBER determines the largest N for which ?N will
work.  So by default, the maximum allocation is 999*72.  You can increase
this to whatever you are comfortable with.


>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp  wrote:
> On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
> wrote:
>
>> In R, the RSQLite driver for SQLite currently has
>> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> for many different projects and on different platforms and it seems
>> that a number of these projects want a larger number.  Users don't
>> compile this themselves so they are stuck with whatever number is
>> compiled in for them.
>>
>> What are the considerations for setting this number?
>>
>> Is there any general advice that can be given on how to set this number?
>>
>
> I just checked and it appears that MacOS Lion compiles it a 50.
>
> Whenever you use a value like ?N, SQLite allocates an array of N objects,
> each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
> 36MB memory allocation (with its accompanying memset()).  That's really the
> only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.

If "SELECT ?50" allocates 50 * 72 bytes of memory then how
does that relate to SQLITE_MAX_VARIABLE_NUMBER?
SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
all.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Richard Hipp
On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
wrote:

> In R, the RSQLite driver for SQLite currently has
> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
> for many different projects and on different platforms and it seems
> that a number of these projects want a larger number.  Users don't
> compile this themselves so they are stuck with whatever number is
> compiled in for them.
>
> What are the considerations for setting this number?
>
> Is there any general advice that can be given on how to set this number?
>

I just checked and it appears that MacOS Lion compiles it a 50.

Whenever you use a value like ?N, SQLite allocates an array of N objects,
each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
36MB memory allocation (with its accompanying memset()).  That's really the
only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.


>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 5:55 PM, Simon Slavin  wrote:
>
> On 8 Nov 2011, at 10:50pm, Gabor Grothendieck wrote:
>
>> In R, the RSQLite driver for SQLite currently has
>> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> for many different projects and on different platforms and it seems
>> that a number of these projects want a larger number.
>
> What sort of statements are R users doing which might require binding a 
> thousand variables to one statement ?  I can't think of any situation like 
> this that doesn't indicate an insane schema which should be normalised.
>

Anyone who uses R could be using this so we don't really know.  We do
know that there seems to be multiple requests for increasing the
limit.

R is used for analyzing data and when when one is doing that one does
not always generate the data oneself but receives it from an external
source.  This may include files which may be too large to read into R
or might fit in but are too slow to read into R.  They might be read
into a database and then a portion read into R from the database.

One could imagine the rows might represent individuals and the columns
might represent a large number of genes.  Or perhaps each row is an
individual and each column is a health marker.  Or each row is a time
point and each column is a security.

Typically such users must use a different database but would have
preferred to use SQLite hence the question of what are the
considerations of coming up with a single SQLITE_MAX_VARIABLE_NUMBER
that everyone is stuck with.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Simon Slavin

On 8 Nov 2011, at 10:50pm, Gabor Grothendieck wrote:

> In R, the RSQLite driver for SQLite currently has
> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
> for many different projects and on different platforms and it seems
> that a number of these projects want a larger number.

What sort of statements are R users doing which might require binding a 
thousand variables to one statement ?  I can't think of any situation like this 
that doesn't indicate an insane schema which should be normalised.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
In R, the RSQLite driver for SQLite currently has
SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
for many different projects and on different platforms and it seems
that a number of these projects want a larger number.  Users don't
compile this themselves so they are stuck with whatever number is
compiled in for them.

What are the considerations for setting this number?

Is there any general advice that can be given on how to set this number?

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very widefile

2008-12-31 Thread Griggs, Donald
Regarding:
>> I am sure there is a better way to deal with 12K rows by 2500 
>> columns, but I can't figure it out

I wonder if you might want to use *sed* or *awk* or *perl* to preprocess
the data before import.

A "master" table could contain the unique person id, plus the fields
that you intend to index and that you are likely to filter upon most
often.  Other tables could exist for the remaining data, and could be
joined on the person id as needed.

This might:
   -- let you avoid a customized version of sqlite
   -- allow your most-used queries to run faster

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-31 Thread Webb Sprague
See below.

On Sun, Dec 28, 2008 at 11:56 PM, Chris Wedgwood  wrote:
> On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:
>
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?
>

It is a download of a huge longitudinal survey
(www.bls.gov/nls/nlsy79.htm) that has been converted out of the
proprietary format into SAS, and now I want to convert it into a
single SQLITE database per wave.  I will wind up connecting people by
ID across the waves to show patterns of moving etc...

For each wave/ table, each row describes contains integers that code
for information about a single respondent, such as age, whether
employed in June  (either zero or one), whether employed in July,
etc...  Since the NLSY doesn't do multiple tables, this is very much
NOT normalized.  What the codes mean is described in a separate
codebook (-5 = missing data, 1=living at home, etc).

There is a separate table for each wave (1979, 1980, ... 2006).

I have managed (just now) to get it working with a hacked version of
SQLITE.  Here is a meaningless query, just to confirm:

sqlite> select W0072400, count(*) as c  from data_stuff group by
W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962

Like I say, I may be going about it all wrong, but I can't run the
proprietary software on my Mac, and SQL makes me comfortable.  I hope
to pull out the data I want via SQL (a processed 1% of the total),
then run statistical analyses and graphics with R.

I am describing all this in hopes there is another quantitative
sociologist out there using SQLITE!

TIA
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-29 Thread Webb Sprague
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?

It is a download of a huge longitudinal survey
(www.bls.gov/nls/nlsy79.htm) that has been converted out of the
proprietary format into SAS, and now I want to convert it into a
single SQLITE database per wave.  I will wind up connecting people by
ID across the waves to show patterns of moving etc...

For each wave/ table, each row describes contains integers that code
for information about a single respondent, such as age, whether
employed in June  (either zero or one), whether employed in July,
etc...  Since the NLSY doesn't do multiple tables, this is very much
NOT normalized.  What the codes mean is described in a separate
codebook (-5 = missing data, 1=living at home, etc).

There is a separate table for each wave (1979, 1980, ... 2006).

I have managed (just now) to get it working with a hacked version of
SQLITE.  Here is a meaningless query, just to confirm:

sqlite> select W0072400, count(*) as c  from data_stuff group by
W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962

Like I say, I may be going about it all wrong, but I can't run the
proprietary software on my Mac, and SQL makes me comfortable.  I hope
to pull out the data I want via SQL (a processed 1% of the total),
then run statistical analyses and graphics with R.

I am describing all this in hopes there is another quantitative
sociologist out there using SQLITE!

TIA
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-29 Thread Chris Wedgwood
On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:

> I am sure there is a better way to deal with 12K rows by 2500 columns,
> but I can't figure it out

2500 columns sounds like a nightmare to deal with

could you perhaps explain that data layout a little?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-28 Thread Webb Sprague
Hi All,

What are the ramifications of increasing SQLITE_MAX_VARIABLE_NUMBER,
probably to ?  I am trying to import a csv file from the National
Longitudinal Study of Youth 79, and .import errors out, though
creating the table worked ok.

I am sure there is a better way to deal with 12K rows by 2500 columns,
but I can't figure it out  The special application for working
with NLSY only runs on Windows !  So I am trying to process SAS files
until I get to work next week.

TIA
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users