AW: [sqlite] Index size in file

2007-10-04 Thread Michael Ruck
If you're running under constraints so low, you should take care choosing
the right
tools for the job. Apparently sqlite isn't the right tool for this job.

Mike

-Ursprüngliche Nachricht-
Von: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] 
Gesendet: Freitag, 5. Oktober 2007 00:19
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Index size in file




Let's assume that my whole database can be in the cache. If my indexes have
duplicate data, then I will either need a bigger cache or have to page out
row
data in favour of index data.
In that case it will either be slower or require more memory to keep
duplicate
data for the indexes as opposed to referencing the original data.

Clive





John Stanton <[EMAIL PROTECTED]> on 05/10/2007 00:54:21

Please respond to sqlite-users@sqlite.org

To:   sqlite-users@sqlite.org
cc:(bcc: clive/Emultek)

Subject:  Re: [sqlite] Index size in file



Trevor Talbot wrote:
> On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote:
>
>>A B-Tree index holds keys in sorted sequence.  They are in random
>>sequence in the database.  That requires holding the keys in the B-Tree
>>nodes.
>
>
> Actually, it doesn't strictly require that; it could store references
> to the keys.  An obvious tradeoff is I/O; an index walk is less useful
> if you have to do random seeks to the locations of row data just to
> get the keys to walk the tree in the first place.  IOW in simplistic
> terms, an index walk suddenly doubles in disk I/O.
>
> The information on SQL Server would be interesting, as I know it
> stores sort keys under some conditions, which is effectively duplicate
> data.
>
One would need to be a paleontologist to measure the performance of an
ordered index with indirect key references.


-
To unsubscribe, send email to [EMAIL PROTECTED]

-









This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.














-
To unsubscribe, send email to [EMAIL PROTECTED]

-



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread John Stanton

Kees Nuyt wrote:

On Wed, 3 Oct 2007 15:39:06 +0200, you wrote:





I created an index on a TEXT column as I want to be able to
I noticed a large increase in the file size.
Looking at the binary of the file, I see that the index has a copy of all the
data being indexed.
1. Is this necassary?
2. Is there a way to keep the index only in memory and not in the file.

Clive



A long time ago I saw index systems (don't remember, perhaps a
mainframe with indexed sequential files), where the B-Tree used
simple 'key compression'. Some encoding scheme which replaces
the key field by a structure (repeat_length, keypart).

Best shown in an example (my keys are 'frank', 'franklin',
'fred', 'google', 'gopher'):

- store the full key 'frank' of the first entry in the page as
   (0,frank)
- store 'franklin' as (5,lin), meaning: take the first five
characters of the previous key and concatenate the rest.
- store 'fred' as (2,ed)
- store 'google' as (0,google)
- store 'gopher' as (2,pher)

This works nicely for large indexes with long keys and a lot of
repetition. Of course the effort to handle insertions and
deletions is significant.


A simpler algorithm is "prefix b-trees" where only the significant part 
of the keys is held in the interior and leaf nodes.  It can reduce the 
size of the index.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread Clive . Bluston



Let's assume that my whole database can be in the cache. If my indexes have
duplicate data, then I will either need a bigger cache or have to page out row
data in favour of index data.
In that case it will either be slower or require more memory to keep duplicate
data for the indexes as opposed to referencing the original data.

Clive





John Stanton <[EMAIL PROTECTED]> on 05/10/2007 00:54:21

Please respond to sqlite-users@sqlite.org

To:   sqlite-users@sqlite.org
cc:(bcc: clive/Emultek)

Subject:  Re: [sqlite] Index size in file



Trevor Talbot wrote:
> On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote:
>
>>A B-Tree index holds keys in sorted sequence.  They are in random
>>sequence in the database.  That requires holding the keys in the B-Tree
>>nodes.
>
>
> Actually, it doesn't strictly require that; it could store references
> to the keys.  An obvious tradeoff is I/O; an index walk is less useful
> if you have to do random seeks to the locations of row data just to
> get the keys to walk the tree in the first place.  IOW in simplistic
> terms, an index walk suddenly doubles in disk I/O.
>
> The information on SQL Server would be interesting, as I know it
> stores sort keys under some conditions, which is effectively duplicate
> data.
>
One would need to be a paleontologist to measure the performance of an
ordered index with indirect key references.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-








This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.












-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread Kees Nuyt
On Wed, 3 Oct 2007 15:39:06 +0200, you wrote:

>
>
>
>I created an index on a TEXT column as I want to be able to
>I noticed a large increase in the file size.
>Looking at the binary of the file, I see that the index has a copy of all the
>data being indexed.
>1. Is this necassary?
>2. Is there a way to keep the index only in memory and not in the file.
>
>Clive

A long time ago I saw index systems (don't remember, perhaps a
mainframe with indexed sequential files), where the B-Tree used
simple 'key compression'. Some encoding scheme which replaces
the key field by a structure (repeat_length, keypart).

Best shown in an example (my keys are 'frank', 'franklin',
'fred', 'google', 'gopher'):

- store the full key 'frank' of the first entry in the page as
   (0,frank)
- store 'franklin' as (5,lin), meaning: take the first five
characters of the previous key and concatenate the rest.
- store 'fred' as (2,ed)
- store 'google' as (0,google)
- store 'gopher' as (2,pher)

This works nicely for large indexes with long keys and a lot of
repetition. Of course the effort to handle insertions and
deletions is significant.
-- 
  (  Kees Nuyt
  )
c[_]

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread John Stanton

Trevor Talbot wrote:

On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote:


A B-Tree index holds keys in sorted sequence.  They are in random
sequence in the database.  That requires holding the keys in the B-Tree
nodes.



Actually, it doesn't strictly require that; it could store references
to the keys.  An obvious tradeoff is I/O; an index walk is less useful
if you have to do random seeks to the locations of row data just to
get the keys to walk the tree in the first place.  IOW in simplistic
terms, an index walk suddenly doubles in disk I/O.

The information on SQL Server would be interesting, as I know it
stores sort keys under some conditions, which is effectively duplicate
data.

One would need to be a paleontologist to measure the performance of an 
ordered index with indirect key references.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread Grzegorz Makarewicz
[EMAIL PROTECTED] wrote:
> Regarding:
>   
>>> Looking at the binary of the file, I see that the index has a copy of
>>> all the data being indexed.
>>> 1. Is this necassary?
>>>   
>
> Unless you're programming for a cellphone or some other embedded gadget,
> you might want to calculate the cost (on the margin) of the disk storage
> for the estimated amount of duplicated data.
>
> You might find it's vastly less than the cost of your time to think
> about it.   YMMV.
>   

I'm as other developpong for smart/pocket/- devices but sqlite is very
rich for this platform
- too rich :)

mak


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] Index size in file

2007-10-04 Thread Clive . Bluston



Actually yes, I am programming for a cellphone  and you are right, that is the
only reason I am thinking about it!

Clive





"Griggs, Donald" <[EMAIL PROTECTED]> on 04/10/2007 21:23:17

Please respond to sqlite-users@sqlite.org

To:   sqlite-users@sqlite.org
cc:(bcc: clive/Emultek)

Subject:  RE: [sqlite] Index size in file



Regarding:
>>Looking at the binary of the file, I see that the index has a copy of
>>all the data being indexed.
>>1. Is this necassary?

Unless you're programming for a cellphone or some other embedded gadget,
you might want to calculate the cost (on the margin) of the disk storage
for the estimated amount of duplicated data.

You might find it's vastly less than the cost of your time to think
about it.   YMMV.




This message has been scanned for viruses by MailControl - www.mailcontrol.com

-
To unsubscribe, send email to [EMAIL PROTECTED]
-








This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.












-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread Clive . Bluston



>From what I read SQL Server has 2 basic types of index:
1. Clustered, that holds the single instance of the data itself (actaully the
whole row)
2. Non-clustered that hold a pointer to the single instance of the data, but not
the data itself.

Clive






John Stanton <[EMAIL PROTECTED]> on 04/10/2007 20:02:16

Please respond to sqlite-users@sqlite.org

To:   sqlite-users@sqlite.org
cc:(bcc: clive/Emultek)

Subject:  Re: [sqlite] Index size in file



A B-Tree index holds keys in sorted sequence.  They are in random
sequence in the database.  That requires holding the keys in the B-Tree
nodes.

A hashed type access does not have that storage overhead, but it does
not deliver the rows in sorted sequence.

[EMAIL PROTECTED] wrote:
>
>
> I am not an expert on indexes, however it does seem strange to me that a
> database should keep duplicate data in it.
> This prompted me to look up how indexes are stored in other databases. To tell
> the truth I only looked at one, and that is SQL Server.
> They do not store any duplicate data. If you would like the reference I can
give
> it to you.
> I agree with you that it means an extra lookup that could make things slower,
(
> I say could because you use more space in the cache which could result in more
> reads)
> Anyway given that that is the way it is implemented, does anyone know if it is
> possible to create an index in memory?
>
> Clive
>
>
>
>
>
> John Stanton <[EMAIL PROTECTED]> on 03/10/2007 17:36:58
>
> Please respond to sqlite-users@sqlite.org
>
> To:   sqlite-users@sqlite.org
> cc:(bcc: clive/Emultek)
>
> Subject:  Re: [sqlite] Index size in file
>
>
>
> An index which does not hold keys is not an index.  If you don't want to
> allocate space for indexing then you put up with slow performance and
> use row searches.
>
> [EMAIL PROTECTED] wrote:
>
>>
>>I created an index on a TEXT column as I want to be able to
>>I noticed a large increase in the file size.
>>Looking at the binary of the file, I see that the index has a copy of all the
>>data being indexed.
>>1. Is this necassary?
>>2. Is there a way to keep the index only in memory and not in the file.
>>
>>Clive
>>
>>
>>
>>-
>>To unsubscribe, send email to [EMAIL PROTECTED]
>>-
>>
>
>
>
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
>
>
>
>
>
>
>


>
> This footnote confirms that this email message has been scanned by
> PineApp Mail-SeCure for the presence of malicious code, vandals & computer
> viruses.
>


>
>
>
>
>
>
>
>
>
>
>
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-








This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.












-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-04 Thread Clive . Bluston



I am not an expert on indexes, however it does seem strange to me that a
database should keep duplicate data in it.
This prompted me to look up how indexes are stored in other databases. To tell
the truth I only looked at one, and that is SQL Server.
They do not store any duplicate data. If you would like the reference I can give
it to you.
I agree with you that it means an extra lookup that could make things slower, (
I say could because you use more space in the cache which could result in more
reads)
Anyway given that that is the way it is implemented, does anyone know if it is
possible to create an index in memory?

Clive





John Stanton <[EMAIL PROTECTED]> on 03/10/2007 17:36:58

Please respond to sqlite-users@sqlite.org

To:   sqlite-users@sqlite.org
cc:(bcc: clive/Emultek)

Subject:  Re: [sqlite] Index size in file



An index which does not hold keys is not an index.  If you don't want to
allocate space for indexing then you put up with slow performance and
use row searches.

[EMAIL PROTECTED] wrote:
>
>
> I created an index on a TEXT column as I want to be able to
> I noticed a large increase in the file size.
> Looking at the binary of the file, I see that the index has a copy of all the
> data being indexed.
> 1. Is this necassary?
> 2. Is there a way to keep the index only in memory and not in the file.
>
> Clive
>
>
>
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-








This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.












-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index size in file

2007-10-03 Thread John Stanton
An index which does not hold keys is not an index.  If you don't want to 
allocate space for indexing then you put up with slow performance and 
use row searches.


[EMAIL PROTECTED] wrote:



I created an index on a TEXT column as I want to be able to
I noticed a large increase in the file size.
Looking at the binary of the file, I see that the index has a copy of all the
data being indexed.
1. Is this necassary?
2. Is there a way to keep the index only in memory and not in the file.

Clive



-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Index size in file

2007-10-03 Thread Clive . Bluston



I created an index on a TEXT column as I want to be able to
I noticed a large increase in the file size.
Looking at the binary of the file, I see that the index has a copy of all the
data being indexed.
1. Is this necassary?
2. Is there a way to keep the index only in memory and not in the file.

Clive



-
To unsubscribe, send email to [EMAIL PROTECTED]
-