Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Gussimulator
I don't really think this is a good way of dealing with it, but thanks for 
the idea; if everything else doesn't work out, I might end up doing this. 
However, it's not a solution at all.




- Original Message - 
From: "Mikey C" <[EMAIL PROTECTED]>

To: <sqlite-users@sqlite.org>
Sent: Wednesday, July 05, 2006 6:44 PM
Subject: Re: [sqlite] Compressing the DBs?




What platform are you using?

If you are using the NTFS filesystem you can just mark the file for 
compression and the OS takes care of it transparently.







Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Christian Smith

Gussimulator uttered:

I've been using SQLite for a very short period of time and so far it's 
doing a great job for my application (single user, though quite a big 
amount of data).


Now, since there's a lot of repetitive data, I thought that compressing 
the database would be a good idea, since, as we all know, one of the 
first principles of data compression is getting rid of repetitive data. 
So I was wondering whether this is possible with SQLite, or whether it 
would be quite a pain to implement a compression scheme by myself. I have 
worked with many compression libraries before, so that wouldn't be an 
issue; the issue would be hooking any of those libraries into SQLite...


So, before I waste my time on something that has perhaps been done 
already, I decided to ask here. Is there any compression scheme I can 
use, or am I doomed to implement this myself? (Any tips?)





For desktop or server use, storage is cheap. Unless you're constrained by 
limited space because of embedded flash storage, you're better off just 
taking the hit, IMO.


Christian



Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Christian Smith

[EMAIL PROTECTED] uttered:


Mikey C <[EMAIL PROTECTED]> wrote:

Not sure what you mean there DRH, but I set compression on one of my 
database files on NTFS and the file size shrank from 1,289,216 bytes to 
696,320 bytes.

And of course the whole compression/decompression process is completely 
transparent to SQLite, and if you decide that compression is a bad thing, 
you just uncheck the box on that file and you are back where you started.



After turning compression on, try making lots of updates to
the database.  Does the database stay the same size?  Is
there a significant I/O performance hit?  I'm guessing that
the answer in both cases will be "yes".  Please let me know.



Compression on NTFS and co. is done at the cluster-group level. If a 
cluster group does not compress, it is stored as-is. I think NTFS works 
with groups of 16 clusters, which would be 64 KB chunks (assuming the 
default 4 KB cluster size).


My guess is that bigger page sizes will benefit compressing filesystems, 
as similar keys will be close to each other. Match the page size to the 
group size, so 64 KB in the case of NTFS.
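
In SQLite terms, matching the page size is a small change; a minimal 
sketch follows, noting that the maximum page size depends on the SQLite 
build, so 32768 may be the practical ceiling rather than 64 KB:

    -- Set a larger page size. On an existing database this only takes
    -- effect when followed by VACUUM, which rewrites the whole file.
    PRAGMA page_size = 32768;
    VACUUM;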


The performance hit should be negligible, if any, especially given modern 
processors' vast performance advantage over disk I/O. As has been said, 
the amount of data being read/written should be lower, so performance may 
even improve marginally. But seek latency should be similar in both 
cases, so performance is probably largely the same. On a full or 
fragmented filesystem, writing less data may also reduce the number of 
seeks required, but such a filesystem will have other performance issues 
anyway.




--
D. Richard Hipp   <[EMAIL PROTECTED]>




Christian



Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Jay Sprenkle

On 7/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


I'm guessing that the random I/O behavior of SQLite will
frustrate the compression implemented by NTFS and Reiser.
The end result will be a database file that might actually
take up more space than if it were uncompressed.

I'd be interested in hearing about experiments to prove
or disprove this conjecture.


Sounds like it would take a bit of work to ensure you're measuring
the right thing. I'm not sure how you'd turn off the caching of the
database in RAM.
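
A minimal sketch of such an experiment (table and column names are 
hypothetical); the RAM-cache problem would still need handling outside 
SQL, e.g. by remounting the filesystem or rebooting between measurements:

    -- Populate a table with compressible text, then churn it with
    -- updates and watch the on-disk (compressed) size between batches.
    CREATE TABLE t(id INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO t(payload) VALUES ('some repetitive text ...');
    -- repeat the INSERT until the file is a few MB, then loop:
    UPDATE t SET payload = payload || 'x' WHERE id % 7 = 0;
    -- after each batch, compare the logical file size with the
    -- size-on-disk the filesystem reports for the compressed file.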


Re: [sqlite] Compressing the DBs?

2006-07-06 Thread John Stanton
Repetition like that doesn't mean you can't normalize into third normal 
form. In your simple case, having a table where X = "E:\DirectX90c\Feb2006_" 
would work. It would perform better than brute-force compression, which 
requires inflation for each search.
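
A sketch of that normalization, with hypothetical table and column names: 
store each common prefix once and reference it by id.

    CREATE TABLE prefix (
        id   INTEGER PRIMARY KEY,
        path TEXT UNIQUE             -- e.g. 'E:\DirectX90c\'
    );

    CREATE TABLE entry (
        id        INTEGER PRIMARY KEY,
        prefix_id INTEGER REFERENCES prefix(id),
        name      TEXT               -- e.g. 'Feb2006_xact_x86.cab'
    );

    -- Reassemble full paths with a join:
    SELECT p.path || e.name AS full_path
    FROM entry e JOIN prefix p ON p.id = e.prefix_id;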


Gussimulator wrote:

Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the 
time in this example (as does "Feb2006_" in almost every table).


It's just an example of the type of repetitive data I have to deal with; 
they are normally paths. Since there are directories within directories, 
the paths repeat.


What would be an ideal approach for this situation? I would like to save 
space, but I wouldn't like to waste a big amount of processing power to 
do so.


One must keep in mind that my system must perform "well" in various 
situations (which I can't predict, at least not all of them), so I can't 
have a very elaborate database schema. Sometimes saving a few KB can mean 
wasting a few tons of cycles, and I can't deal with that. I'd rather keep 
those extra KB and have a responsive application than save a few KB and 
fall asleep at the keyboard (don't worry, it's a multi-threaded 
environment, but it's important to keep it optimized; I'm just 
overstating the problem a little).



I'd like to take the right 'path' here...
Thanks.






- Original Message - From: "Darren Duncan" 
<[EMAIL PROTECTED]>

To: <sqlite-users@sqlite.org>
Sent: Thursday, July 06, 2006 12:04 AM
Subject: Re: [sqlite] Compressing the DBs?



At 6:04 PM -0300 7/5/06, Gussimulator wrote:

Now, since there's a lot of repetitive data, I thought that compressing 
the database would be a good idea, since, as we all know, one of the 
first principles of data compression is getting rid of repetitive data. 
So I was wondering whether this is possible with SQLite, or whether it 
would be quite a pain to implement a compression scheme by myself. I 
have worked with many compression libraries before, so that wouldn't be 
an issue; the issue would be hooking any of those libraries into 
SQLite...



First things first, what do you mean by "repetitive"?

Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store 
single copies of things.


If you have tables with duplicate rows, then add a 'quantity' column 
and reduce to one copy of the actual data.


If some columns are unique and some are repeated, perhaps try 
splitting the tables into more tables that are related.


This, really, is what you should be doing first, and may very well be 
the only step you need.


If you can't do that, then please explain in what way the data is 
repetitive?


-- Darren Duncan 







Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Mikey C

Not sure what you mean there DRH, but I set compression on one of my 
database files on NTFS and the file size shrank from 1,289,216 bytes to 
696,320 bytes.

And of course the whole compression/decompression process is completely 
transparent to SQLite, and if you decide that compression is a bad thing, 
you just uncheck the box on that file and you are back where you started.





Re: [sqlite] Compressing the DBs?

2006-07-06 Thread drh
"Jay Sprenkle" <[EMAIL PROTECTED]> wrote:
> On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote:
> >
> > What platform are you using?
> >
> > If you are using the NTFS filesystem you can just mark the file for
> > compression and the OS takes care of it transparently.
> 
> The Linux OS supports Reiser file systems, which compress on the fly,
> with the compression being faster than the transfer to disk. Thus it's
> actually faster than a non-compressing file system.
> 

I'm guessing that the random I/O behavior of SQLite will
frustrate the compression implemented by NTFS and Reiser.
The end result will be a database file that might actually
take up more space than if it were uncompressed.

I'd be interested in hearing about experiments to prove
or disprove this conjecture.
--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Jay Sprenkle

On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote:


What platform are you using?

If you are using the NTFS filesystem you can just mark the file for 
compression and the OS takes care of it transparently.


The Linux OS supports Reiser file systems, which compress on the fly, 
with the compression being faster than the transfer to disk. Thus it's 
actually faster than a non-compressing file system.


Re: [sqlite] Compressing the DBs?

2006-07-06 Thread Eduardo

At 07:07 06/07/2006, you wrote:

Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the 
time in this example (as does "Feb2006_" in almost every table).


It's just an example of the type of repetitive data I have to deal 
with; they are normally paths. Since there are directories within 
directories, the paths repeat.


What would be an ideal approach for this situation? I would like to 
save space, but I wouldn't like to waste a big amount of processing 
power to do so.


One must keep in mind that my system must perform "well" in various 
situations (which I can't predict, at least not all of them), so I 
can't have a very elaborate database schema. Sometimes saving a few KB 
can mean wasting a few tons of cycles, and I can't deal with that. I'd 
rather keep those extra KB and have a responsive application than save 
a few KB and fall asleep at the keyboard (don't worry, it's a 
multi-threaded environment, but it's important to keep it optimized; 
I'm just overstating the problem a little).



I'd like to take the right 'path' here...
Thanks.


SQLite has no compression system built in. Also, any compression must 
be done at the page level, not the data level, because most compression 
algorithms use statistics gathered from earlier data, so if you compress 
at the data level, deleting one row turns every row after it into 
garbage.


There are a lot of compression algorithms, but I think the best fit here 
is an arithmetic or range coder with order 0 or 1; the page size is too 
small for higher orders or LZ algorithms. Both arithmetic and range 
coders are pretty fast, with no FPU code (integer only, good for 
embedded devices), and I think they would not slow file I/O down too 
much. On text-only data you can expect around 2.5 bits per byte, or near 
65% size reduction, more when the page size is greater.
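
For scale, taking the 2.5 bits per byte at face value:

    2.5 / 8 = 0.3125 of the original size
    1 - 0.3125 = 0.6875, i.e. roughly a two-thirds reduction

which is in the same ballpark as the 65% figure above.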


The code is about 10 KB, but I don't know where to "plug it in" ;)

HTH



Re: [sqlite] Compressing the DBs?

2006-07-05 Thread Bill KING
Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the
> time in this example (as does "Feb2006_" in almost every table).
>
> It's just an example of the type of repetitive data I have to deal
> with; they are normally paths. Since there are directories within
> directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to
> save space, but I wouldn't like to waste a big amount of processing
> power to do so.
>
> One must keep in mind that my system must perform "well" in various
> situations (which I can't predict, at least not all of them), so I
> can't have a very elaborate database schema. Sometimes saving a few KB
> can mean wasting a few tons of cycles, and I can't deal with that. I'd
> rather keep those extra KB and have a responsive application than save
> a few KB and fall asleep at the keyboard (don't worry, it's a
> multi-threaded environment, but it's important to keep it optimized;
> I'm just overstating the problem a little).
>
>
> I'd like to take the right 'path' here...
> Thanks.
>
>
>
>
>
>
> - Original Message - From: "Darren Duncan"
> <[EMAIL PROTECTED]>
> To: <sqlite-users@sqlite.org>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that
>>> compressing the database would be a good idea, since, as we all
>>> know, one of the first principles of data compression is getting rid
>>> of repetitive data. So I was wondering whether this is possible with
>>> SQLite, or whether it would be quite a pain to implement a
>>> compression scheme by myself. I have worked with many compression
>>> libraries before, so that wouldn't be an issue; the issue would be
>>> hooking any of those libraries into SQLite...
>>
>> First things first, what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store
>> single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column
>> and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try
>> splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be
>> the only step you need.
>>
>> If you can't do that, then please explain in what way the data is
>> repetitive?
>>
>> -- Darren Duncan 
>
>
We came across this with our filesystem metainfo system. What we ended
up doing was creating a sub-table called "location", which can be used
either via a join in the SQL statement or via a cached internal
structure to recreate the path of a file (easy enough with a map<> or a
hash<>-style bucket class). No major overhead cost, but a definite
savings in space. (Each directory becomes an entry in the location
table, so a file is then stored as a location key value + filename.)

-- 
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains, 
QLD, Australia, 4113
Tel + 61 7 3219 9906 (x137)
Fax + 61 7 3219 9938
mobile: 0423 532 733
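
A sketch of the scheme Bill describes, with hypothetical table and 
column names. Each directory is one row; deeper paths are rebuilt by 
walking parent links in application code (the cached map he mentions), 
since the SQLite of the day has no recursive queries:

    CREATE TABLE location (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES location(id),  -- NULL for a root dir
        name   TEXT                              -- e.g. 'DirectX90c'
    );

    CREATE TABLE file (
        location_id INTEGER REFERENCES location(id),
        filename    TEXT
    );

    -- One join step of the path reconstruction:
    SELECT l.name || '\' || f.filename
    FROM file f JOIN location l ON l.id = f.location_id;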



Re: [sqlite] Compressing the DBs?

2006-07-05 Thread Gussimulator

Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the time 
in this example (as does "Feb2006_" in almost every table).


It's just an example of the type of repetitive data I have to deal with; 
they are normally paths. Since there are directories within directories, 
the paths repeat.


What would be an ideal approach for this situation? I would like to save 
space, but I wouldn't like to waste a big amount of processing power to 
do so.


One must keep in mind that my system must perform "well" in various 
situations (which I can't predict, at least not all of them), so I can't 
have a very elaborate database schema. Sometimes saving a few KB can mean 
wasting a few tons of cycles, and I can't deal with that. I'd rather keep 
those extra KB and have a responsive application than save a few KB and 
fall asleep at the keyboard (don't worry, it's a multi-threaded 
environment, but it's important to keep it optimized; I'm just 
overstating the problem a little).



I'd like to take the right 'path' here...
Thanks.






- Original Message - 
From: "Darren Duncan" <[EMAIL PROTECTED]>

To: <sqlite-users@sqlite.org>
Sent: Thursday, July 06, 2006 12:04 AM
Subject: Re: [sqlite] Compressing the DBs?



At 6:04 PM -0300 7/5/06, Gussimulator wrote:
Now, since there's a lot of repetitive data, I thought that compressing 
the database would be a good idea, since, as we all know, one of the 
first principles of data compression is getting rid of repetitive data. 
So I was wondering whether this is possible with SQLite, or whether it 
would be quite a pain to implement a compression scheme by myself. I 
have worked with many compression libraries before, so that wouldn't be 
an issue; the issue would be hooking any of those libraries into 
SQLite...


First things first, what do you mean by "repetitive"?

Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store 
single copies of things.


If you have tables with duplicate rows, then add a 'quantity' column and 
reduce to one copy of the actual data.


If some columns are unique and some are repeated, perhaps try splitting 
the tables into more tables that are related.


This, really, is what you should be doing first, and may very well be the 
only step you need.


If you can't do that, then please explain in what way the data is 
repetitive?


-- Darren Duncan 




Re: [sqlite] Compressing the DBs?

2006-07-05 Thread Darren Duncan

At 6:04 PM -0300 7/5/06, Gussimulator wrote:
Now, since there's a lot of repetitive data, I thought that 
compressing the database would be a good idea, since, as we all know, 
one of the first principles of data compression is getting rid of 
repetitive data. So I was wondering whether this is possible with 
SQLite, or whether it would be quite a pain to implement a compression 
scheme by myself. I have worked with many compression libraries before, 
so that wouldn't be an issue; the issue would be hooking any of those 
libraries into SQLite...


First things first, what do you mean by "repetitive"?

Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store 
single copies of things.


If you have tables with duplicate rows, then add a 'quantity' column 
and reduce to one copy of the actual data.


If some columns are unique and some are repeated, perhaps try 
splitting the tables into more tables that are related.


This, really, is what you should be doing first, and may very well be 
the only step you need.


If you can't do that, then please explain in what way the data is repetitive?

-- Darren Duncan
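
A sketch of the 'quantity' idea with hypothetical names: exact-duplicate 
rows collapse into one row plus a count.

    -- Build a deduplicated copy of a table full of repeated rows:
    CREATE TABLE item_counts AS
    SELECT payload, COUNT(*) AS quantity
    FROM item
    GROUP BY payload;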


Re: [sqlite] Compressing the DBs?

2006-07-05 Thread Mikey C

What platform are you using?

If you are using the NTFS filesystem you can just mark the file for 
compression and the OS takes care of it transparently.
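
For reference, the same attribute can be toggled from the command line 
with the stock compact utility (the file name here is hypothetical):

    compact /C mydatabase.db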





[sqlite] Compressing the DBs?

2006-07-05 Thread Gussimulator
I've been using SQLite for a very short period of time and so far it's 
doing a great job for my application (single user, though quite a big 
amount of data).

Now, since there's a lot of repetitive data, I thought that compressing 
the database would be a good idea, since, as we all know, one of the 
first principles of data compression is getting rid of repetitive data. 
So I was wondering whether this is possible with SQLite, or whether it 
would be quite a pain to implement a compression scheme by myself. I have 
worked with many compression libraries before, so that wouldn't be an 
issue; the issue would be hooking any of those libraries into SQLite...

So, before I waste my time on something that has perhaps been done 
already, I decided to ask here. Is there any compression scheme I can 
use, or am I doomed to implement this myself? (Any tips?)

Thanks a lot. 

PS: Sorry if I didn't read the manual enough.