Re: [sqlite] Compressing the DBs?
On 7/6/06, Gussimulator <[EMAIL PROTECTED]> wrote:
> Eduardo said:
> > SQLite has no compression system for free.
> Not for free? I take it that it's possible to implement a compression scheme in the database system, then, which is all I care about. Of course it's possible, but now I know it's been done - that's all I needed to know, since, if every possible solution fails, I might have to implement this myself, low-level. If this "compression-enabled package" uses its own compression algorithm(s), then I'm OK with it not being free. On the other hand, if they use a well-known open-source algorithm, charging for this version is a little off... at least from my point of view - of course it depends mainly on licenses and points of view - I guess that's the case.

You get paid for your time. Someone who adapts a compression algorithm to SQLite, even an open-source one, should be paid for their time too. Seems fair to me.

> About normalization, even if I had a good work-around, that doesn't mean that using a compression scheme wouldn't perform better in most cases. (Although I can't tell speed-wise.)

If you really desire to save space, optimization by normalization will help a great deal - so long as you do it correctly. I worked on one database where they normalized the state names in street addresses. They added a 4-byte integer key to the address table to reference the state, a matching 4-byte integer in the state table to identify the state, and created a btree index to join the two tables with. The keys in the two tables alone were 4 times the size of a 2-byte state name. This design choice resulted in more disk space being used, slower response time, and more difficult programming. You can over-normalize.

--
SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite
Cthulhu Bucks! http://www.cthulhubucks.com
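A minimal sketch of the over-normalization anecdote above, assuming hypothetical table and column names (they are not from the database described in the email):

```python
import sqlite3

# Hypothetical sketch of the over-normalized design described above vs.
# storing the 2-character state code inline. Table and column names are
# illustrative, not from the database in the anecdote.
con = sqlite3.connect(":memory:")
con.executescript("""
-- Over-normalized: a 4-byte key in address plus a 4-byte key in state,
-- plus a btree index to join on: 8 bytes of keys per address row,
-- versus the 2 bytes of the state name itself.
CREATE TABLE state   (state_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE address (addr_id  INTEGER PRIMARY KEY, street TEXT,
                      state_id INTEGER REFERENCES state(state_id));
CREATE INDEX address_state ON address(state_id);

-- Simpler alternative: just store the 2-character code directly.
CREATE TABLE address2 (addr_id INTEGER PRIMARY KEY, street TEXT,
                       state_code TEXT);
""")
con.execute("INSERT INTO state(code) VALUES ('TX')")
con.execute("INSERT INTO address(street, state_id) VALUES ('1 Main St', 1)")
row = con.execute(
    "SELECT street, code FROM address JOIN state USING (state_id)"
).fetchone()
print(row)  # ('1 Main St', 'TX')
```

The join in the last query is the extra work the two keys and the index exist to serve; with `address2` the same answer is a plain single-table read.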
Re: [sqlite] Compressing the DBs?
Christian said:
> For desktop or server use, storage is cheap.

That's true; however, I'd like to compress the database because I need portability. Sure, one could compress the DB and move it onto a mobile storage unit, then decompress it on the destination hard drive when you're done and keep on working with it. But that mostly defeats the purpose: the idea is to compress the database so it can be stored and used on said limited storage units.

Eduardo said:
> SQLite has no compression system for free.

Not for free? I take it that it's possible to implement a compression scheme in the database system, then, which is all I care about. Of course it's possible, but now I know it's been done - that's all I needed to know, since, if every possible solution fails, I might have to implement this myself, low-level. If this "compression-enabled package" uses its own compression algorithm(s), then I'm OK with it not being free. On the other hand, if they use a well-known open-source algorithm, charging for this version is a little off... at least from my point of view - of course it depends mainly on licenses and points of view - I guess that's the case. Anyway, this might not be as easy as one thinks at first. (This happens all the time when you're developing something; I take it that I'm not the only one who thinks so.)

About normalization, even if I had a good work-around, that doesn't mean that using a compression scheme wouldn't perform better in most cases. (Although I can't tell speed-wise...
I don't know how SQLite works internally - never sniffed the sources, I'm afraid, only a few routines I had to implement myself on the software side (the string issues, remember). There would be a lot more memory and processor usage, that's for sure, although memory will depend on the algorithm; if it's buffered then that's not a huge issue. On the other hand, processor usage would go sky-high - again, depending on the algorithm we use. (This might also be true for compression only - for example, with JCalG1.) I'll see what to do this weekend. Thanks for all the feedback.
Re: [sqlite] Compressing the DBs?
I don't really think this is a good way of dealing with this, but thanks for the idea; if everything else doesn't work out, I might end up doing this. However, it's not a solution at all.

- Original Message - From: "Mikey C" <[EMAIL PROTECTED]>
Sent: Wednesday, July 05, 2006 6:44 PM
Subject: Re: [sqlite] Compressing the DBs?

> What platform are you using?
>
> If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.
> --
> View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5190175
> Sent from the SQLite forum at Nabble.com.
Re: [sqlite] Compressing the DBs?
Gussimulator uttered:
> I've been using SQLite for a very short period of time and so far it's doing a great job for my application (single user, quite a big amount of data though). Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite... So, before I waste my time with something that perhaps has been done already, I decided to ask here. Is there any compression scheme I can use, or am I doomed to implement this myself? (Any tips?)

For desktop or server use, storage is cheap. Unless you're constrained by limited space because of embedded FLASH storage, you're better off just taking the hit, IMO.

Christian
--
/"\
\ /  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \
Re: [sqlite] Compressing the DBs?
[EMAIL PROTECTED] uttered:
> Mikey C <[EMAIL PROTECTED]> wrote:
> > Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and file size shrunk from 1,289,216 bytes to 696,320 bytes.
> > And of course the whole compression / decompression process is completely transparent to SQLite, and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.
>
> After turning compression on, try making lots of updates to the database. Does the database stay the same size? Is there a significant I/O performance hit? I'm guessing that the answer in both cases will be "yes". Please let me know.
> --
> D. Richard Hipp <[EMAIL PROTECTED]>

Compression on NTFS and co. is done at a cluster-group level. If the cluster group does not compress, it is stored as is. I think NTFS works with 16-cluster groups, which would be 64KB chunks. My guess is that bigger page sizes will benefit compressing filesystems, as similar keys will be close to each other. Match the page size to the group size, so 64KB in the case of NTFS.

The performance hit should be negligible, if any, especially given modern processors' vast performance advantage over disk I/O. As has been said, the amount of data being read/written should be lower, so performance may marginally improve. But seek latency should be similar in both cases, so performance is probably largely the same. On a full/fragmented filesystem, writing less data may also reduce the number of seeks required - but a full and fragmented filesystem will have other performance issues anyway.

Christian
--
/"\
\ /  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \
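Matching the page size to the 64KB compression unit, as suggested above, might be sketched like this (the pragma only takes effect when the database file is first created, or after a VACUUM on an existing one; 65536 is SQLite's maximum page size):

```python
import os
import sqlite3
import tempfile

# Sketch: set SQLite's page size to 64 KiB to line it up with NTFS's
# 16-cluster compression unit, as suggested above. page_size must be
# issued before the first write to a fresh database (or be followed by
# a VACUUM) to take effect.
path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 65536")
con.execute("CREATE TABLE t (path TEXT)")  # first write fixes the page size
con.commit()
print(con.execute("PRAGMA page_size").fetchone()[0])  # 65536
```

Whether this actually helps the filesystem's compressor is exactly the kind of experiment drh asks for elsewhere in this thread.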
Re: [sqlite] Compressing the DBs?
On 7/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture.

Sounds like it would take a bit of work to ensure you're measuring the right thing. I'm not sure how you'd turn off the caching of the database in RAM.
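One way such an experiment might be sketched (portable Python; it only tracks the logical file size - on NTFS you would additionally compare the compressed on-disk size, e.g. with the `compact` tool, which this sketch does not attempt):

```python
import os
import random
import sqlite3
import tempfile

# Sketch of the experiment suggested above: fill a database with highly
# compressible rows, then do lots of random-position updates with
# incompressible payloads -- the workload drh suspects will defeat
# filesystem-level compression -- and watch the file size.
path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t(payload) VALUES (?)",
                [("E:\\DirectX90c\\" * 20,) for _ in range(5000)])
con.commit()
before = os.path.getsize(path)

rng = random.Random(0)
for _ in range(5000):
    rowid = rng.randrange(1, 5001)   # random row, random page
    con.execute("UPDATE t SET payload = ? WHERE id = ?",
                (os.urandom(8).hex() * 40, rowid))
con.commit()
after = os.path.getsize(path)
print(before, after)
```

On a compressed NTFS file you would run the same script and compare on-disk sizes before and after the update storm to see whether the compression survives random rewrites.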
Re: [sqlite] Compressing the DBs?
Mikey C <[EMAIL PROTECTED]> wrote:
> Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and file size shrunk from 1,289,216 bytes to 696,320 bytes.
>
> And of course the whole compression / decompression process is completely transparent to SQLite and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.

After turning compression on, try making lots of updates to the database. Does the database stay the same size? Is there a significant I/O performance hit? I'm guessing that the answer in both cases will be "yes". Please let me know.
--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Compressing the DBs?
If you have repetition like that, it doesn't mean that you cannot normalize into third normal form. In your simple case, having a table where X = "E:\DirectX90\Feb2006" would work. It would perform better than brute-force compression requiring inflation for each search.

Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.
>
> - Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>>
>> First things first: what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need.
>>
>> If you can't do that, then please explain in what way the data is repetitive?
>>
>> -- Darren Duncan
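The normalization advice above, applied to the path data in this thread, might look like the following sketch (table and column names are illustrative, not from the poster's actual schema):

```python
import sqlite3

# Sketch of the advice above: store each directory prefix once in its
# own table and reference it by id. Schema names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dir  (dir_id INTEGER PRIMARY KEY, prefix TEXT UNIQUE);
CREATE TABLE file (file_id INTEGER PRIMARY KEY, dir_id INTEGER, name TEXT);
""")

def add_file(path):
    # Split "E:\DirectX90c\Feb2006_xact_x64.cab" into prefix and name,
    # storing the prefix only once.
    prefix, _, name = path.rpartition("\\")
    con.execute("INSERT OR IGNORE INTO dir(prefix) VALUES (?)", (prefix,))
    (dir_id,) = con.execute(
        "SELECT dir_id FROM dir WHERE prefix = ?", (prefix,)).fetchone()
    con.execute("INSERT INTO file(dir_id, name) VALUES (?, ?)",
                (dir_id, name))

for p in [r"E:\DirectX90c\Feb2006_xact_x64.cab",
          r"E:\DirectX90c\Feb2006_xact_x86.cab"]:
    add_file(p)

# The prefix "E:\DirectX90c" is now stored once; full paths come back
# via a join.
rows = con.execute(
    "SELECT prefix, name FROM file JOIN dir USING (dir_id) ORDER BY file_id"
).fetchall()
paths = ["\\".join(r) for r in rows]
print(paths)
```

The space saved grows with the number of files per directory, since the shared prefix is paid for only once.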
Re: [sqlite] Compressing the DBs?
Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and the file size shrunk from 1,289,216 bytes to 696,320 bytes.

And of course the whole compression / decompression process is completely transparent to SQLite, and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.
--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5199615
Sent from the SQLite forum at Nabble.com.
Re: [sqlite] Compressing the DBs?
"Jay Sprenkle" <[EMAIL PROTECTED]> wrote: > On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote: > > > > What platform are you using? > > > > If you are using NTFS filesystem you can just mark the file for compression > > and the OS takes care of it transparently. > > The linux OS supports Reiser file systems. This will compress > on the fly with the compression being faster than the transfer to disk. > Thus it's actually faster than a non compressing file system. > I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Compressing the DBs?
On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote:
> What platform are you using?
>
> If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.

Linux supports Reiser file systems. These will compress on the fly, with the compression being faster than the transfer to disk - thus it's actually faster than a non-compressing file system.
Re: [sqlite] Compressing the DBs?
At 07:07 06/07/2006, you wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.

SQLite has no compression system for free. Also, any compression must be done at the page level, not the data level, because most compression algorithms use statistics of past data for compression, and if you try to do it at the data level, when a row is eliminated all rows after it become garbage.

There are a lot of compression algorithms, but I think the best for this is an arithmetic or range coder with order 0 or 1; the page size is too low for greater orders or LZ algorithms. Both (arithmetic and range) are pretty fast, with no FPU code (integer only, for embedded devices), and I think it will not slow file I/O down too much.

On text-only data you can expect 2.5 bpb, or near 65% size reduction - more when the page size is greater. The code is about 10 KB, but I don't know where to "plug it in" ;) HTH
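Lacking built-in support, compression can at least be bolted on at the application level with custom SQL functions. A sketch using zlib as a stand-in - note this is per-value, LZ-based compression, not the page-level arithmetic/range coder suggested above, so it cannot exploit redundancy across rows:

```python
import sqlite3
import zlib

# Per-value compression via custom SQL functions: an application-level
# stand-in, NOT the page-level arithmetic/range coder suggested above.
con = sqlite3.connect(":memory:")
con.create_function("deflate", 1, lambda s: zlib.compress(s.encode()))
con.create_function("inflate", 1, lambda b: zlib.decompress(b).decode())

con.execute("CREATE TABLE t (path BLOB)")
con.execute("INSERT INTO t VALUES (deflate(?))",
            ("E:\\DirectX90c\\Feb2006_MDX1_x86_Archive.cab\\" * 100,))

# Every read must inflate -- the cost the poster above worries about.
(original,) = con.execute("SELECT inflate(path) FROM t").fetchone()
(stored,)   = con.execute("SELECT length(path) FROM t").fetchone()
print(len(original), stored)  # stored is far smaller for repetitive text
```

The obvious drawback, raised earlier in the thread, is that compressed columns can no longer be searched or indexed without inflating each value first.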
Re: [sqlite] Compressing the DBs?
Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.
>
> - Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>>
>> First things first: what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need.
>>
>> If you can't do that, then please explain in what way the data is repetitive?
>>
>> -- Darren Duncan

We came across this with our filesystem meta-info system. What we ended up doing was creating a sub-table called "location". With this, the location can be used either via a join in the SQL statement, or via a cached internal structure, to recreate the path of a file. (Easy enough with a map<> or a hash<> style bucket class.) No major overhead costs, but a definite savings in space. (Each directory becomes an entry in the location table, so a file is then stored as a location key value + filename.)

--
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains, QLD, Australia, 4113
Tel +61 7 3219 9906 (x137)
Fax +61 7 3219 9938
mobile: 0423 532 733
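The cached-lookup variant described above (rebuilding paths from a location key without a join) might look like this in outline - the schema and names are illustrative, not Trolltech's actual system:

```python
import sqlite3

# Sketch of the cached-lookup variant described above: load the
# location table into an in-memory map once, then rebuild full paths
# without a SQL join. Schema and names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript(r"""
CREATE TABLE location (loc_id INTEGER PRIMARY KEY, dir TEXT UNIQUE);
CREATE TABLE file     (loc_id INTEGER, name TEXT);
INSERT INTO location(dir) VALUES ('E:\DirectX90c');
INSERT INTO file VALUES (1, 'Feb2006_xact_x64.cab');
INSERT INTO file VALUES (1, 'Feb2006_xact_x86.cab');
""")

# The map<>/hash<>-style bucket the author mentions -- here just a dict.
dirs = dict(con.execute("SELECT loc_id, dir FROM location"))

paths = [dirs[loc] + "\\" + name
         for loc, name in con.execute(
             "SELECT loc_id, name FROM file ORDER BY name")]
print(paths)
```

Loading the (small) location table once trades a little memory for avoiding a join on every path lookup, which matches the "no major overhead costs" claim above.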
Re: [sqlite] Compressing the DBs?
Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)

It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.

What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).

I'd like to take the right 'path' here...
Thanks.

- Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
Sent: Thursday, July 06, 2006 12:04 AM
Subject: Re: [sqlite] Compressing the DBs?

> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>
> First things first: what do you mean by "repetitive"?
>
> Do you mean that there are many copies of the same data?
>
> Perhaps a better approach is to normalize the database and just store single copies of things.
>
> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>
> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>
> This, really, is what you should be doing first, and may very well be the only step you need.
>
> If you can't do that, then please explain in what way the data is repetitive?
>
> -- Darren Duncan
Re: [sqlite] Compressing the DBs?
At 6:04 PM -0300 7/5/06, Gussimulator wrote:
> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...

First things first: what do you mean by "repetitive"?

Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store single copies of things.

If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.

If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.

This, really, is what you should be doing first, and may very well be the only step you need.

If you can't do that, then please explain in what way the data is repetitive?

-- Darren Duncan
Re: [sqlite] Compressing the DBs?
What platform are you using?

If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.
--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5190175
Sent from the SQLite forum at Nabble.com.